SQL: Implement IN(value1, value2, ...) expression. #34581
Implement the functionality to translate the `field IN (value1, value2,...)` expressions to proper Lucene queries or painless script depending on the use case. The `IN` expression can be used in SELECT, WHERE and HAVING clauses. Closes: elastic#32955
Pinging @elastic/es-search-aggs
I tested this a bit and a combination of a function and |
@matriv this one fails and I think it shouldn't: |
The above comparison is fixed in #34573 (the underlying null-safe equality does proper widening when comparing |
Thanks for catching that. Added validation and tests for a nice error message.
Strange that the one above fails, but the next one doesn't: |
astefan left a comment
Left some comments, otherwise LGTM.
costin left a comment
Looks good!
I've left a number of comments but most of them are stylistic.
I wonder if there's an optimization rule that we can use to remove from the list the items that are known not to match, in order to minimize the list and thus the number of pipes, etc. that follow.
That would only work though if the value (the left as you say) is constant and thus all non-matching constants from the list could be removed:
`SELECT 1 IN (2,3, foo) FROM TABLE;`
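A minimal sketch of the pruning idea described above, assuming non-null constants that are compared with plain `equals` (no numeric widening). The class and method names (`InListPruning`, `Item`, `Pruned`, `prune`) are hypothetical and are not part of the PR or of the Elasticsearch code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch; none of these names come from the PR. */
final class InListPruning {

    /** A list item: either a non-null constant or a column reference (unknown until query time). */
    static final class Item {
        final boolean constant;
        final Object value; // the constant, or the column name when constant == false

        private Item(boolean constant, Object value) {
            this.constant = constant;
            this.value = value;
        }

        static Item constant(Object value) {
            return new Item(true, value);
        }

        static Item column(String name) {
            return new Item(false, name);
        }
    }

    /** Outcome of pruning: either the whole IN is known to be true, or a shorter list remains. */
    static final class Pruned {
        final boolean alwaysTrue;
        final List<Item> remaining;

        Pruned(boolean alwaysTrue, List<Item> remaining) {
            this.alwaysTrue = alwaysTrue;
            this.remaining = remaining;
        }
    }

    /** Drops constants that cannot match a constant left-hand value; keeps non-constant items. */
    static Pruned prune(Object constantValue, List<Item> items) {
        List<Item> remaining = new ArrayList<>();
        for (Item item : items) {
            if (!item.constant) {
                remaining.add(item); // e.g. a column: cannot be decided at planning time
            } else if (Objects.equals(constantValue, item.value)) {
                return new Pruned(true, Collections.emptyList()); // a constant matches: IN is true
            }
            // otherwise: a constant that is known not to match, so it is dropped
        }
        return new Pruned(false, remaining);
    }
}
```

For `1 IN (2, 3, foo)` this sketch keeps only `foo`, so a single comparison would remain to be evaluated per row.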
There's no need to specify this : if (isString() && other.isString())
To increase readability please move the two ifs to a separate method (checkInExpression?) similar to checkGroupBy.
rightValue is a bit confusing since it also means the correct value. How about inValue or just value?
Maybe I'm misunderstanding the rule but I think this can be simplified by initializing dt with a first value and avoiding the double ifs (which are very similar).
If the rule checks the values between themselves, this can be done by picking the first item and then comparing it with the rest (through an index (a bit verbose but fast) or a sublist).
If the rule checks the value against the list (which includes the former but can't be as precise in the message if the former occurs), dt is initialized to In.value() and the check then iterates through the list.
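A rough sketch of the second variant (seed dt from the value, then walk the list once). The `DataType` enum, the `InTypeCheck` class and the message format are all hypothetical, and strict equality of types is a simplifying assumption; the real check presumably allows compatible types.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; names and message format are illustrative only. */
final class InTypeCheck {

    /** Stand-in for the engine's data types. */
    enum DataType { INTEGER, KEYWORD, BOOLEAN }

    /**
     * Seeds dt with the type of the IN value and compares every list item against it
     * in a single loop, returning a message for the first mismatch (if any).
     */
    static Optional<String> firstMismatch(DataType valueType, List<DataType> listTypes) {
        DataType dt = valueType;
        for (int i = 0; i < listTypes.size(); i++) {
            DataType candidate = listTypes.get(i);
            if (candidate != dt) { // assumption: strict equality instead of compatibility
                return Optional.of("expected data type [" + dt + "], value provided at position "
                        + (i + 1) + " is [" + candidate + "]");
            }
        }
        return Optional.empty();
    }
}
```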
You can avoid the instanceof check using plan.forEachExpression(method, In.class) so for Filter you can do
filterPlan.condition().forEachExpression(validateIn, In.class)
validateIn(In in) { ...}
the Set<Failure> can be passed to the closure directly - see checkGroupBy & co for examples.
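To illustrate the suggestion, here is a self-contained sketch of a type-filtered tree traversal that makes an explicit instanceof check in the caller unnecessary. It does not use the real Elasticsearch classes: `Node`, `forEachOfType`, the tiny `In`/`And`/`Literal` types and the validation rule are hypothetical stand-ins, and failures are plain strings rather than `Failure` objects.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical sketch of a traversal that only visits nodes of a requested type. */
final class TypedTraversal {

    static class Node {
        final List<Node> children;

        Node(Node... children) {
            this.children = Arrays.asList(children);
        }

        /** Applies the action to this node and all descendants that are instances of the given type. */
        <T extends Node> void forEachOfType(Consumer<? super T> action, Class<T> type) {
            if (type.isInstance(this)) {
                action.accept(type.cast(this));
            }
            for (Node child : children) {
                child.forEachOfType(action, type);
            }
        }
    }

    static final class Literal extends Node {
        final Object value;

        Literal(Object value) {
            super();
            this.value = value;
        }
    }

    /** First child is the value, the remaining children are the list. */
    static final class In extends Node {
        In(Node... valueThenList) {
            super(valueThenList);
        }
    }

    static final class And extends Node {
        And(Node left, Node right) {
            super(left, right);
        }
    }

    /** The failure set is captured by the closure, so the visitor needs no return value. */
    static Set<String> validate(Node root) {
        Set<String> failures = new LinkedHashSet<>();
        root.forEachOfType(in -> validateIn(in, failures), In.class);
        return failures;
    }

    private static void validateIn(In in, Set<String> failures) {
        if (in.children.size() < 2) { // illustrative rule only
            failures.add("IN requires at least one list item");
        }
    }
}
```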
Nice! Thx for the suggestion.
Again, if there's only one list no need to separate, concatenate things.
Resolvables.resolved(list).
hm cannot use that. Pipe is not Resolvable.
Changing Pipe to implement Resolvable
I think left and right are a bit confusing - why not use value and list?
Use SqlIllegalArgumentException instead.
Copy pasted from here: https://github.com/elastic/elasticsearch/pull/34581/files/ab7c1502d1b6341b2b3cc3be7eda72b00e606ada#diff-5d7529d13a2ef47d436ea2aa577e0c52R543 :-( Fixing both!
Is AbstractBuilderTestCase used anywhere (maybe I'm missing it)?
I used it for the shardContext here: https://github.com/elastic/elasticsearch/pull/34581/files/ab7c1502d1b6341b2b3cc3be7eda72b00e606ada#diff-aef2b0ce456b8fdd5cc09d6cfd55f0c2R173.
I tried to manually mock it but ended up with ugly code. The AbstractBuilderTestCase is parent class for many tests in .search.aggregations and .index.query packages.
By the way, this should go in 6.5 as well.
The setting that reduces the disk space requirement for the forecasting integration tests was accidentally removed in elastic#31757 when files were moved around. This change simply adds back the setting that existed before that.
Applies our standard column wrapping to the `discovery-ec2` and `repository-s3` plugins.
Changes wording in the FIPS 140-2 related documentation. Co-authored-by: derickson <[email protected]>
Adds support for query-time formatting of the date histo keys when executing a rollup search. Closes elastic#34391
ab7c150 to 4cecd50
costin left a comment
LGTM.
I've added some minor comments regarding styling.
Also, it's worth adding two unit tests: one to check that the optimizer folds the IN expressions (see the optimizer tests; something like 1 in (2-1, 2, 3) should return true) and another to check that In removes duplicates, e.g. 1 in (1,2,3,1,2,3,1,2,3), which is handled by passing the list through an insertion-order set. From there on it can be treated as a list, knowing there are no duplicates.
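The two checks could look roughly like this in isolation. This is only a sketch on plain Java values, assuming `LinkedHashSet` as the insertion-order set; `InDedup`, `distinctInOrder` and `in` are hypothetical names and do not touch the actual optimizer or the `In` expression.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical sketch working on already-folded values. */
final class InDedup {

    /** Removes duplicates while keeping first-seen order, then exposes the result as a list again. */
    static List<Object> distinctInOrder(List<?> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    /** Evaluates value IN (list) on the de-duplicated list. */
    static boolean in(Object value, List<?> list) {
        return distinctInOrder(list).contains(value);
    }
}
```

With this sketch, `distinctInOrder` applied to (1,2,3,1,2,3,1,2,3) yields (1,2,3), and `in(1, ...)` over (2-1, 2, 3) is true because `2-1` is already evaluated to 1 by the time the list is built.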
  static QueryTranslation toQuery(Expression e, boolean onAggs) {
-     QueryTranslation translation = null;
+     QueryTranslation translation;
Looks like a formatting issue - I'd keep the explicit initialization (to keep introspection tools at bay and make it clear that, yes, null is expected).
public TermsQuery(Location location, String term, List<Expression> values) {
    super(location);
    this.term = term;
    this.values = values.stream().map(Expression::fold).collect(Collectors.toList());
The collector could be a set (as opposed to a list) to remove duplicates.
   * Comparison utilities.
   */
- abstract class Comparisons {
+ public final class Comparisons {
Minor nitpick, I tend to explicitly use valueOf/xxxValue to clarify the use of boxing.
  @Override
  public boolean foldable() {
-     return foldable;
+     return children().stream().allMatch(Expression::foldable);
Can be moved to Expressions#foldable similar to nullable and resolvable.
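A small sketch of what such a shared helper might look like. The `Expr` interface and `FoldableSketch` class are hypothetical stand-ins for the real `Expression` and `Expressions` types, whose exact signatures are not shown in this thread.

```java
import java.util.List;

/** Hypothetical sketch of a list-level foldable() helper. */
final class FoldableSketch {

    /** Minimal stand-in for an expression that knows whether it can be folded to a constant. */
    interface Expr {
        boolean foldable();
    }

    /** A list is foldable only if every element is; note that an empty list counts as foldable. */
    static boolean foldable(List<? extends Expr> exps) {
        return exps.stream().allMatch(Expr::foldable);
    }
}
```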
@costin Addressed comments. Please take another look. If the whole Making the |
public TermsQuery(Location location, String term, List<Expression> values) {
    super(location);
    this.term = term;
    this.values = values.stream().map(Expression::fold).collect(Collectors.toCollection(LinkedHashSet::new));
As an alternative, use Foldables.*: new LinkedHashSet(Foldables.valuesOf(values, dataType()))
costin left a comment
LGTM, thanks.
The optimization rule is not critical since the end result is the same though I'm curious why it doesn't kick in.
21dac9d to d1e1018
@costin as discussed, I was wrong: the optimisation does kick in, it was just not tested properly. I now have a test for that. Thank you!
retest this please
Implement the functionality to translate the `field IN (value1, value2,...)` expressions to proper Lucene queries or painless script or local processors depending on the use case. The `IN` expression can be used in SELECT, WHERE and HAVING clauses. Closes: #32955
Backported to |
* master: (24 commits) ingest: better support for conditionals with simulate?verbose (elastic#34155) [Rollup] Job deletion should be invoked on the allocated task (elastic#34574) [DOCS] .Security index is never auto created (elastic#34589) CCR: Requires soft-deletes on the follower (elastic#34725) re-enable bwc tests (elastic#34743) Empty GetAliases authorization fix (elastic#34444) INGEST: Document Processor Conditional (elastic#33388) [CCR] Add total fetch time leader stat (elastic#34577) SQL: Support pattern against compatible indices (elastic#34718) [CCR] Auto follow pattern APIs adjustments (elastic#34518) [Test] Remove dead code from ExceptionSerializationTests (elastic#34713) A small typo in migration-assistance doc (elastic#34704) ingest: processor stats (elastic#34724) SQL: Implement IN(value1, value2, ...) expression. (elastic#34581) Tests: Add checks to GeoDistanceQueryBuilderTests (elastic#34273) INGEST: Rename Pipeline Processor Param. (elastic#34733) Core: Move IndexNameExpressionResolver to java time (elastic#34507) [DOCS] Force Merge: clarify execution and storage requirements (elastic#33882) TESTING.asciidoc fix examples using forbidden annotation (elastic#34515) SQL: Implement `CONVERT`, an alternative to `CAST` (elastic#34660) ...