Skip to content

Conversation

@matriv
Copy link
Contributor

@matriv matriv commented Oct 23, 2018

Implemented null handling for both the value tested but also for
values inside the list of values tested against.

The null handling is implemented for local processors, painless scripts
and Lucene Terms queries making it available for IN expressions occuring
in SELECT, WHERE and HAVING clauses.

Closes: #34582

Implemented null handling for both the value tested but also for
values inside the list of values tested against.

The null handling is implemented for local processors, painless scripts
and Lucene Terms queries making it available for `IN` expressions occuring
in `SELECT`, `WHERE` and `HAVING` clauses.

Closes: elastic#34582
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the NULL handling inside the values is incorrect since it should signify missing value.
However this can be a separate PR.

if (compResult == null) {
result = null;
}
if (compResult != null && compResult) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compResult is not null already so no need to check it;
if (compResult == null) {..} else if(compResult) { ..}

return valuesOf(list, to, false);
}

public static <T> List<T> valuesOf(List<Expression> list, DataType to, boolean removeNulls) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why nulls need to be removed - if the expression in the list is NULL, it should be return accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my comment here: #34750 (comment)

Boolean compResult = Comparisons.eq(foldedLeftValue, rightValue.fold());
if (compResult != null && compResult) {
if (compResult == null) {
// if (value().dataType() == DataType.NULL || rightValue.dataType() == DataType.NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comments are confusing (not sure if this is the final version or not).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that, some mixup with git stash.

super(location);
this.term = term;
this.values = new LinkedHashSet<>(Foldables.valuesOf(values, values.get(0).dataType()));
this.values = new LinkedHashSet<>(Foldables.valuesOf(values, values.get(0).dataType(), true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid having the overloaded method, it might be more convenient to use Collections.removeIf(p -> p == null) since this is something local to the TermsQuery. On the other hand, it might also imply that when the value is missing (is null), there should be a match.

Copy link
Contributor Author

@matriv matriv Oct 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that only to avoid a second iteration on the list (to remove the nulls), but I can change it. what do you think? I guess it can make a difference only if the list is really long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I wouldn't worry about it. I'm with you regarding avoiding the iteration but again, the Foldable method looks obscure with the null exclusion.
In other words, it's not Foldable that should handle null - I'm fain with handling it inside Terms I wonder though if that has any implications for inline IN.

}

public void testTranslateInExpression_WhereClauseAndNullHAndling() throws IOException {
LogicalPlan p = plan("SELECT * FROM test WHERE keyword IN ('foo', null, 'lala', null, 'foo', concat('la', 'la'))");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think null means the value is missing. That is the IN should become a bool query between missing and terms.

Copy link
Contributor Author

@matriv matriv Oct 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to copy the behaviour to Postgres:

test=# select * from t1 where a in (null);
 a
---
(0 rows)

test=# select * from t1 where a is null;
 a
---




(4 rows)

WHERE col IN (null) is different than WHERE col is NULL, the first one evaluates to NULL which in turn becomes false for WHERE and HAVING clauses.

MySQL behaves the same way too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. IN means = and that fails against null (need to use IS (NOT) NULL).

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Left just one comment.

return true;
} else if (isString() && other.isString()) {
return true;
} else if (isNumeric() && other.isNumeric()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these conditions that return true wouldn't look better if there is one if () only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it only to be more readable, can combine them in one if.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use one if with one condition evaluation per line - best of both worlds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@matriv matriv merged commit 4c73854 into elastic:master Oct 24, 2018
@matriv matriv deleted the mt/fix-34582-new branch October 24, 2018 12:42
matriv pushed a commit that referenced this pull request Oct 24, 2018
Implemented null handling for both the value tested but also for
values inside the list of values tested against.

The null handling is implemented for local processors, painless scripts
and Lucene Terms queries making it available for `IN` expressions occuring
in `SELECT`, `WHERE` and `HAVING` clauses.

Closes: #34582
@matriv
Copy link
Contributor Author

matriv commented Oct 24, 2018

Backported to 6.x with 54f3b69

matriv added a commit to matriv/elasticsearch that referenced this pull request Oct 24, 2018
Handle the case when `null` is the only value in the list so that
it's translated to a `MatchNoDocsQuery`.

Followup to: elastic#34750
matriv pushed a commit that referenced this pull request Oct 25, 2018
Handle the case when `null` is the only value in the list so that
it's translated to a `MatchNoDocsQuery`.

Followup to: #34750
matriv pushed a commit that referenced this pull request Oct 25, 2018
Handle the case when `null` is the only value in the list so that
it's translated to a `MatchNoDocsQuery`.

Followup to: #34750
matriv pushed a commit that referenced this pull request Oct 25, 2018
Handle the case when `null` is the only value in the list so that
it's translated to a `MatchNoDocsQuery`.

Followup to: #34750
kcm pushed a commit that referenced this pull request Oct 30, 2018
Implemented null handling for both the value tested but also for
values inside the list of values tested against.

The null handling is implemented for local processors, painless scripts
and Lucene Terms queries making it available for `IN` expressions occuring
in `SELECT`, `WHERE` and `HAVING` clauses.

Closes: #34582
kcm pushed a commit that referenced this pull request Oct 30, 2018
Handle the case when `null` is the only value in the list so that
it's translated to a `MatchNoDocsQuery`.

Followup to: #34750
matriv added a commit to matriv/elasticsearch that referenced this pull request Oct 30, 2018
Replace standard `||` and `==` painless operators with
`or` and `eq` null safe alternatives from `InternalSqlScriptUtils`.

Follow up to elastic#34750
matriv pushed a commit that referenced this pull request Nov 1, 2018
Replace standard `||` and `==` painless operators with
new `in` method introduced in `InternalSqlScriptUtils`.
This allows the list of values to become a script variable
which is replaced each time with the list of  values provided
by the user.

Move In to the same package as InPipe & InProcessor

Follow up to #34750

Co-authored-by: Costin Leau <[email protected]>
matriv pushed a commit that referenced this pull request Nov 1, 2018
Replace standard `||` and `==` painless operators with
new `in` method introduced in `InternalSqlScriptUtils`.
This allows the list of values to become a script variable
which is replaced each time with the list of  values provided
by the user.

Move In to the same package as InPipe & InProcessor

Follow up to #34750

Co-authored-by: Costin Leau <[email protected]>
matriv pushed a commit that referenced this pull request Nov 1, 2018
Replace standard `||` and `==` painless operators with
new `in` method introduced in `InternalSqlScriptUtils`.
This allows the list of values to become a script variable
which is replaced each time with the list of  values provided
by the user.

Move In to the same package as InPipe & InProcessor

Follow up to #34750

Co-authored-by: Costin Leau <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants