QL: constant_keyword support #53241
Pinging @elastic/es-search (:Search/SQL)
@elasticmachine run elasticsearch-ci/2
matriv left a comment
Great work! Left some comments.
// To mute tests follow example in file: example.csv-spec
//
// Tests testing field alias (introduced in ES 6.4)
Leftover comment?
protected boolean isFromDocValuesOnly(DataType dataType) {
    return dataType == KEYWORD // because of ignore_above.
        || dataType == DATETIME
        || dataType == CONSTANT_KEYWORD // because a non-existent value is considered the constant value itself
Could you explain, please? I don't get it.
We cannot extract the value of a constant_keyword field from _source, because even when a document has no value for the field in its _source, Elasticsearch still considers the field to hold the constant value.
Take a look at the example in our docs: no value at indexing time means level is debug. So, for the second document in the sample, there is no level in _source, but its value is still debug, for example when a query filters on that field.
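To make the difference concrete, here is a minimal sketch of the two read paths, assuming a `level` field mapped as constant_keyword with the value debug (as in the docs example) and a document represented as a plain `Map`. The class and method names are hypothetical and are not the QL extractor API.

```java
import java.util.Map;

// Hypothetical model of the two read paths for a constant_keyword field;
// names are illustrative and do not correspond to the QL extractor classes.
final class ConstantKeywordExtraction {

    // Reading from _source: a document indexed without the field has no entry,
    // so this path reports null even though the field logically has a value.
    static String fromSource(Map<String, ?> source, String field) {
        Object value = source.get(field);
        return value == null ? null : value.toString();
    }

    // Reading the way doc values behave for constant_keyword: the value comes
    // from the mapping, so every document in the index reports the same constant.
    static String fromDocValues(String mappedConstant) {
        return mappedConstant;
    }
}
```

For a document such as `{"message": "..."}`, `fromSource(doc, "level")` returns null while `fromDocValues("debug")` returns debug, which is why the field has to be treated as doc-values-only.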
hadLastItem = true;
bulk.append('"').append(titles.get(f)).append("\":\"").append(fields.get(f)).append('"');
if (titles.get(f).equals("gender") && extraFields) {
    bulk.append(",\"extra_gender\":\"M\"");
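For context, a self-contained sketch of how a loop like the snippet above could turn one CSV row into a bulk document body. The class name, the `toJson` method, and the surrounding loop are hypothetical reconstructions; it assumes `extra_gender` is the test's constant_keyword field and, like the original, does no JSON escaping of values.

```java
import java.util.List;

// Hypothetical helper mirroring the reviewed snippet: builds the JSON body for
// one row, optionally adding extra_gender right after the gender column.
final class BulkDocSketch {
    static String toJson(List<String> titles, List<String> fields, boolean extraFields) {
        StringBuilder bulk = new StringBuilder("{");
        boolean hadLastItem = false;
        for (int f = 0; f < titles.size(); f++) {
            if (hadLastItem) {
                bulk.append(','); // separate from the previous field
            }
            hadLastItem = true;
            bulk.append('"').append(titles.get(f)).append("\":\"").append(fields.get(f)).append('"');
            // extra_gender holds the same value in every document (constant_keyword)
            if (titles.get(f).equals("gender") && extraFields) {
                bulk.append(",\"extra_gender\":\"M\"");
            }
        }
        return bulk.append('}').toString();
    }
}
```

With `extraFields` set, the row `first_name=Georgi, gender=M` becomes `{"first_name":"Georgi","gender":"M","extra_gender":"M"}`.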
Personally, I would prefer it to be in the employees.csv file, with values different from gender, like Male and Female.
IMO, that is too impactful a change to make this way.
Apologies, I mixed up the feature with an enum keyword that can take a set of predefined values in the mapping. It's OK as is, but maybe check whether null is also acceptable.
But maybe having Male instead of M would help avoid confusion in future tests?
;
aggWithNullFilter
SELECT COUNT(*) count FROM test_emp_copy WHERE extra_gender IS NOT NULL;
constant_keyword cannot be null, correct?
Theoretically, I think it can be null... haven't tested this.
I guess any type can be null (missing field in the doc)?
For the null case, none of the documents in the index can have a value for that field. As soon as one document is indexed with a value for that field, all other documents will "inherit" that value.
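The rule in the comment above can be modelled as a small state machine. This is a minimal sketch, not Elasticsearch code: `ConstantKeywordMapping` and its methods are hypothetical, and it assumes (as the Elasticsearch constant_keyword documentation describes) that a mapping without a value adopts the first indexed value and rejects later, different values.

```java
// Hypothetical model of a constant_keyword mapping: the constant is either
// set in the mapping or adopted from the first document that supplies a value.
final class ConstantKeywordMapping {
    private String value; // stays null while no document has supplied a value

    ConstantKeywordMapping(String mappedValue) {
        this.value = mappedValue; // may be null when the mapping declares no value
    }

    // Indexes one document's value for the field; null means the field is omitted.
    void index(String docValue) {
        if (docValue == null) {
            return; // omitted field: the document simply shares the constant
        }
        if (value == null) {
            value = docValue; // first value seen becomes the constant for the index
        } else if (!value.equals(docValue)) {
            throw new IllegalArgumentException(
                "value [" + docValue + "] does not match the constant [" + value + "]");
        }
    }

    // The value every document in the index reports for the field.
    String valueForAnyDocument() {
        return value;
    }
}
```

Under this model, `WHERE extra_gender IS NOT NULL` matches either every document or none of them, depending only on whether the constant has been set.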
;
functionOverAlias
SELECT BIT_LENGTH(extra_gender) bit FROM test_emp_copy ORDER BY extra_gender LIMIT 1;
If we have two values, Male and Female, this should be changed.
A constant_keyword field has only one value.
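Because the field holds a single value, `ORDER BY extra_gender LIMIT 1` always returns that value, so the expected result of the test depends only on the constant. A small arithmetic sketch, assuming BIT_LENGTH counts 8 bits per byte of the UTF-8 encoding (the class name is hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Illustrative arithmetic only: bit length taken as 8 bits per UTF-8 byte.
final class BitLengthSketch {
    static int bitLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length * 8;
    }
}
```

With the current value M the expected result is 8; if the constant were Male it would be 32, so the expected output would need updating.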
assertEquals(BOOLEAN, commonType(BOOLEAN, BOOLEAN));
assertEquals(NULL, commonType(NULL, NULL));
assertEquals(INTEGER, commonType(INTEGER, KEYWORD));
assertEquals(DOUBLE, commonType(DOUBLE, CONSTANT_KEYWORD));
Conversion between numeric types and constant_keyword is not tested in the QL DataTypeConverterTests.
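As a reference for which pairs such tests could cover, here is a simplified, self-contained model of the `commonType` cases asserted above. It is a hypothetical sketch, not the real `DataTypeConverter`: the enum is a reduced stand-in for the QL types, unhandled pairs return null, and mapping keyword plus constant_keyword to keyword is an assumption based on the merge behaviour discussed later in this PR.

```java
// Reduced stand-in for the QL data types used in the assertions above.
enum SketchType {
    NULL, BOOLEAN, INTEGER, DOUBLE, KEYWORD, CONSTANT_KEYWORD;

    boolean isNumeric() { return this == INTEGER || this == DOUBLE; }
    boolean isString() { return this == KEYWORD || this == CONSTANT_KEYWORD; }
}

final class CommonTypeSketch {
    // Returns the type both sides can be converted to, or null if this sketch has no rule.
    static SketchType commonType(SketchType left, SketchType right) {
        if (left == right) return left;
        if (left == SketchType.NULL) return right;   // null adapts to the other side
        if (right == SketchType.NULL) return left;
        // number vs string-like value: the string side is converted to the number type
        if (left.isNumeric() && right.isString()) return left;
        if (right.isNumeric() && left.isString()) return right;
        // keyword vs constant_keyword: assumed to widen to keyword
        if (left.isString() && right.isString()) return SketchType.KEYWORD;
        // numeric widening between the two numeric types modelled here
        if (left.isNumeric() && right.isNumeric()) return SketchType.DOUBLE;
        return null;
    }
}
```

Tests along these lines would exercise both argument orders, for example `commonType(CONSTANT_KEYWORD, DOUBLE)` as well as `commonType(DOUBLE, CONSTANT_KEYWORD)`.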
costin left a comment
Great set of tests. I left some comments on removing types in case of a conflict, since I think it might drop index information and lead to an incorrect mapping.
// if there are both a keyword and a constant_keyword type for this field, only keep the keyword as a common compatible type
if (hasCompatibleKeywords) {
    types.remove(CONSTANT_KEYWORD.esType());
Removing the type altogether might remove useful information.
For example, if a field a is mapped as keyword in index X and as constant_keyword in index Y, the piece above will find the collision and remove the field from Y, resulting in a mapping only in X.
On one hand, it does indicate that there's only one field a; however, each field has index information underneath, so in this case it would be just for index X. I'm not sure, though, whether we take that into account.
What happens if there's no removal?
This scenario is the one where two indices have a field with the same name, mapped as keyword in one and as constant_keyword in the other, and their mappings need to be merged. For other scenarios (where the same name is used for two fields with different types) we tell the user that the field cannot be shown because of the different mapping types, but in this very specific case, and since these two field types are so similar, I chose to display the field with keyword as the "common" data type.
The debate over what to do in this specific case came from the scenario described here, and it is specifically tested here.
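A minimal sketch of the merge rule described in this reply, assuming the per-field information is a map from ES type name to the set of indices using that type. The class and methods are hypothetical, not the actual index-resolution code. Unlike the snippet under review, which removes the constant_keyword entry, this sketch folds its indices into the keyword entry so the per-index information raised in the review is preserved; that is an illustrative design choice, not necessarily what the PR does.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: merges the type -> indices view of one field name.
final class FieldTypeMergeSketch {
    // keyword and constant_keyword collapse into keyword; the indices that used
    // constant_keyword are kept under keyword instead of being dropped.
    static Map<String, Set<String>> merge(Map<String, Set<String>> types) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : types.entrySet()) {
            merged.put(e.getKey(), new TreeSet<>(e.getValue())); // defensive copy
        }
        if (merged.containsKey("keyword") && merged.containsKey("constant_keyword")) {
            Set<String> constantIndices = merged.remove("constant_keyword");
            merged.get("keyword").addAll(constantIndices);
        }
        return merged;
    }

    // Any other combination of distinct types is still reported as a conflict.
    static boolean isConflicting(Map<String, Set<String>> merged) {
        return merged.size() > 1;
    }
}
```

For field a mapped as keyword in X and constant_keyword in Y, the result is a single keyword entry covering both X and Y, while keyword in X plus long in Y remains a conflict.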
costin left a comment
LGTM
x-pack/plugin/ql/src/main/java/org/elasticsearch/xpack/ql/type/DataTypeConverter.java
bpintea left a comment
LGTM
matriv left a comment
LGTM. Thank you @astefan!
(cherry picked from commit d6cd4ce)
This PR adds support for the constant_keyword data type, following its addition to Elasticsearch in #49713.
Addresses #53016