QL: complex nested field types mappings fix #69368

astefan · 2021-02-22T16:50:09Z

After the changes in #67432 and #68802, the way nested fields are extracted from the ES query results changed significantly. This PR fixes the more complex multi-nested mappings fields extraction.

Fixes #69175.

field type

…o temp_nested_fields_bugfix

elasticmachine · 2021-02-22T16:50:13Z

Pinging @elastic/es-ql (Team:QL)

astefan · 2021-02-22T18:04:06Z

@elasticmachine update branch

bpintea

A first round of comments.

bpintea · 2021-02-24T08:01:05Z

...in/java/org/elasticsearch/xpack/ql/execution/search/extractor/AbstractFieldHitExtractor.java

+    }
+
+    /*
+     * For a tree of fields like root.nested1.nested2.leaf where nested1 and nested2 are nested field types,


nit: would "a path of fields" make more sense? (since I guess the doc isn't parsed/accessed as a tree?)

bpintea · 2021-02-24T08:01:08Z

...in/java/org/elasticsearch/xpack/ql/execution/search/extractor/AbstractFieldHitExtractor.java

+        String tempHitname = hitName;
+        List<String> remainingPath = new ArrayList<>();
+        // first, search for the "root" DocumentField under which the remaining path of nested document values is
+        while((field = hit.field(tempHitname)) == null) {


nit: spacing after while.

bpintea · 2021-02-24T08:01:11Z

...in/java/org/elasticsearch/xpack/ql/execution/search/extractor/AbstractFieldHitExtractor.java

+        List<String> remainingPath = new ArrayList<>();
+        // first, search for the "root" DocumentField under which the remaining path of nested document values is
+        while((field = hit.field(tempHitname)) == null) {
+            int indexOfDot = tempHitname.lastIndexOf(".");


Is this guaranteed to return a non-negative?
Actually, with a mapping similar to the exemplified one in the test's comment (but only using the first lowered letter of the randomly generated nodes), indexing just {"h": [{"f":{"b": {"a": "true"}}}]} and querying with SELECT h.f.o FROM test seems to trigger:

Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 1 at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3734) at java.base/java.lang.String.substring(String.java:1903) at org.elasticsearch.xpack.ql.execution.search.extractor.AbstractFieldHitExtractor.extractNestedField(AbstractFieldHitExtractor.java:113) at org.elasticsearch.xpack.ql.execution.search.extractor.AbstractFieldHitExtractor.extract(AbstractFieldHitExtractor.java:86) at org.elasticsearch.xpack.sql.execution.search.SearchHitRowSet.extractValue(SearchHitRowSet.java:129) at org.elasticsearch.xpack.sql.execution.search.SearchHitRowSet.extractValue(SearchHitRowSet.java:32) at org.elasticsearch.xpack.sql.execution.search.ResultRowSet.getColumn(ResultRowSet.java:38) at org.elasticsearch.xpack.sql.session.AbstractRowSet.column(AbstractRowSet.java:19) at org.elasticsearch.xpack.sql.session.RowView.forEachColumn(RowView.java:39) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.lambda$createResponse$5(TransportSqlQueryAction.java:132) at org.elasticsearch.xpack.sql.session.RowSet.forEachRow(RowSet.java:28) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.createResponse(TransportSqlQueryAction.java:130) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.createResponseWithSchema(TransportSqlQueryAction.java:125) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.lambda$operation$0(TransportSqlQueryAction.java:97) at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:139)

Mentioning it here, but happy to open an issue if better (or just drop it, if wrong): selecting h.f.b.a will still return nothing.
Reason being that the inner_hits response from ES comes with with the entries listed as h.f_12 (for example), while we're looking for the inner hit h.f.b in SearchHitRowSet. Seems like this should be doable to work, but didn't look deeper.
This is not new, this didn't work before either (although no error message is returned, as mentioned in the limitations).

Nice catch! That's an older problem - SELECT h.f.b.a where b.a is a hierarchy of simple fields inside another hierarchy of nested fields -, for sure. It's the complexity of the nested documents definition possibilities that shows that our algorithm isn't covering all of them. I'd be inclined in dealing with this bug in a subsequent PR as it doesn't look straightforward and it may need a bit of rethinking the way we deal with nested fields hierarchies.

Is this guaranteed to return a non-negative?
Actually, with a mapping similar to the exemplified one in the test's comment (but only using the first lowered letter of the randomly generated nodes), indexing just {"h": [{"f":{"b": {"a": "true"}}}]} and querying with SELECT h.f.o FROM test seems to trigger:

Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 1 at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3734) at java.base/java.lang.String.substring(String.java:1903) at org.elasticsearch.xpack.ql.execution.search.extractor.AbstractFieldHitExtractor.extractNestedField(AbstractFieldHitExtractor.java:113) at org.elasticsearch.xpack.ql.execution.search.extractor.AbstractFieldHitExtractor.extract(AbstractFieldHitExtractor.java:86) at org.elasticsearch.xpack.sql.execution.search.SearchHitRowSet.extractValue(SearchHitRowSet.java:129) at org.elasticsearch.xpack.sql.execution.search.SearchHitRowSet.extractValue(SearchHitRowSet.java:32) at org.elasticsearch.xpack.sql.execution.search.ResultRowSet.getColumn(ResultRowSet.java:38) at org.elasticsearch.xpack.sql.session.AbstractRowSet.column(AbstractRowSet.java:19) at org.elasticsearch.xpack.sql.session.RowView.forEachColumn(RowView.java:39) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.lambda$createResponse$5(TransportSqlQueryAction.java:132) at org.elasticsearch.xpack.sql.session.RowSet.forEachRow(RowSet.java:28) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.createResponse(TransportSqlQueryAction.java:130) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.createResponseWithSchema(TransportSqlQueryAction.java:125) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.lambda$operation$0(TransportSqlQueryAction.java:97) at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:139)

We should prevent the error from occurring - looking at the stacktrace this seems to have been addressed yet I'm mentioning to double check this is indeed the case.

@bpintea this change should fix the h.f.b.a returning nothing example you mentioned. Test for this exact case here.

bpintea · 2021-02-24T08:01:32Z

...in/java/org/elasticsearch/xpack/ql/execution/search/extractor/AbstractFieldHitExtractor.java

+        if (remainingPath.size() > 0) {
+            List<Object> values = field.getValues();
+            Iterator<String> pathIterator = remainingPath.iterator();
+            while(pathIterator.hasNext()) {


nit: spacing.

bpintea · 2021-02-24T08:01:41Z

...in/java/org/elasticsearch/xpack/ql/execution/search/extractor/AbstractFieldHitExtractor.java

+            Iterator<String> pathIterator = remainingPath.iterator();
+            while(pathIterator.hasNext()) {
+                String pathElement = pathIterator.next();
+                Map<String, Object> elements = (Map<String, Object>) values.get(0);


Here (and when calling unwrapFieldsMultiValue below), only a first value is selected. Could the looked for path not be accessed through a subsequent element?

matriv

Apart from the great randomised test, Imho we should also have some plain tests that target specific cases, as for example those mentioned already by @bpintea

…o temp_nested_fields_bugfix

astefan · 2021-02-25T10:03:31Z

@matriv thank you for the review. I am not sure how useful it is to have a specific test for that scenario. There could be many more scenarios like that one, ones that we didn't even think about, just because the combination of mappings possible is so complex. While we do have specific tests for specific scenarios in FieldExtractorTestCase, for nested fields the scenarios are not so few also not so simple and I'm reluctant to add a specific case that couldn't be covered by a randomization algorithm.

@bpintea spotted one scenario where the value of the requested field is null and I've adjusted the randomized test to account for that. He also noticed another scenario that didn't ever work, where a tree of mappings is nested inside a tree of nested fields. Since the fix is not obvious for the moment, I'd suggest dealing with that fix in a separate GH issue where tests should be added as well. If a specific test is proven to be useful in that case vs. a randomized one, I'd let this to be decided then.

cbuescher

Sorry that I didn't go through all the details of the extraction implementation yet, its a bit different to what we do on the fields side of things where we deal with the _source map. I left a question around testing, I tend to agree with @matriv that in addition to the randomized testing it would be great to have a few common cases spelled out explicitely without too much randomness to get a basic coverage of those. But I don't know how well this fits with the general testing philosophy and infra in ql, so I leave this only as a remark.

cbuescher · 2021-02-25T11:05:11Z

...lugin/sql/qa/server/src/main/java/org/elasticsearch/xpack/sql/qa/FieldExtractorTestCase.java

+                    }
+                    index.endObject();
+                    // from time to time set a null value instead of an actual value
+                    if (randomBoolean()) {


maybe use null less often than 50% of the times in randomization here, it should be the exception I think. Using ' rarely()` this case would still get picked 1% of the time, should be enough?

cbuescher · 2021-02-25T11:15:57Z

...lugin/sql/qa/server/src/main/java/org/elasticsearch/xpack/sql/qa/FieldExtractorTestCase.java

+                        bulkContent.append("\"" + fieldName + "\":{\"" + leafFieldName + "\":null");
+                    } else {
+                        randomValue.set(randomAlphaOfLength(10));
+                        bulkContent.append("\"" + fieldName + "\":{\"" + leafFieldName + "\":\"" + randomValue.get() + "\"");


I'm not sure I understand this entirely, but to me it seems the test will only add one value for the leaf field thats going to be requested and tested for here? Asking because its very common that there is more than one entry in a nested field, e.g. in a very simple case

"fields": { "group" : ["fans"], "user": [{ "first": ["John"], "last": ["Smith"], }, { "first": ["Alice"], "last": ["White"], } ] }

where user is the nested field, and you want to retrieve user.first. I'm not sure this case is tested for, but I expect it to be common. Maybe less common but still relevant would be something like

"fields": { "group" : ["fans"], "user": [{ "first": ["John", "Jim", "James"], "last": ["Smith"], }, { "first": ["Alice"], "last": ["White"], } ] }

where I would expect probably expect user.first to extract a list or array of four values.
Maybe this is done in AbstractFieldHitExtractor#unwrapFieldsMultiValue but I think it would be great to have this explicitely covered by tests as well.

palesz

Couple of minor comments on top of the already existing reviews.

palesz · 2021-02-24T17:32:06Z

...in/java/org/elasticsearch/xpack/ql/execution/search/extractor/AbstractFieldHitExtractor.java

+                Map<String, Object> elements = (Map<String, Object>) values.get(0);
+                values = (List<Object>) elements.get(pathElement);


Minor: if you use Map<String, List<Object>> elements, you won't have to cast on elements.get()

palesz · 2021-02-24T23:13:43Z

...in/java/org/elasticsearch/xpack/ql/execution/search/extractor/AbstractFieldHitExtractor.java

+                 *           "leaf_field" : [
+                 *             "abc2"
+                 *           ]
+                 * So, start re-building the path until the right one is found, ie inner_root_1.inner_root_2......


When you were looking for the DocumentField above you were going from longest to shortest path. Here you go the other way around. Any reason?

The SearchHit structure for a field (as in asking for SearchHit.field(fieldName)) looks like a series of Lists and Maps: List of Maps where maps have as key a string (the name of a field/subfield) and the value can be another set of Lists of Maps.

If we have to look for a field called a.b.c.d.e (this is represented as AbstractFieldHitExtractor.fieldName) and this field sits under a nested data structure (and implicitly has AbstractFieldHitExtractor.hitName != null) with hitName = a.b.c, the algorithm searches for a.b.c (SearchHit.field("a.b.c")), if this is null then we search for SearchHit.field("a.b") and if still not found we search for SearchHit.field("a"). When we find a DocumentField, we take its values and we start searching in its List of Maps for the rest of the field.
If we found a DocumentField for a then we still need to search for the rest of the field name path, which now is b.c.d.e. So, we look in values for b, then for b.c and so on until we find an element. If we find b.c then the rest of the path becomes d.e and we repeat the process.

palesz · 2021-02-25T15:05:25Z

...lugin/sql/qa/server/src/main/java/org/elasticsearch/xpack/sql/qa/FieldExtractorTestCase.java

+                    if (randomBoolean()) {
+                        randomValue.set(null);
+                        bulkContent.append("\"" + fieldName + "\":{\"" + leafFieldName + "\":null");
+                    } else {
+                        randomValue.set(randomAlphaOfLength(10));
+                        bulkContent.append("\"" + fieldName + "\":{\"" + leafFieldName + "\":\"" + randomValue.get() + "\"");
+                    }


Suggested change

if (randomBoolean()) {

randomValue.set(null);

bulkContent.append("\"" + fieldName + "\":{\"" + leafFieldName + "\":null");

} else {

randomValue.set(randomAlphaOfLength(10));

bulkContent.append("\"" + fieldName + "\":{\"" + leafFieldName + "\":\"" + randomValue.get() + "\"");

}

randomValue.set(randomBoolean() ? null : randomAlphaOfLength(10));

bulkContent.append("\"" + fieldName + "\":{\"" + leafFieldName + "\":\"" + randomValue.get() + "\"");

There is tiny difference between my approach and yours: when setting a null value in json the json looks like this "field_name": null. Your suggestion leads to a json looking like "field_name": "null" which is the string null, not a null value. I know the if else approach is a more verbose one , but otherwise the string being built inside bulkContent would have looked messy and hard to follow.

Yep, missed that. Fair point.

costin

LGTM - it's unfortunate the reading of nested docs is messy and there are no built-in utilities that we can leverage and instead need to reconstruct the hierarchy ourselves through strings which are error prone.

costin · 2021-03-01T10:16:41Z

...in/java/org/elasticsearch/xpack/ql/execution/search/extractor/AbstractFieldHitExtractor.java

+            value = unwrapFieldsMultiValue(((Map<String, Object>) values.get(0)).get(fieldName.substring(hitName.length() + 1)));
+        } else {


Since the outer code also calls unwrapFieldMultiValue, how about returning its argument instead:

value = ((Map<String, Object>) values.get(0)).get(fieldName.substring(hitName.length() + 1));

costin · 2021-03-01T10:19:22Z

...in/java/org/elasticsearch/xpack/ql/execution/search/extractor/AbstractFieldHitExtractor.java

+            if (field != null) {
+                value = unwrapFieldsMultiValue(field.getValues());
+            }


This should be common for both branches (needs changes to extractNestedField) to make it more consistent. Something like:

field = hitName != null ? extractNestedField(hit) : hit.field(fieldName); if (field != null) { value = unwrapFieldsMultiValue(fields.getValue()); }

extractNestedField cannot consistently return a DocumentField instance so that we can call allways field.getValues() in extract(SearchHit) method. The data structure inside a DocumentField is more complex in the case of nested fields and it requires further exploration before obtaining an equivalent of field.getValues() (see https://github.com/elastic/elasticsearch/pull/69368/files#diff-099b1feb3ed943ecc359be919a1dd67423938b9b8252e43c9ddd0a72027c1c87R120-R143).

costin · 2021-03-01T10:23:18Z

...in/java/org/elasticsearch/xpack/ql/execution/search/extractor/AbstractFieldHitExtractor.java

+        List<String> remainingPath = new ArrayList<>();
+        // first, search for the "root" DocumentField under which the remaining path of nested document values is
+        while((field = hit.field(tempHitname)) == null) {
+            int indexOfDot = tempHitname.lastIndexOf(".");


Is this guaranteed to return a non-negative?
Actually, with a mapping similar to the exemplified one in the test's comment (but only using the first lowered letter of the randomly generated nodes), indexing just {"h": [{"f":{"b": {"a": "true"}}}]} and querying with SELECT h.f.o FROM test seems to trigger:

Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 1 at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3734) at java.base/java.lang.String.substring(String.java:1903) at org.elasticsearch.xpack.ql.execution.search.extractor.AbstractFieldHitExtractor.extractNestedField(AbstractFieldHitExtractor.java:113) at org.elasticsearch.xpack.ql.execution.search.extractor.AbstractFieldHitExtractor.extract(AbstractFieldHitExtractor.java:86) at org.elasticsearch.xpack.sql.execution.search.SearchHitRowSet.extractValue(SearchHitRowSet.java:129) at org.elasticsearch.xpack.sql.execution.search.SearchHitRowSet.extractValue(SearchHitRowSet.java:32) at org.elasticsearch.xpack.sql.execution.search.ResultRowSet.getColumn(ResultRowSet.java:38) at org.elasticsearch.xpack.sql.session.AbstractRowSet.column(AbstractRowSet.java:19) at org.elasticsearch.xpack.sql.session.RowView.forEachColumn(RowView.java:39) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.lambda$createResponse$5(TransportSqlQueryAction.java:132) at org.elasticsearch.xpack.sql.session.RowSet.forEachRow(RowSet.java:28) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.createResponse(TransportSqlQueryAction.java:130) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.createResponseWithSchema(TransportSqlQueryAction.java:125) at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.lambda$operation$0(TransportSqlQueryAction.java:97) at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:139)

We should prevent the error from occurring - looking at the stacktrace this seems to have been addressed yet I'm mentioning to double check this is indeed the case.

…o temp_nested_fields_bugfix

bpintea

LGTM

matriv

LGTM, Thank you!

* Handle a tree of nested fields or a nested field inside a non-nested field type (cherry picked from commit 28b64c6)

astefan added 2 commits February 22, 2021 18:32

Handle a tree of nested fields or a nested field inside a non-nested

ca9fff4

field type

Merge branch 'master' of https://github.com/elastic/elasticsearch int…

803d382

…o temp_nested_fields_bugfix

astefan added >bug :Analytics/SQL SQL querying v8.0.0 :Analytics/EQL EQL querying v7.12.0 v7.13.0 labels Feb 22, 2021

elasticmachine added Team:QL (Deprecated) Meta label for query languages team labels Feb 22, 2021

Spotless check

51947e0

Merge branch 'master' into temp_nested_fields_bugfix

2169a10

astefan requested review from bpintea, costin, matriv and palesz February 22, 2021 19:22

bpintea reviewed Feb 24, 2021

View reviewed changes

matriv reviewed Feb 24, 2021

View reviewed changes

cbuescher self-requested a review February 24, 2021 09:27

astefan added 2 commits February 25, 2021 11:53

Fix

39e6fc6

Merge branch 'master' of https://github.com/elastic/elasticsearch int…

1479727

…o temp_nested_fields_bugfix

cbuescher reviewed Feb 25, 2021

View reviewed changes

palesz reviewed Feb 25, 2021

View reviewed changes

costin approved these changes Mar 1, 2021

View reviewed changes

astefan added 3 commits March 5, 2021 02:26

More tests

6e7222d

Addressing more reviews

694a8de

Merge branch 'master' of https://github.com/elastic/elasticsearch int…

b740d00

…o temp_nested_fields_bugfix

astefan requested review from bpintea, cbuescher, matriv and palesz March 5, 2021 01:24

bpintea approved these changes Mar 15, 2021

View reviewed changes

matriv approved these changes Mar 15, 2021

View reviewed changes

astefan merged commit 28b64c6 into elastic:master Mar 15, 2021

astefan deleted the temp_nested_fields_bugfix branch March 15, 2021 13:29

astefan added a commit to astefan/elasticsearch that referenced this pull request Mar 15, 2021

QL: complex nested field types mappings fix (elastic#69368)

989957a

* Handle a tree of nested fields or a nested field inside a non-nested field type (cherry picked from commit 28b64c6)

astefan added a commit to astefan/elasticsearch that referenced this pull request Mar 15, 2021

QL: complex nested field types mappings fix (elastic#69368)

27ad52b

* Handle a tree of nested fields or a nested field inside a non-nested field type (cherry picked from commit 28b64c6)

astefan added a commit that referenced this pull request Mar 15, 2021

QL: complex nested field types mappings fix (#69368) (#70393)

2b74de9

* Handle a tree of nested fields or a nested field inside a non-nested field type (cherry picked from commit 28b64c6)

astefan added a commit that referenced this pull request Mar 15, 2021

QL: complex nested field types mappings fix (#69368) (#70394)

9a1dc0a

* Handle a tree of nested fields or a nested field inside a non-nested field type (cherry picked from commit 28b64c6)

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

		Map<String, Object> elements = (Map<String, Object>) values.get(0);
		values = (List<Object>) elements.get(pathElement);

		value = unwrapFieldsMultiValue(((Map<String, Object>) values.get(0)).get(fieldName.substring(hitName.length() + 1)));
		} else {

QL: complex nested field types mappings fix #69368

QL: complex nested field types mappings fix #69368

Uh oh!

Conversation

astefan commented Feb 22, 2021

Uh oh!

elasticmachine commented Feb 22, 2021

Uh oh!

astefan commented Feb 22, 2021

Uh oh!

bpintea left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bpintea Feb 24, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

astefan Feb 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

matriv left a comment

Choose a reason for hiding this comment

Uh oh!

astefan commented Feb 25, 2021

Uh oh!

cbuescher left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

palesz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

costin left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bpintea left a comment

Choose a reason for hiding this comment

Uh oh!

matriv left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

bpintea Feb 24, 2021 •

edited

Loading

astefan Feb 25, 2021 •

edited

Loading