Skip to content
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@

import java.io.IOException;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;
Expand Down Expand Up @@ -81,13 +83,66 @@ public Object extract(SearchHit hit) {
Object value = null;
DocumentField field = null;
if (hitName != null) {
// a nested field value is grouped under the nested parent name (ie dep.dep_name lives under "dep":[{dep_name:value}])
field = hit.field(hitName);
value = unwrapFieldsMultiValue(extractNestedField(hit));
} else {
field = hit.field(fieldName);
if (field != null) {
value = unwrapFieldsMultiValue(field.getValues());
}
Comment on lines +89 to +91
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be common for both branches (needs changes to extractNestedField) to make it more consistent. Something like:

field = hitName != null  ? extractNestedField(hit) : hit.field(fieldName);
if (field != null) {
    value = unwrapFieldsMultiValue(fields.getValue());
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractNestedField cannot consistently return a DocumentField instance so that we can call allways field.getValues() in extract(SearchHit) method. The data structure inside a DocumentField is more complex in the case of nested fields and it requires further exploration before obtaining an equivalent of field.getValues() (see https://github.com/elastic/elasticsearch/pull/69368/files#diff-099b1feb3ed943ecc359be919a1dd67423938b9b8252e43c9ddd0a72027c1c87R120-R143).

}
return value;
}

/*
* For a path of fields like root.nested1.nested2.leaf where nested1 and nested2 are nested field types,
* fieldName is root.nested1.nested2.leaf, while hitName is root.nested1.nested2
* We first look for root.nested1.nested2 or root.nested1 or root in the SearchHit until we find something.
* If the DocumentField lives under "root.nested1" the remaining path to search for (in the DocumentField itself) is nested2.
* After this step is done, what remains to be done is just getting the leaf values.
*/
@SuppressWarnings("unchecked")
private Object extractNestedField(SearchHit hit) {
Object value;
DocumentField field;
String tempHitname = hitName;
List<String> remainingPath = new ArrayList<>();
// first, search for the "root" DocumentField under which the remaining path of nested document values is
while ((field = hit.field(tempHitname)) == null) {
int indexOfDot = tempHitname.lastIndexOf(".");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this guaranteed to return a non-negative?
Actually, with a mapping similar to the exemplified one in the test's comment (but only using the first lowered letter of the randomly generated nodes), indexing just {"h": [{"f":{"b": {"a": "true"}}}]} and querying with SELECT h.f.o FROM test seems to trigger:

Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 1
	at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3734)
	at java.base/java.lang.String.substring(String.java:1903)
	at org.elasticsearch.xpack.ql.execution.search.extractor.AbstractFieldHitExtractor.extractNestedField(AbstractFieldHitExtractor.java:113)
	at org.elasticsearch.xpack.ql.execution.search.extractor.AbstractFieldHitExtractor.extract(AbstractFieldHitExtractor.java:86)
	at org.elasticsearch.xpack.sql.execution.search.SearchHitRowSet.extractValue(SearchHitRowSet.java:129)
	at org.elasticsearch.xpack.sql.execution.search.SearchHitRowSet.extractValue(SearchHitRowSet.java:32)
	at org.elasticsearch.xpack.sql.execution.search.ResultRowSet.getColumn(ResultRowSet.java:38)
	at org.elasticsearch.xpack.sql.session.AbstractRowSet.column(AbstractRowSet.java:19)
	at org.elasticsearch.xpack.sql.session.RowView.forEachColumn(RowView.java:39)
	at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.lambda$createResponse$5(TransportSqlQueryAction.java:132)
	at org.elasticsearch.xpack.sql.session.RowSet.forEachRow(RowSet.java:28)
	at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.createResponse(TransportSqlQueryAction.java:130)
	at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.createResponseWithSchema(TransportSqlQueryAction.java:125)
	at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.lambda$operation$0(TransportSqlQueryAction.java:97)
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:139)

Copy link
Contributor

@bpintea bpintea Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioning it here, but happy to open an issue if better (or just drop it, if wrong): selecting h.f.b.a will still return nothing.
Reason being that the inner_hits response from ES comes with with the entries listed as h.f_12 (for example), while we're looking for the inner hit h.f.b in SearchHitRowSet. Seems like this should be doable to work, but didn't look deeper.
This is not new, this didn't work before either (although no error message is returned, as mentioned in the limitations).

Copy link
Contributor Author

@astefan astefan Feb 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! That's an older problem - SELECT h.f.b.a where b.a is a hierarchy of simple fields inside another hierarchy of nested fields -, for sure. It's the complexity of the nested documents definition possibilities that shows that our algorithm isn't covering all of them. I'd be inclined in dealing with this bug in a subsequent PR as it doesn't look straightforward and it may need a bit of rethinking the way we deal with nested fields hierarchies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this guaranteed to return a non-negative?
Actually, with a mapping similar to the exemplified one in the test's comment (but only using the first lowered letter of the randomly generated nodes), indexing just {"h": [{"f":{"b": {"a": "true"}}}]} and querying with SELECT h.f.o FROM test seems to trigger:

Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 1
	at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3734)
	at java.base/java.lang.String.substring(String.java:1903)
	at org.elasticsearch.xpack.ql.execution.search.extractor.AbstractFieldHitExtractor.extractNestedField(AbstractFieldHitExtractor.java:113)
	at org.elasticsearch.xpack.ql.execution.search.extractor.AbstractFieldHitExtractor.extract(AbstractFieldHitExtractor.java:86)
	at org.elasticsearch.xpack.sql.execution.search.SearchHitRowSet.extractValue(SearchHitRowSet.java:129)
	at org.elasticsearch.xpack.sql.execution.search.SearchHitRowSet.extractValue(SearchHitRowSet.java:32)
	at org.elasticsearch.xpack.sql.execution.search.ResultRowSet.getColumn(ResultRowSet.java:38)
	at org.elasticsearch.xpack.sql.session.AbstractRowSet.column(AbstractRowSet.java:19)
	at org.elasticsearch.xpack.sql.session.RowView.forEachColumn(RowView.java:39)
	at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.lambda$createResponse$5(TransportSqlQueryAction.java:132)
	at org.elasticsearch.xpack.sql.session.RowSet.forEachRow(RowSet.java:28)
	at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.createResponse(TransportSqlQueryAction.java:130)
	at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.createResponseWithSchema(TransportSqlQueryAction.java:125)
	at org.elasticsearch.xpack.sql.plugin.TransportSqlQueryAction.lambda$operation$0(TransportSqlQueryAction.java:97)
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:139)

We should prevent the error from occurring - looking at the stacktrace this seems to have been addressed yet I'm mentioning to double check this is indeed the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bpintea this change should fix the h.f.b.a returning nothing example you mentioned. Test for this exact case here.

if (indexOfDot < 0) {// there is no such field in the hit
return null;
}
remainingPath.add(0, tempHitname.substring(indexOfDot + 1));
tempHitname = tempHitname.substring(0, indexOfDot);
}
if (field != null) {
value = unwrapFieldsMultiValue(field.getValues());
// then dig into DocumentField's structure until we reach the field we are interested into
if (remainingPath.size() > 0) {
List<Object> values = field.getValues();
Iterator<String> pathIterator = remainingPath.iterator();
while (pathIterator.hasNext()) {
String pathElement = pathIterator.next();
Map<String, List<Object>> elements = (Map<String, List<Object>>) values.get(0);
values = elements.get(pathElement);
/*
* if this path is not found it means we hit another nested document (inner_root_1.inner_root_2.nested_field_2)
* something like this
* "root_field_1.root_field_2.nested_field_1" : [
* {
* "inner_root_1.inner_root_2.nested_field_2" : [
* {
* "leaf_field" : [
* "abc2"
* ]
* So, start re-building the path until the right one is found, ie inner_root_1.inner_root_2......
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you were looking for the DocumentField above you were going from longest to shortest path. Here you go the other way around. Any reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SearchHit structure for a field (as in asking for SearchHit.field(fieldName)) looks like a series of Lists and Maps: List of Maps where maps have as key a string (the name of a field/subfield) and the value can be another set of Lists of Maps.

If we have to look for a field called a.b.c.d.e (this is represented as AbstractFieldHitExtractor.fieldName) and this field sits under a nested data structure (and implicitly has AbstractFieldHitExtractor.hitName != null) with hitName = a.b.c, the algorithm searches for a.b.c (SearchHit.field("a.b.c")), if this is null then we search for SearchHit.field("a.b") and if still not found we search for SearchHit.field("a"). When we find a DocumentField, we take its values and we start searching in its List of Maps for the rest of the field.
If we found a DocumentField for a then we still need to search for the rest of the field name path, which now is b.c.d.e. So, we look in values for b, then for b.c and so on until we find an element. If we find b.c then the rest of the path becomes d.e and we repeat the process.

*/
while (values == null) {
pathElement += "." + pathIterator.next();
values = elements.get(pathElement);
}
}
value = ((Map<String, Object>) values.get(0)).get(fieldName.substring(hitName.length() + 1));
} else {
value = field.getValues();
}
return value;
}
Expand Down
Loading