-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Synthetic _source: support ignore_above #89466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Synthetic _source: support ignore_above #89466
Conversation
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
|
Hi @nik9000, I've created a changelog YAML for you. |
|
Pinging @elastic/es-search (Team:Search) |
This allows you to use `ignore_above` with `keyword` fields in synthetic source. Ignored values are stored in a "backup" stored field and added to the end of the list of results. This makes `ignore_above` work pretty much the same way as it does when you don't have synthetic source. The only difference is the order of the results. But synthetic source changes the order of results anyway. That should be fine.
| query: | ||
| ids: | ||
| values: [1] | ||
| - is_false: hits.hits.0.fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a different bug here that I'm fixing along side ignore_above but I only noticed it when I was working on this. Previously we were adding all stored fields we loaded to the fetched field list. Now we only add if you ask for them.
| return new CustomFieldsVisitor(sourceLoader.requiredStoredFields(), true); | ||
| // add the stored fields needed to load the source mapping to an empty set so they aren't returned | ||
| sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of())); | ||
| return new CustomFieldsVisitor(storedToRequestedFields.keySet(), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change fixes that leaking fields in the search response. It's tricky!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sneaky. Nice catch.
| MappedFieldType fieldType = fieldTypeLookup.apply(entry.getKey()); | ||
| if (fieldType == null) { | ||
| continue; // TODO this is lame | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like adding such looseness. I'm sure there is a better way than this, but I couldn't think of something off the bat and figured we'd be refactoring here before too long anyway. So I wanted to see what other folks think.
I don't particularly like that this valueForDisplay thing is mixed up into the Lucene loading. That feels like it should be something we take care of externally but I have no idea how mixed up it is with other things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this really should be pulled out elsewhere, but I think I'm fine with it staying in here for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I was thinking. I don't want us to start relying on this null-ok behavior in other places. I don't particularly want to build some fancy abstraction for this either. Any ideas on something a little more defensive than this but not super overkill if we're going to rip it out?
|
Note for those following along at home - this doesn't really effect the performance of loading synthetic source any - even if there are many, many fields, all of which have |
romseygeek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think this makes sense. I have a question around what to do if the field is already being stored.
| context.addIgnoredField(name()); | ||
| if (context.isSyntheticSource()) { | ||
| // Save a copy of the field so synthetic source can load it | ||
| context.doc().add(new Field(originalName(), value, ORIGINAL_FIELD_TYPE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just add a new StoredField here, no need to create a new FieldType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If stored=true is set on here then we store it twice, right? Is it worth detecting that case and deferring to the 'normal' stored field rather than always using the hidden field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I did that then fields longer than the limit would start being loadable via stored fields. That feels like something folks could accidentally rely on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear - I don't think we store it twice with the code here.
| } | ||
|
|
||
| private interface Values { | ||
| private interface DocValues { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DocumentValues so that we don't get the name collision?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that they wrap the doc values interface. Not sure if DocumentValues is as descriptive. DocValuesValues or ColumnarValues or something?
| MappedFieldType fieldType = fieldTypeLookup.apply(entry.getKey()); | ||
| if (fieldType == null) { | ||
| continue; // TODO this is lame | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this really should be pulled out elsewhere, but I think I'm fine with it staying in here for the time being.
| return new CustomFieldsVisitor(sourceLoader.requiredStoredFields(), true); | ||
| // add the stored fields needed to load the source mapping to an empty set so they aren't returned | ||
| sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of())); | ||
| return new CustomFieldsVisitor(storedToRequestedFields.keySet(), true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sneaky. Nice catch.
Adds more tests for the enrich processor around different index types. Right now they all work fine (yay!) but this feels like a good amount of paranoia.
|
Ready for you again @romseygeek ! |
romseygeek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| - match: | ||
| _source: | ||
| kwd: foo | ||
| # - is_false: fields TODO fix me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something separate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the same problem as we have on fetch! I just noticed it and didn't want to forget it. I thought maybe I'd fix it like I did the fetch one. But it was a little more complex so I didn't. But I'll grab it soon!
| if (loadSource) { | ||
| if (false == sourceLoader.requiredStoredFields().isEmpty()) { | ||
| // add the stored fields needed to load the source mapping to an empty set so they aren't returned | ||
| sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh sneaky. Yes, we should definitely make this less hacky...
This allows you to use
ignore_abovewithkeywordfields in syntheticsource. Ignored values are stored in a "backup" stored field and added
to the end of the list of results. This makes
ignore_abovework prettymuch the same way as it does when you don't have synthetic source. The
only difference is the order of the results. But synthetic source
changes the order of results anyway. That should be fine.