Skip to content

Conversation

@nik9000
Copy link
Member

@nik9000 nik9000 commented Aug 18, 2022

This allows you to use ignore_above with keyword fields in synthetic
source. Ignored values are stored in a "backup" stored field and added
to the end of the list of results. This makes ignore_above work pretty
much the same way as it does when you don't have synthetic source. The
only difference is the order of the results. But synthetic source
changes the order of results anyway. That should be fine.

@nik9000 nik9000 added >feature :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics v8.5.0 labels Aug 18, 2022
@elasticsearchmachine elasticsearchmachine added Team:Search Meta label for search team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Aug 18, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @nik9000, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@nik9000 nik9000 requested a review from romseygeek August 18, 2022 14:39
This allows you to use `ignore_above` with `keyword` fields in synthetic
source. Ignored values are stored in a "backup" stored field and added
to the end of the list of results. This makes `ignore_above` work pretty
much the same way as it does when you don't have synthetic source. The
only difference is the order of the results. But synthetic source
changes the order of results anyway. That should be fine.
query:
ids:
values: [1]
- is_false: hits.hits.0.fields
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a different bug here that I'm fixing along side ignore_above but I only noticed it when I was working on this. Previously we were adding all stored fields we loaded to the fetched field list. Now we only add if you ask for them.

return new CustomFieldsVisitor(sourceLoader.requiredStoredFields(), true);
// add the stored fields needed to load the source mapping to an empty set so they aren't returned
sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of()));
return new CustomFieldsVisitor(storedToRequestedFields.keySet(), true);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes that leaking fields in the search response. It's tricky!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sneaky. Nice catch.

MappedFieldType fieldType = fieldTypeLookup.apply(entry.getKey());
if (fieldType == null) {
continue; // TODO this is lame
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like adding such looseness. I'm sure there is a better way than this, but I couldn't think of something off the bat and figured we'd be refactoring here before too long anyway. So I wanted to see what other folks think.

I don't particularly like that this valueForDisplay thing is mixed up into the Lucene loading. That feels like it should be something we take care of externally but I have no idea how mixed up it is with other things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this really should be pulled out elsewhere, but I think I'm fine with it staying in here for the time being.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I was thinking. I don't want us to start relying on this null-ok behavior in other places. I don't particularly want to build some fancy abstraction for this either. Any ideas on something a little more defensive than this but not super overkill if we're going to rip it out?

@nik9000
Copy link
Member Author

nik9000 commented Aug 18, 2022

Note for those following along at home - this doesn't really effect the performance of loading synthetic source any - even if there are many, many fields, all of which have ignore_above. It's in the microseconds to build the extra HashMap entries. If you are worrying about microseconds you want to keep the mapping small.

@nik9000 nik9000 mentioned this pull request Aug 18, 2022
50 tasks
Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this makes sense. I have a question around what to do if the field is already being stored.

context.addIgnoredField(name());
if (context.isSyntheticSource()) {
// Save a copy of the field so synthetic source can load it
context.doc().add(new Field(originalName(), value, ORIGINAL_FIELD_TYPE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just add a new StoredField here, no need to create a new FieldType

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If stored=true is set on here then we store it twice, right? Is it worth detecting that case and deferring to the 'normal' stored field rather than always using the hidden field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I did that then fields longer than the limit would start being loadable via stored fields. That feels like something folks could accidentally rely on.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear - I don't think we store it twice with the code here.

}

private interface Values {
private interface DocValues {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DocumentValues so that we don't get the name collision?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is that they wrap the doc values interface. Not sure if DocumentValues is as descriptive. DocValuesValues or ColumnarValues or something?

MappedFieldType fieldType = fieldTypeLookup.apply(entry.getKey());
if (fieldType == null) {
continue; // TODO this is lame
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this really should be pulled out elsewhere, but I think I'm fine with it staying in here for the time being.

return new CustomFieldsVisitor(sourceLoader.requiredStoredFields(), true);
// add the stored fields needed to load the source mapping to an empty set so they aren't returned
sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of()));
return new CustomFieldsVisitor(storedToRequestedFields.keySet(), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sneaky. Nice catch.

Adds more tests for the enrich processor around different index types.
Right now they all work fine (yay!) but this feels like a good amount of
paranoia.
@nik9000
Copy link
Member Author

nik9000 commented Aug 31, 2022

Ready for you again @romseygeek !

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

- match:
_source:
kwd: foo
# - is_false: fields TODO fix me
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something separate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same problem as we have on fetch! I just noticed it and didn't want to forget it. I thought maybe I'd fix it like I did the fetch one. But it was a little more complex so I didn't. But I'll grab it soon!

if (loadSource) {
if (false == sourceLoader.requiredStoredFields().isEmpty()) {
// add the stored fields needed to load the source mapping to an empty set so they aren't returned
sourceLoader.requiredStoredFields().forEach(fieldName -> storedToRequestedFields.putIfAbsent(fieldName, Set.of()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh sneaky. Yes, we should definitely make this less hacky...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>feature :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team v8.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants