-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Pass SearchLookup supplier through to fielddataBuilder #61430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass SearchLookup supplier through to fielddataBuilder #61430
Conversation
Runtime fields look up other fields through `SearchLookup`. With this change we add a `Supplier<SearchLookup>` argument to the existing `MappedFieldType#fielddataBuilder` method so that fielddata implementations can have `SearchLookup` available when needed. Note that this commit does not introduce any production implementation of runtime fields, but rather the plumbing needed to then merge the feature branch where runtime fields have been developed. Additionally the use of runtime fields in index sorting is disallowed. Relates to elastic#59332
This is needed for scripted runtime fields, as they allow to refer to other runtime fields. Also reject resolving fields that depend on a defined limit, which is deterined at 32.
|
Pinging @elastic/es-search (:Search/Mapping) |
| fieldData = fieldDataLookup.apply(ft); | ||
| } catch (Exception e) { | ||
| throw new IllegalArgumentException("docvalues not found for index sort field:[" + sortSpec.field + "]"); | ||
| throw new IllegalArgumentException("docvalues not found for index sort field:[" + sortSpec.field + "]", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not strictly required in this PR, though it came natural as I added isRuntimeField to MappedFieldType. and configuring index sorting on runtime fields looked like a bad idea to me.
| */ | ||
| public <IFD extends IndexFieldData<?>> IFD getForField(MappedFieldType fieldType) { | ||
| return getForField(fieldType, index().getName()); | ||
| assert fieldType.isRuntimeField() == false : "runtime fields are not supported"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the main reason why I introduced isRuntimeField as part of this PR. I want to enforce that this method is never called for runtime fields. That is the case today, this is only used for index warmers and index sorting, but we need to make sure that that is still the case in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this is needed, index sort will check that the field has doc values so it will fail at the Lucene level. I am not saying we shouldn't check at the ES level but this assert is confusing IMO. Can we remove this function entirely and replace it with the one below ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct, this is to have a proper error message. But what I am after here is to protect from this method being a second getForField which though can't support runtime fields (we have no search lookup!). It is so easy to use this one instead of the other one. Maybe renaming both would also help.
Can we remove this function entirely and replace it with the one below?
you mean having a single getForField method I assume? That would be great but I am not sure where to get search lookup for this usecase that does not need it and it's only used for index warmers and index sorting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the getForField(MappedFieldType). Callers now call the other getForField and provide their own search lookup supplier and index name. That removed the need for the isRuntimeField flag too. index sorting already throws a decent error when it is configured with a runtime field (doc values not found for field etc.). I am happier with this as it simplifies things.
| } | ||
| return fieldDataLookup.apply(fieldType); | ||
| }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nik9000 this is very similar to what you had in #60318 ( I will give you credit in the final commit). The main difference is that I don't recreate a SearchLookup and DocLookup every time a new field is referenced. I rather fork SearchLookup once a field is looked up and keep on reusing the same and adding field names to that same list. I find it more intuitive this way. I guess you may have a different opinion. I also tried to enclose the tracking in the lookup function. All of the other changes turned out the same as what you made, although I started from scratch to see where I ended up :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a field depends on many fields but doesn't have a "chain" of dependencies will this trigger? Is that what we want? Maybe it is, but I think a limit of 5 is pretty low in that case. Maybe fine, but I dunno.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great point, it was a bug in my implementation, I don't think it should trigger without a chain. We may want to have a limit later also on number of direct dependencies for a single field, but not now. That should also make the 5 limit reasonable, as it is applied only as depth (also renamed the constant to clarify that).
jimczi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change to field data looks good to me. I left two questions regarding the detection of runtime fields and the hard limit on the recursion.
| public class SearchLookup { | ||
|
|
||
| final DocLookup docMap; | ||
| private static final int MAX_FIELD_CHAIN = 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
32 feels big to me ;). I think we can be more aggressive here, something like 4-5 should be more than enough ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine with me, I totally expected some discussion on this limit. maybe 4-5 is a bit on the low side, shall we do 10? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am maybe be too cautious here but I cannot think of a good example of a chain that would require more than 2-3 indirections. Allowing for more looks like an anti-pattern to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
being cautious is not a bad idea, let's start with 5 then and see if users complain.
| /** | ||
| * Return true for field types that are runtime fields implementation, false otherwise | ||
| */ | ||
| public boolean isRuntimeField() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is the right way to do since we may forgot to override this function where needed ?
Could we instead rely on some inheritance, runtime field type must extend a specific class to be considered runtime ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that runtime fields will have a common base class that overrides this method. This method is basically here to prevent having to perform instanceof checks to figure out whether something is a runtime field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok fine with me if we have a long-term solution in the main PR
| mappedFieldType -> indexFieldDataService.apply(mappedFieldType, fullyQualifiedIndex.getName())); | ||
| lookup = new SearchLookup( | ||
| getMapperService(), | ||
| (fieldType, lookup) -> indexFieldDataService.apply(fieldType, fullyQualifiedIndex.getName(), lookup)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shadowing the lookup name here makes this harder to read. I probably had this on my version of the PR too, but could you rename lookup to l or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
| } | ||
| return fieldDataLookup.apply(fieldType); | ||
| }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a field depends on many fields but doesn't have a "chain" of dependencies will this trigger? Is that what we want? Maybe it is, but I think a limit of 5 is pretty low in that case. Maybe fine, but I dunno.
|
run elasticsearch-ci/packaging-sample-windows |
nik9000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for making those tests so much shorter! It is much easier to read now.
also remove the notion of runtime field from mapped field type, it was needed only for index sorting, but it already throws a good error "no doc values for field" when trying to config index sorting on a runtime field. It needs no special handling.
👍 |
Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not. To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method. As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors. With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition. Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch. This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>. Relates to elastic#59332 Co-authored-by: Nik Everett <[email protected]>
Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not. To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method. As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors. With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition. Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch. This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>. Relates to #59332 Co-authored-by: Nik Everett <[email protected]>
Runtime fields need to have a
SearchLookupavailable, when building their fielddata implementations, so that they can look up other fields, runtime or not.To achieve that, we add a
Supplier<SearchLookup>argument to the existingMappedFieldType#fielddataBuildermethod.As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors.
With this PR we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition.
Note that this PR does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch.
This is a breaking change for
MapperPlugins that plug in a mapper, as the signature ofMappedFieldType#fielddataBuilderchanges from taking a single argument (the index name), to also accept aSupplier<SearchLookup>.Relates to #59332