Skip to content

Conversation

@javanna
Copy link
Member

@javanna javanna commented Aug 21, 2020

Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not.

To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method.

As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors.

With this PR we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition.

Note that this PR does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch.

This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>.

Relates to #59332

Runtime fields look up other fields through `SearchLookup`. With this change we add a `Supplier<SearchLookup>` argument to the existing `MappedFieldType#fielddataBuilder` method so that fielddata implementations can have `SearchLookup` available when needed.

Note that this commit does not introduce any production implementation of runtime fields, but rather the plumbing needed to then merge the feature branch where runtime fields have been developed.

Additionally the use of runtime fields in index sorting is disallowed.

Relates to elastic#59332
This is needed for scripted runtime fields, as they allow to refer to other runtime fields.

Also reject resolving fields that depend on a defined limit, which is deterined at 32.
@javanna javanna added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types v8.0.0 v7.10.0 labels Aug 21, 2020
@javanna javanna requested review from jimczi and nik9000 August 21, 2020 16:08
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Mapping)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Aug 21, 2020
fieldData = fieldDataLookup.apply(ft);
} catch (Exception e) {
throw new IllegalArgumentException("docvalues not found for index sort field:[" + sortSpec.field + "]");
throw new IllegalArgumentException("docvalues not found for index sort field:[" + sortSpec.field + "]", e);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not strictly required in this PR, though it came natural as I added isRuntimeField to MappedFieldType. and configuring index sorting on runtime fields looked like a bad idea to me.

*/
public <IFD extends IndexFieldData<?>> IFD getForField(MappedFieldType fieldType) {
return getForField(fieldType, index().getName());
assert fieldType.isRuntimeField() == false : "runtime fields are not supported";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the main reason why I introduced isRuntimeField as part of this PR. I want to enforce that this method is never called for runtime fields. That is the case today, this is only used for index warmers and index sorting, but we need to make sure that that is still the case in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is needed, index sort will check that the field has doc values so it will fail at the Lucene level. I am not saying we shouldn't check at the ES level but this assert is confusing IMO. Can we remove this function entirely and replace it with the one below ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, this is to have a proper error message. But what I am after here is to protect from this method being a second getForField which though can't support runtime fields (we have no search lookup!). It is so easy to use this one instead of the other one. Maybe renaming both would also help.

Can we remove this function entirely and replace it with the one below?

you mean having a single getForField method I assume? That would be great but I am not sure where to get search lookup for this usecase that does not need it and it's only used for index warmers and index sorting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the getForField(MappedFieldType). Callers now call the other getForField and provide their own search lookup supplier and index name. That removed the need for the isRuntimeField flag too. index sorting already throws a decent error when it is configured with a runtime field (doc values not found for field etc.). I am happier with this as it simplifies things.

}
return fieldDataLookup.apply(fieldType);
};
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 this is very similar to what you had in #60318 ( I will give you credit in the final commit). The main difference is that I don't recreate a SearchLookup and DocLookup every time a new field is referenced. I rather fork SearchLookup once a field is looked up and keep on reusing the same and adding field names to that same list. I find it more intuitive this way. I guess you may have a different opinion. I also tried to enclose the tracking in the lookup function. All of the other changes turned out the same as what you made, although I started from scratch to see where I ended up :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a field depends on many fields but doesn't have a "chain" of dependencies will this trigger? Is that what we want? Maybe it is, but I think a limit of 5 is pretty low in that case. Maybe fine, but I dunno.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great point, it was a bug in my implementation, I don't think it should trigger without a chain. We may want to have a limit later also on number of direct dependencies for a single field, but not now. That should also make the 5 limit reasonable, as it is applied only as depth (also renamed the constant to clarify that).

@javanna javanna mentioned this pull request Aug 21, 2020
30 tasks
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to field data looks good to me. I left two questions regarding the detection of runtime fields and the hard limit on the recursion.

public class SearchLookup {

final DocLookup docMap;
private static final int MAX_FIELD_CHAIN = 32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

32 feels big to me ;). I think we can be more aggressive here, something like 4-5 should be more than enough ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine with me, I totally expected some discussion on this limit. maybe 4-5 is a bit on the low side, shall we do 10? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am maybe be too cautious here but I cannot think of a good example of a chain that would require more than 2-3 indirections. Allowing for more looks like an anti-pattern to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

being cautious is not a bad idea, let's start with 5 then and see if users complain.

/**
* Return true for field types that are runtime fields implementation, false otherwise
*/
public boolean isRuntimeField() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is the right way to do since we may forgot to override this function where needed ?
Could we instead rely on some inheritance, runtime field type must extend a specific class to be considered runtime ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that runtime fields will have a common base class that overrides this method. This method is basically here to prevent having to perform instanceof checks to figure out whether something is a runtime field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok fine with me if we have a long-term solution in the main PR

mappedFieldType -> indexFieldDataService.apply(mappedFieldType, fullyQualifiedIndex.getName()));
lookup = new SearchLookup(
getMapperService(),
(fieldType, lookup) -> indexFieldDataService.apply(fieldType, fullyQualifiedIndex.getName(), lookup));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shadowing the lookup name here makes this harder to read. I probably had this on my version of the PR too, but could you rename lookup to l or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

}
return fieldDataLookup.apply(fieldType);
};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a field depends on many fields but doesn't have a "chain" of dependencies will this trigger? Is that what we want? Maybe it is, but I think a limit of 5 is pretty low in that case. Maybe fine, but I dunno.

@javanna
Copy link
Member Author

javanna commented Aug 26, 2020

run elasticsearch-ci/packaging-sample-windows

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for making those tests so much shorter! It is much easier to read now.

also remove the notion of runtime field from mapped field type, it was needed only for index sorting, but it already throws a good error "no doc values for field" when trying to config index sorting on a runtime field. It needs no special handling.
@nik9000
Copy link
Member

nik9000 commented Aug 26, 2020

do you mind if I look at this again once I merge master (once this PR is merged) into the feature branch? then it will be easier to test what happens with runtime fields and what we should do about that

👍

@javanna javanna merged commit 462e25f into elastic:master Aug 26, 2020
javanna added a commit to javanna/elasticsearch that referenced this pull request Aug 27, 2020
Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not.

To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method.

As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors.

With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition.

Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch.

This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>.

Relates to elastic#59332

Co-authored-by: Nik Everett <[email protected]>
javanna added a commit that referenced this pull request Aug 27, 2020
Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not.

To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method.

As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors.

With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition.

Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch.

This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>.

Relates to #59332

Co-authored-by: Nik Everett <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>breaking-java >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v7.10.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants