@dimitris-athanasiou
Contributor

There is an index-level setting, index.max_docvalue_fields_search, that limits the number of doc_value fields a search can return. This commit changes the extractor so that it reads the value of that setting and, if more fields are requested than the limit allows, switches to fetching the fields from _source.
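
The decision described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `ExtractionChoice`, `Method`, and `choose` are hypothetical names, and the limit is passed in as a plain integer rather than read from the index settings.

```java
import java.util.List;

// Hypothetical sketch; these are not the actual Elasticsearch ML classes.
final class ExtractionChoice {

    enum Method { DOC_VALUE, SOURCE }

    /**
     * Decide where fields should be read from.
     *
     * @param docValueFields          names of the fields that would be requested as doc_values
     * @param maxDocValueFieldsSearch value of index.max_docvalue_fields_search for the index
     */
    static Method choose(List<String> docValueFields, int maxDocValueFieldsSearch) {
        // Over the limit: fall back to _source. Otherwise keep using doc_values.
        return docValueFields.size() > maxDocValueFieldsSearch
            ? Method.SOURCE
            : Method.DOC_VALUE;
    }
}
```

With three fields and a limit of 2 the sketch picks `SOURCE`; with a limit of 3 it stays on `DOC_VALUE`.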

@dimitris-athanasiou dimitris-athanasiou added the :ml Machine learning label Jun 13, 2019
@elasticmachine
Collaborator

Pinging @elastic/ml-core

Member

@benwtrent benwtrent left a comment

I think this adds bugs for TimeField and GeoPointField extractors.

Member

Shouldn't this be extractedFields.getDocValueFields().size()?

Contributor Author

@dimitris-athanasiou dimitris-athanasiou Jun 14, 2019

A few lines above we filter only doc_value fields, so it makes no difference. But I see how it'd be clearer if I made this change. I shall!

Member

It seems to me that we only really need to transform as many as necessary to drop below the docValueFieldsLimit, and then only those for which supportsFromSource() returns true.

Contributor Author

> we only really need to transform as many as necessary to drop below the docValueFieldsLimit

As long as we have to touch the source, we might as well fetch them all from there. Performance-wise it's going to be better.

> and then only those that supportsFromSource()

But this is a very good point. If we have fields that don't support from-source, we should insist on taking them from doc_values. Then we'll also need a check to see whether we're still over the limit.
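
A rough sketch of the approach agreed on here, under stated assumptions: `SourceFallback`, `Field`, and `fallBackToSource` are hypothetical names for illustration, and throwing `IllegalArgumentException` when the remaining doc_value fields still exceed the limit is an assumed way to surface the error, not necessarily what the PR does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these are not the actual Elasticsearch ML classes.
final class SourceFallback {

    enum Method { DOC_VALUE, SOURCE }

    static final class Field {
        final String name;
        final Method method;
        final boolean supportsFromSource;

        Field(String name, Method method, boolean supportsFromSource) {
            this.name = name;
            this.method = method;
            this.supportsFromSource = supportsFromSource;
        }
    }

    static List<Field> fallBackToSource(List<Field> fields, int docValueFieldsLimit) {
        long docValueCount = fields.stream().filter(f -> f.method == Method.DOC_VALUE).count();
        if (docValueCount <= docValueFieldsLimit) {
            return fields; // within the limit, nothing to change
        }
        List<Field> result = new ArrayList<>();
        long remainingDocValues = 0;
        for (Field f : fields) {
            if (f.method == Method.DOC_VALUE && f.supportsFromSource) {
                // Once _source has to be loaded anyway, move every eligible field to it.
                result.add(new Field(f.name, Method.SOURCE, true));
            } else {
                if (f.method == Method.DOC_VALUE) {
                    remainingDocValues++; // cannot be read from _source, must stay a doc_value
                }
                result.add(f);
            }
        }
        // Second check: fields that must stay on doc_values may still exceed the limit.
        if (remainingDocValues > docValueFieldsLimit) {
            throw new IllegalArgumentException(
                remainingDocValues + " fields require doc_values but the limit is " + docValueFieldsLimit);
        }
        return result;
    }
}
```

The sketch moves all eligible fields rather than just enough to get under the limit, following the reasoning that touching _source at all makes fetching everything from it the cheaper option.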

Member

So, we are assuming that anything that is FromFields that has ExtractionMethod.DOC_VALUE also supports extraction via _source? This seems OK to me for the most part, except for GeoPointField, which may need to override supportsFromSource() to always return false.

I think we may want to do something similar with TimeField.
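
The override suggested here might look like the sketch below. The class shapes are assumptions made for illustration: `FromSourceSupport` is a hypothetical wrapper, and the base class defaulting to `true` is assumed, not taken from the PR. Only the method name `supportsFromSource()` and the field type names come from the discussion.

```java
// Hypothetical sketch; the real class hierarchy in the PR may differ.
final class FromSourceSupport {

    static class ExtractedField {
        /** Assumed default: a field can be read from _source unless it opts out. */
        boolean supportsFromSource() {
            return true;
        }
    }

    static class GeoPointField extends ExtractedField {
        @Override
        boolean supportsFromSource() {
            // Always extracted via doc_values, so never offered for the _source fallback.
            return false;
        }
    }

    // A TimeField subclass could opt out in the same way if it also needs doc_values.
}
```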

Contributor Author

Good catch! Will adjust.

@dimitris-athanasiou dimitris-athanasiou force-pushed the fetch-from-source-when-too-many-fields branch from 02c056e to a3f79e0 Compare June 14, 2019 11:38
@dimitris-athanasiou dimitris-athanasiou force-pushed the fetch-from-source-when-too-many-fields branch from a3f79e0 to 5c5819e Compare June 14, 2019 12:19
Member

@benwtrent benwtrent left a comment

🏑 🌱
:shipit:

@dimitris-athanasiou dimitris-athanasiou merged commit eced353 into elastic:feature-ml-data-frame-analytics Jun 14, 2019
@dimitris-athanasiou dimitris-athanasiou deleted the fetch-from-source-when-too-many-fields branch June 14, 2019 14:45
3 participants