-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Introduce limit to the number of terms in Terms Query #27968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce limit to the number of terms in Terms Query #27968
Conversation
- Introduce index level settings to control the maximum number of terms
that can be used in a Terms Query
- Throw an error if a request exceeds this max number
Closes elastic#18829
| */ | ||
| public static final Setting<Integer> MAX_TERMS_COUNT_SETTING = | ||
| Setting.intSetting("index.max_terms_count", 2000, 1, Property.Dynamic, Property.IndexScope); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, I intended to set the limit to 1024 the same number as the Lucene maximum number of boolean clauses, but some tests in xpack use higher number of terms, thus I had to increase this number to 2000.
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good but I think we need to be careful with the default value: I suspect many users will hit it with a value of 2000. I'd be tempted to set the default value to MAX_VALUE in 6.x and maybe something higher in master, eg. 2^16. For the record, the limit on boolean queries is due to the fact that composing X term queries with a boolean query requires to stream X postings lists from disk concurrently. On the other hand, the terms query reads all postings lists sequentially, which is why it scales better with the number of terms (but this comes with a drawback: all matches need to be visited).
|
@jpountz Thanks for your comments, Adrien! Thanks for the explanation how boolean query and terms query work. |
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Agreed. |
- Correct a bug in the referenced settings Closes #18829
- Introduce index level settings to control the maximum number of terms
that can be used in a Terms Query
- Issue a deprecation warning if a request exceeds this max number
Closes #18829
* es/6.x: (48 commits) Bump compat version for local depdendent test to 6.2.0 Pass `java.locale.providers=COMPAT` to Java 9 onwards (#28080) Allow shrinking of indices from a previous major (#28076) Add Writeable.Reader support to TransportResponseHandler (#28010) Fix cluster.routing.allocation.enable and cluster.routing.rebalance.enable casing (#28037) [Test] Fix scores for dcg in RankEvalRequestIT and RankEvalYamlIT [Docs] Add note on limitation for significant_text with nested objects (#28052) [Test] Fix allowed delta for calculated scores in DiscountedCumulativeGainTests Enable convert processor to support Long and Double. (#27957) Enable Wildfly tests on JDK 9 and JDK 10 update ingest-attachment to use Tika 1.17 and newer deps (#27824) Only bind loopback addresses when binding to local Fix assertion in Wildfly build Fix typo in comment in Wildfly build Use ephemeral ports in Wildfly tests Update fuzzy-query.asciidoc (#28032) Add node id to shard failure message (#28024) Introduce limit to the number of terms in Terms Query (#27968) Upgrade Gradle Shadow plugin to 2.0.2 Upgrade to JMH 1.19 ...
|
Will this work transparently for clients? e.g., if Kibana issues a request for a terms-agg exceeding the threshold, will ES automatically truncate the result, or will ES raise an error? |
|
@thomasneirynck it will throw an |
|
@thomasneirynck for 6.2 we don't plan to throw an error unless the setting is explicitly set by a user. So in 6.2, by default is the limit is exceeded (and the limit is 65536), only deprecation warning will be issued. Will this create a problem in Kibana? |
|
thx @mayya-sharipova Kibana should be OK then for 6.2. When the limit is set on the ES-side, it makes sense for admins then to make the corresponding change in Kibana. We can expand the Kibana documentation as well. Do you know when you will be turning off the deprecation behavior and have ES start failing? |
|
This will happen in 7.0. However I'm wondering that there might be some confusion: you mentioned |
|
@jpountz you're right. my mistake, I was talking about terms-agg, not terms-query. Sorry for spam. Yeah, it will just propagate. |
that can be used in a Terms Query
Closes #18829