Skip to content

Conversation

@mayya-sharipova
Copy link
Contributor

  • Introduce index level settings to control the maximum number of terms
    that can be used in a Terms Query
  • Throw an error if a request exceeds this max number

Closes #18829

- Introduce index level settings to control the maximum number of terms
    that can be used in a Terms Query
- Throw an error if a request exceeds this max number

Closes elastic#18829
*/
public static final Setting<Integer> MAX_TERMS_COUNT_SETTING =
Setting.intSetting("index.max_terms_count", 2000, 1, Property.Dynamic, Property.IndexScope);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, I intended to set the limit to 1024 the same number as the Lucene maximum number of boolean clauses, but some tests in xpack use higher number of terms, thus I had to increase this number to 2000.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good but I think we need to be careful with the default value: I suspect many users will hit it with a value of 2000. I'd be tempted to set the default value to MAX_VALUE in 6.x and maybe something higher in master, eg. 2^16. For the record, the limit on boolean queries is due to the fact that composing X term queries with a boolean query requires to stream X postings lists from disk concurrently. On the other hand, the terms query reads all postings lists sequentially, which is why it scales better with the number of terms (but this comes with a drawback: all matches need to be visited).

@mayya-sharipova
Copy link
Contributor Author

@jpountz Thanks for your comments, Adrien!
It makes sense! I will increase limit in master to 2^16.
For the 6.x, we were also thinking to issue a deprecation warning instead of an error if the max_value is exceeded, so may be it makes sense to have the same limit as in master: 2^16?

Thanks for the explanation how boolean query and terms query work.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jpountz
Copy link
Contributor

jpountz commented Dec 28, 2017

For the 6.x, we were also thinking to issue a deprecation warning instead of an error if the max_value is exceeded, so may be it makes sense to have the same limit as in master: 2^16?

Agreed.

@mayya-sharipova mayya-sharipova merged commit dcde895 into elastic:master Dec 28, 2017
@mayya-sharipova mayya-sharipova deleted the limit-terms-count branch December 28, 2017 22:36
@mayya-sharipova mayya-sharipova added v6.2.0 v7.0.0 :Search/Search Search-related issues that do not fall into other categories >breaking labels Dec 29, 2017
mayya-sharipova added a commit that referenced this pull request Dec 29, 2017
- Correct a bug in the referenced settings

Closes #18829
mayya-sharipova added a commit that referenced this pull request Dec 29, 2017
- Introduce index level settings to control the maximum number of terms
    that can be used in a Terms Query
- Issue a deprecation warning if a request exceeds this max number

Closes #18829
martijnvg added a commit that referenced this pull request Jan 4, 2018
* es/6.x: (48 commits)
  Bump compat version for local depdendent test to 6.2.0
  Pass `java.locale.providers=COMPAT` to Java 9 onwards (#28080)
  Allow shrinking of indices from a previous major (#28076)
  Add Writeable.Reader support to TransportResponseHandler (#28010)
  Fix cluster.routing.allocation.enable and cluster.routing.rebalance.enable casing (#28037)
  [Test] Fix scores for dcg in RankEvalRequestIT and RankEvalYamlIT
  [Docs] Add note on limitation for significant_text with nested objects (#28052)
  [Test] Fix allowed delta for calculated scores in DiscountedCumulativeGainTests
  Enable convert processor to support Long and Double. (#27957)
  Enable Wildfly tests on JDK 9 and JDK 10
  update ingest-attachment to use Tika 1.17 and newer deps (#27824)
  Only bind loopback addresses when binding to local
  Fix assertion in Wildfly build
  Fix typo in comment in Wildfly build
  Use ephemeral ports in Wildfly tests
  Update fuzzy-query.asciidoc (#28032)
  Add node id to shard failure message (#28024)
  Introduce limit to the number of terms in Terms Query (#27968)
  Upgrade Gradle Shadow plugin to 2.0.2
  Upgrade to JMH 1.19
  ...
@thomasneirynck
Copy link
Contributor

thomasneirynck commented Jan 9, 2018

Will this work transparently for clients? e.g., if Kibana issues a request for a terms-agg exceeding the threshold, will ES automatically truncate the result, or will ES raise an error?

@s1monw
Copy link
Contributor

s1monw commented Jan 9, 2018

@thomasneirynck it will throw an IllegalArgumentException

@mayya-sharipova
Copy link
Contributor Author

@thomasneirynck for 6.2 we don't plan to throw an error unless the setting is explicitly set by a user. So in 6.2, by default is the limit is exceeded (and the limit is 65536), only deprecation warning will be issued. Will this create a problem in Kibana?

@thomasneirynck
Copy link
Contributor

thomasneirynck commented Jan 10, 2018

thx @mayya-sharipova Kibana should be OK then for 6.2.

When the limit is set on the ES-side, it makes sense for admins then to make the corresponding change in Kibana.

We can expand the Kibana documentation as well.

Do you know when you will be turning off the deprecation behavior and have ES start failing?

@jpountz
Copy link
Contributor

jpountz commented Jan 11, 2018

This will happen in 7.0. However I'm wondering that there might be some confusion: you mentioned terms-agg in your previous comment while this change is about the terms query, and I don't think Kibana should worry about it, it should just propagate the error returned by Elasticsearch to the user?

@thomasneirynck
Copy link
Contributor

@jpountz you're right. my mistake, I was talking about terms-agg, not terms-query. Sorry for spam. Yeah, it will just propagate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>breaking :Search/Search Search-related issues that do not fall into other categories v6.2.0 v7.0.0-beta1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants