Make global ords terms simpler to understand #57241
When the `terms` agg operates on non-numeric data it can collect it via global ordinals. It actually has two separate collection strategies for this, one "dense" and one "remapping". Each of *those* strategies has two "iteration" strategies that it uses to build buckets, depending on whether or not we need buckets with `0` docs in them. Previously this was done with several `null` checks and never really explained. This change replaces those checks with two `CollectionStrategy` classes which have good stuff like documentation.
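To make the shape of that split concrete, here is a minimal, self-contained sketch of two collection strategies that each handle both iteration cases. It is only an illustration under simplifying assumptions: `GlobalOrdsSketch`, `BucketVisitor`, the `forEach(boolean, ...)` signature, and the map-backed remapping are hypothetical stand-ins, not the code in this PR, which works against Lucene doc values and Elasticsearch's own bucket-ordinal structures.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical sketch; these are not the Elasticsearch classes. */
final class GlobalOrdsSketch {
    /** Receives one (global ordinal, doc count) pair per bucket. */
    interface BucketVisitor {
        void visit(long globalOrd, long docCount);
    }

    /** Strategy for collecting global ordinals and iterating the results. */
    abstract static class CollectionStrategy {
        /** Name of the strategy, e.g. for debugging output. */
        abstract String describe();

        /** Called once for each global ordinal seen on a matching document. */
        abstract void collect(long globalOrd);

        /**
         * Visit buckets. Implementations must handle both iteration cases:
         * every ordinal (includeEmpty) or only ordinals that collected docs.
         */
        abstract void forEach(boolean includeEmpty, BucketVisitor visitor);
    }

    /** "Dense": the global ordinal doubles as the bucket ordinal, so counts live in an array. */
    static final class DenseGlobalOrds extends CollectionStrategy {
        private final long[] counts;

        DenseGlobalOrds(int valueCount) { this.counts = new long[valueCount]; }

        @Override String describe() { return "dense"; }

        @Override void collect(long globalOrd) { counts[(int) globalOrd]++; }

        @Override void forEach(boolean includeEmpty, BucketVisitor visitor) {
            for (int ord = 0; ord < counts.length; ord++) {
                if (includeEmpty || counts[ord] > 0) {
                    visitor.visit(ord, counts[ord]);
                }
            }
        }
    }

    /** "Remapping": a global ordinal is assigned a bucket ordinal the first time it is seen. */
    static final class RemapGlobalOrds extends CollectionStrategy {
        private final long valueCount;
        private final Map<Long, Integer> bucketOrds = new LinkedHashMap<>();
        private final List<Long> counts = new ArrayList<>();

        RemapGlobalOrds(long valueCount) { this.valueCount = valueCount; }

        @Override String describe() { return "remap"; }

        @Override void collect(long globalOrd) {
            Integer bucket = bucketOrds.get(globalOrd);
            if (bucket == null) {
                bucket = counts.size();
                bucketOrds.put(globalOrd, bucket);
                counts.add(0L);
            }
            counts.set(bucket, counts.get(bucket) + 1);
        }

        @Override void forEach(boolean includeEmpty, BucketVisitor visitor) {
            if (includeEmpty) {
                // Walk every global ordinal; ordinals never collected report 0 docs.
                for (long ord = 0; ord < valueCount; ord++) {
                    Integer bucket = bucketOrds.get(ord);
                    visitor.visit(ord, bucket == null ? 0L : counts.get(bucket));
                }
            } else {
                // Walk only the ordinals that were actually collected.
                for (Map.Entry<Long, Integer> e : bucketOrds.entrySet()) {
                    visitor.visit(e.getKey(), counts.get(e.getValue()));
                }
            }
        }
    }

    public static void main(String[] args) {
        long[] collected = {2, 0, 2, 5};
        for (CollectionStrategy s : new CollectionStrategy[] {new DenseGlobalOrds(6), new RemapGlobalOrds(6)}) {
            for (long ord : collected) {
                s.collect(ord);
            }
            s.forEach(false, (ord, count) -> System.out.println(s.describe() + " ord=" + ord + " docs=" + count));
        }
    }
}
```

The point of the sketch is that callers never branch on `null`: they pick a strategy once and the strategy owns both how it counts and how it iterates.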
Ah! I should add a
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)
not-napoleon
left a comment
This looks good, definitely more clear than what we had before.
     */
    abstract void globalOrdsReady(SortedSetDocValues globalOrds);
    /**
     * Collect a global ordinal.
I presume this is what gets called by the collector, but I think it'd help to make this javadoc a little more explicit.
👍
    boolean remapGlobalOrds() {
    return bucketOrds != null;
    this.acceptedGlobalOrdinals = includeExclude == null ? l -> true : includeExclude.acceptedGlobalOrdinals(values)::get;
    this.collectionStrategy = remapGlobalOrds ? new RemapGlobalOrds() : new DenseGlobalOrds();
Is there a reason not to just pass the collection strategy in directly?
Hmmmm - I did think about it but we'd have to do that same "pass a builder" kind of thing because it is a non-static inner class.
    }

    /**
     * Strategy for collecting global ordinals.
I think your comment from the PR description that we have two collection strategies and each has two iteration strategies would fit well in the javadoc for this class. Or at the very least, a note on forEach that it should account for both iteration cases.
👍
It took quite a bit of staring for me to figure out what we had before!
Thanks @not-napoleon! I've added some javadoc and merged. If you can think of a way to pass the strategy in that you'd prefer please ping me!
I accidentally didn't put the customary "skip the last version" on #57241 and the PR tests didn't catch it. This adds it.
…7311) When the `terms` agg operates on non-numeric data it can collect it via global ordinals. It actually has two separate collection strategies for this, one "dense" and one "remapping". Each of *those* strategies has two "iteration" strategies that it uses to build buckets, depending on whether or not we need buckets with `0` docs in them. Previously this was done with several `null` checks and never really explained. This change replaces those checks with two `CollectionStrategy` classes which have good stuff like documentation.
When the `terms` agg runs against strings and uses global ordinals it has an optimization for collecting segments that only ever have a single value for the particular string field. This is *very* common. But I broke it in elastic#57241. This fixes that optimization and adds `debug` information that you can use to see how often we collect segments of each type. It also adds a test to make sure that I don't break the optimization again. We also had a specialization for when there isn't a filter on the terms to aggregate. I had removed that specialization in elastic#57241, which resulted in some slowdown as well. This adds it back but in a clearer way. And, hopefully, a way that is marginally faster when there *is* a filter. Closes elastic#57407
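As a rough illustration of the two fast paths described above, the sketch below picks a collection loop per segment based on whether the segment is single-valued and whether a filter is present, and counts how many segments of each type it saw. Everything here is a hypothetical stand-in: `SegmentCollectionSketch`, `Segment`, and the counter names are invented for the example, and the real code reads ordinals from Lucene doc values rather than arrays.

```java
import java.util.function.LongPredicate;

/** Simplified, hypothetical sketch; not the Elasticsearch or Lucene API. */
final class SegmentCollectionSketch {
    /** Stand-in for one segment: docOrds[doc] holds that doc's global ordinals. */
    static final class Segment {
        final long[][] docOrds;
        final boolean singleValued;

        Segment(long[][] docOrds) {
            this.docOrds = docOrds;
            boolean single = true;
            for (long[] ords : docOrds) {
                single &= ords.length <= 1;
            }
            this.singleValued = single;
        }
    }

    private final long[] counts;
    private final LongPredicate acceptedGlobalOrdinals; // null means "no filter"

    // Debug-style counters (hypothetical names) for how often each path is taken.
    int singleValuedSegments;
    int multiValuedSegments;

    SegmentCollectionSketch(int valueCount, LongPredicate acceptedGlobalOrdinals) {
        this.counts = new long[valueCount];
        this.acceptedGlobalOrdinals = acceptedGlobalOrdinals;
    }

    void collectSegment(Segment segment) {
        if (segment.singleValued) {
            singleValuedSegments++;
            if (acceptedGlobalOrdinals == null) {
                // Fastest path: at most one ordinal per doc and no filter to consult.
                for (long[] ords : segment.docOrds) {
                    if (ords.length == 1) {
                        counts[(int) ords[0]]++;
                    }
                }
            } else {
                // Single-valued, but each ordinal must pass the filter.
                for (long[] ords : segment.docOrds) {
                    if (ords.length == 1 && acceptedGlobalOrdinals.test(ords[0])) {
                        counts[(int) ords[0]]++;
                    }
                }
            }
        } else {
            // General path: iterate every ordinal of every doc.
            multiValuedSegments++;
            for (long[] ords : segment.docOrds) {
                for (long ord : ords) {
                    if (acceptedGlobalOrdinals == null || acceptedGlobalOrdinals.test(ord)) {
                        counts[(int) ord]++;
                    }
                }
            }
        }
    }

    long docCount(int globalOrd) {
        return counts[globalOrd];
    }
}
```

Exposing the per-path counters is what lets a test assert that the single-valued path is actually taken, rather than only checking the final bucket counts.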
When the `terms` agg operates on non-numeric data it can collect it via global ordinals. It actually has two separate collection strategies for this, one "dense" and one "remapping". Each of those strategies has two "iteration" strategies that it uses to build buckets, depending on whether or not we need buckets with `0` docs in them. Previously this was done with several `null` checks and never really explained. This change replaces those checks with two `CollectionStrategy` classes which have good stuff like Javadocs.
It also adds the name of the strategy used to the debugging information.
Related to #56487