Add early termination support for min/max aggregations #33375
This commit adds support for early termination of the collection of a leaf in the min/max aggregator. If the query matches all documents, the min and max values for a numeric field can be retrieved efficiently from the points reader. This change applies this optimization when possible.
Pinging @elastic/es-search-aggs
SearchContext context,
Aggregator parent, List<PipelineAggregator> pipelineAggregators,
Map<String, Object> metaData) throws IOException {
Map<String, Object> metaData, CheckedFunction<LeafReader, Number, IOException> maxFunc) throws IOException {
I think we should call this something other than maxFunc, as the name conflates what it's actually doing (getting the max for the segment) with the normal running of the agg. Maybe calling it segmentMaxFunction would be clearer?
++
 * @param fieldName The name of the field.
 * @param converter The point value converter.
 */
public static CheckedFunction<LeafReader, Number, IOException> createShortcutMax(String fieldName, Function<byte[], Number> converter) {
This feels weird to me here. I wonder if we should instead have this method in the aggregator class and pass in a boolean parameter that determines whether we use it?
We need the converter, so we could pass it to the aggregator directly, and if it's not null we can try to extract the min/max value from points?
++ I like that better. I'm not a fan of the Aggregators calling back out to the factory so if we can help that not happen then I'm on board 😄
can it be pkg-private?
This is used in tests that are in a different package (metrics). We could probably merge all metrics.* into a single package instead of having one package per metrics agg?
jpountz
left a comment
I left some questions.
        }
    }
});
} catch (CollectionTerminatedException e) {}
Why do we catch CollectionTerminatedException here?
This is not needed, thanks.
@Override
public PointValues.Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
    if (FutureArrays.compareUnsigned(maxValue, 0, numBytes, maxPackedValue, 0, numBytes) == 0) {
Arrays#equals? :)
yep ;)
public LeafBucketCollector getLeafCollector(LeafReaderContext ctx,
        final LeafBucketCollector sub) throws IOException {
    if (valuesSource == null) {
        return LeafBucketCollector.NO_OP_COLLECTOR;
Should we early terminate in this case too?
If we can (no parent agg), yes.
double max = maxes.get(0);
max = Math.max(max, segMax.doubleValue());
maxes.set(0, max);
throw new CollectionTerminatedException();
How do we know it's OK to throw this exception and that we won't terminate parent aggregators too?
Currently this is checked in the factory when we create the function: the factory returns null if the aggregator has a parent.
if (fieldType == null || fieldType.indexOptions() == IndexOptions.NONE) {
    return null;
}
Function<byte[], Number> converter = null;
Should we also verify that the script is null?
Yes, good catch.
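The conditions raised across this review for taking the shortcut (no parent aggregator, a query that matches all documents, an indexed field, no script) could be summarized in a single guard. The following is only a sketch: the class, method, and parameter names are hypothetical and are not taken from the PR, which performs these checks in the factory.

```java
// Hypothetical helper, not part of the PR: collects the preconditions
// discussed in this review into one boolean check.
final class ShortcutGuard {
    private ShortcutGuard() {}

    static boolean canShortcut(boolean hasParentAggregator,
                               boolean queryMatchesAll,
                               boolean fieldIsIndexed,
                               boolean hasScript) {
        // Terminating the leaf early could also stop a parent aggregator.
        if (hasParentAggregator) {
            return false;
        }
        // With a filtering query, the segment-wide min/max may belong
        // to a document that does not match.
        if (!queryMatchesAll) {
            return false;
        }
        // A field with doc values only has no points to read from.
        if (!fieldIsIndexed) {
            return false;
        }
        // A script may transform values, so indexed points need not match.
        return !hasScript;
    }
}
```

When the guard returns false, the aggregator would simply fall back to regular per-document collection.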
jpountz
left a comment
I like it, just left some minor comments. Could we also have a test for the case where a field has doc values but is not indexed, so that its min/max values can't be computed?
@Override
public PointValues.Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
    if (FutureArrays.compareUnsigned(maxValue, 0, numBytes, maxPackedValue, 0, numBytes) == 0) {
Arrays#equals?
It's a Java 9 method, at least the overload that takes a start and end offset, and I am not sure we can use the Java 8 Arrays#equals(byte[], byte[]), since the min and max arrays can have a length greater than numBytes?
I switched to FutureArrays#equals which is equivalent
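To illustrate why the two-argument Arrays#equals is not a drop-in replacement here, the sketch below compares only the first numBytes bytes of two arrays using nothing beyond Java 8. The class and method names are hypothetical; the PR itself uses FutureArrays#equals, whose implementation is not shown in this thread.

```java
// Hypothetical Java 8 stand-in for a ranged equality check.
final class PrefixEquals {
    private PrefixEquals() {}

    /** Returns true if the first numBytes bytes of a and b are identical. */
    static boolean prefixEquals(byte[] a, byte[] b, int numBytes) {
        if (numBytes < 0 || numBytes > a.length || numBytes > b.length) {
            throw new IllegalArgumentException("numBytes out of range: " + numBytes);
        }
        for (int i = 0; i < numBytes; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }
}
```

For a 4-byte value {0, 0, 1, 42} and a longer buffer {0, 0, 1, 42, 9, 9}, prefixEquals with numBytes = 4 reports equality, whereas Arrays.equals(byte[], byte[]) would return false because the lengths differ.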
@Override
public PointValues.Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
    if (FutureArrays.compareUnsigned(maxPackedValue, 0, numBytes, maxValue, 0, numBytes) == 0) {
Arrays#equals?
Same here
I switched to FutureArrays#equals which is equivalent
if (liveDocs.get(docID)) {
    result[0] = converter.apply(packedValue);
    // this is the first leaf with a live doc so the value is the minimum for this segment.
    throw new CollectionTerminatedException();
I'd rather use a similar mechanism as max and avoid throwing an exception here.
For the max we can only check leaves that contain the max value of the segment, but for min we can start from the first leaf and stop as soon as we reach a non-deleted document. If we apply the same logic as max to avoid the exception, we'd only be able to find the minimum value on leaves that contain the real minimum value regardless of deletions. I am fine with changing the logic here but just wanted to outline the implications.
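The asymmetry described above can be sketched with a flat, sorted array standing in for the points of one segment. This is a simplified model with hypothetical names, not Lucene's points API: the min scan walks values in ascending order and stops at the first live document, while the max check only looks at entries equal to the segment's maximum and gives up if all of them are deleted.

```java
import java.util.BitSet;

// Simplified model: sortedValues is ascending, docIds[i] is the document
// holding sortedValues[i], and liveDocs marks non-deleted documents.
final class SegmentMinMaxSketch {
    private SegmentMinMaxSketch() {}

    /** Walks from the smallest value and stops at the first live document. */
    static Long minLive(long[] sortedValues, int[] docIds, BitSet liveDocs) {
        for (int i = 0; i < sortedValues.length; i++) {
            if (liveDocs.get(docIds[i])) {
                return sortedValues[i];
            }
        }
        return null; // no live document in this segment
    }

    /**
     * Only inspects entries equal to the segment maximum. Returns null when
     * every such entry is deleted, so the caller must fall back to regular
     * collection.
     */
    static Long maxIfLive(long[] sortedValues, int[] docIds, BitSet liveDocs) {
        if (sortedValues.length == 0) {
            return null;
        }
        long segmentMax = sortedValues[sortedValues.length - 1];
        for (int i = sortedValues.length - 1; i >= 0 && sortedValues[i] == segmentMax; i--) {
            if (liveDocs.get(docIds[i])) {
                return segmentMax;
            }
        }
        return null;
    }
}
```

In this model, deleting the document with the smallest value still lets minLive return the next live value, while deleting the document with the largest value makes maxIfLive return null even though a smaller live value exists.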
You can ignore my comment about testing, I just saw there was one already.
colings86
left a comment
I left two minor comments but LGTM
server/src/main/java/org/elasticsearch/search/aggregations/metrics/MaxAggregator.java
server/src/main/java/org/elasticsearch/search/aggregations/metrics/MinAggregator.java