
Conversation

@jimczi
Contributor

@jimczi jimczi commented Sep 4, 2018

This commit adds support for early terminating the collection of a leaf
in the min/max aggregator. If the query matches all documents, the min and max values
for a numeric field can be retrieved efficiently from the points reader.
This change applies this optimization when possible.

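For illustration, a minimal sketch of the idea (not the PR's exact code; the helper name is assumed): when the segment has no deletions and the query matches every document, the min and max of an indexed numeric field can be read from the points index without visiting any document.

import java.io.IOException;
import java.util.function.Function;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.PointValues;

// Hypothetical helper: read the segment-wide min/max of a point field
// directly from the points index metadata instead of iterating doc values.
static Number[] segmentMinMax(LeafReader reader, String field,
                              Function<byte[], Number> converter) throws IOException {
    PointValues points = reader.getPointValues(field);
    if (points == null || reader.hasDeletions()) {
        return null; // field has no points here, or the bounds may belong to deleted docs
    }
    return new Number[] {
        converter.apply(points.getMinPackedValue()),
        converter.apply(points.getMaxPackedValue())
    };
}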
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

SearchContext context,
Aggregator parent, List<PipelineAggregator> pipelineAggregators,
Map<String, Object> metaData) throws IOException {
Map<String, Object> metaData, CheckedFunction<LeafReader, Number, IOException> maxFunc) throws IOException {
Contributor

I think we should call this something other than maxFunc as it confuses what it's actually doing (getting the max for the segment) with the normal running of the agg. Maybe calling it segmentMaxFunction would be clearer?

Contributor Author

++

* @param fieldName The name of the field.
* @param converter The point value converter.
*/
public static CheckedFunction<LeafReader, Number, IOException> createShortcutMax(String fieldName, Function<byte[], Number> converter) {
Contributor

This feels weird to me here. I wonder if we should instead have this method in the aggregator class and pass in a boolean parameter that determines whether we use it?

Contributor Author

We need the converter, so we could pass it to the aggregator directly, and if it's not null we can try to extract the min/max value from points?

Contributor

++ I like that better. I'm not a fan of the aggregators calling back out to the factory, so if we can avoid that then I'm on board 😄
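For context on what that converter is, a hypothetical example (class name assumed): for a long or double field it would decode the packed bytes stored in the points index back into a Number, e.g. via LongPoint#decodeDimension.

import java.util.function.Function;
import org.apache.lucene.document.DoublePoint;
import org.apache.lucene.document.LongPoint;

// Illustrative converters (assumed, one per numeric type): they decode the
// packed byte[] stored in the points index back into a Number.
final class PointConverters {
    static final Function<byte[], Number> LONG = bytes -> LongPoint.decodeDimension(bytes, 0);
    static final Function<byte[], Number> DOUBLE = bytes -> DoublePoint.decodeDimension(bytes, 0);
}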

Contributor

can it be pkg-private?

Contributor Author

This is used in tests that are in a different package (metrics). We could probably merge all metrics.* into a single package instead of having one package per metrics agg?

Contributor

@jpountz jpountz left a comment

I left some questions.

}
}
});
} catch (CollectionTerminatedException e) {}
Contributor

why do we catch CollectionTerminatedException here?

Contributor Author

this is not needed, thanks


@Override
public PointValues.Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
if (FutureArrays.compareUnsigned(maxValue, 0, numBytes, maxPackedValue, 0, numBytes) == 0) {
Contributor

Arrays#equals? :)

Contributor Author

yep ;)

public LeafBucketCollector getLeafCollector(LeafReaderContext ctx,
final LeafBucketCollector sub) throws IOException {
if (valuesSource == null) {
return LeafBucketCollector.NO_OP_COLLECTOR;
Contributor

should we early terminate in this case too?

Contributor Author

if we can (no parent agg), yes
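A sketch of how that branch could look inside getLeafCollector (a fragment using the names from the snippet above, not necessarily the final code; `parent` is assumed to be the enclosing aggregator reference):

// Unmapped field: nothing will ever be collected for this agg.
if (valuesSource == null) {
    if (parent == null) {
        // no parent aggregation needs these docs, so the whole leaf can be skipped
        throw new CollectionTerminatedException();
    }
    return LeafBucketCollector.NO_OP_COLLECTOR;
}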

double max = maxes.get(0);
max = Math.max(max, segMax.doubleValue());
maxes.set(0, max);
throw new CollectionTerminatedException();
Contributor

how do we know it's ok to throw this exception and that we won't terminate parent aggregators too?

Contributor Author

Currently this is checked in the factory when we create the function, which returns null if the aggregator has a parent.
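A hypothetical sketch of that factory-side guard (method and parameter names assumed, CheckedFunction taken from the earlier snippet); this is also where the mapping and script checks discussed below would naturally live:

import java.io.IOException;
import java.util.function.Function;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.PointValues;
import org.elasticsearch.common.CheckedFunction;

// Hypothetical: only build the shortcut for top-level aggregations, so that
// throwing CollectionTerminatedException can never cut a parent aggregator short.
static CheckedFunction<LeafReader, Number, IOException> shortcutMaxOrNull(
        boolean hasParent, boolean fieldIsIndexed, String fieldName,
        Function<byte[], Number> converter) {
    if (hasParent || fieldIsIndexed == false) {
        return null; // unsafe (sub-aggregation) or impossible (no points for the field)
    }
    return reader -> {
        PointValues points = reader.getPointValues(fieldName);
        return points == null ? null : converter.apply(points.getMaxPackedValue());
    };
}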

* @param fieldName The name of the field.
* @param converter The point value converter.
*/
public static CheckedFunction<LeafReader, Number, IOException> createShortcutMax(String fieldName, Function<byte[], Number> converter) {
Contributor

can it be pkg-private?

if (fieldType == null || fieldType.indexOptions() == IndexOptions.NONE) {
return null;
}
Function<byte[], Number> converter = null;
Contributor

should we also verify that the script is null?

Contributor Author

yes good catch

Contributor

@jpountz jpountz left a comment

I like it, just left some minor comments. Could we also have a test for the case where a field has doc values but is not indexed, so that its min/max values can't be computed?


@Override
public PointValues.Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
if (FutureArrays.compareUnsigned(maxValue, 0, numBytes, maxPackedValue, 0, numBytes) == 0) {
Contributor

Arrays#equals?

Contributor Author

It's a Java 9 method, at least the one that takes start and end offsets, and I am not sure we can use the Java 8 Arrays#equals(byte[], byte[]) since the min and max can have a length greater than numBytes?

Contributor Author

I switched to FutureArrays#equals which is equivalent


@Override
public PointValues.Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
if (FutureArrays.compareUnsigned(maxPackedValue, 0, numBytes, maxValue, 0, numBytes) == 0) {
Contributor

Arrays#equals?

Contributor Author

Same here

Contributor Author

I switched to FutureArrays#equals which is equivalent
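For reference, FutureArrays#equals is Lucene's backport of the ranged Java 9 Arrays#equals, so the two forms are interchangeable here (helper name assumed):

import org.apache.lucene.util.FutureArrays;

// Equivalent checks over the first numBytes of both packed values:
// FutureArrays.compareUnsigned(a, 0, numBytes, b, 0, numBytes) == 0
// FutureArrays.equals(a, 0, numBytes, b, 0, numBytes)
static boolean sameBound(byte[] a, byte[] b, int numBytes) {
    return FutureArrays.equals(a, 0, numBytes, b, 0, numBytes);
}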

if (liveDocs.get(docID)) {
result[0] = converter.apply(packedValue);
// this is the first leaf with a live doc so the value is the minimum for this segment.
throw new CollectionTerminatedException();
Contributor

I'd rather use a similar mechanism as max and avoid throwing an exception here.

Contributor Author

For the max we can only check leaves that contain the max value of the segment, but for min we can start from the first leaf and stop whenever we reach a non-deleted document. If we apply the same logic as max to avoid the exception, we'd only be able to find the minimum value on leaves that contain the real minimum value, regardless of deletions. I am fine with changing the logic here but just wanted to outline the implications.
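A sketch of the min-side mechanism being described (helper name assumed, and assuming a single-dimension field whose values are visited in ascending order by the points index):

import java.io.IOException;
import java.util.function.Function;
import org.apache.lucene.index.PointValues;
import org.apache.lucene.search.CollectionTerminatedException;
import org.apache.lucene.util.Bits;

// Hypothetical min shortcut under deletions: stop at the first live document,
// whose value is then the minimum among live docs of the segment.
static Number minWithDeletes(PointValues points, Bits liveDocs,
                             Function<byte[], Number> converter) throws IOException {
    final Number[] result = new Number[1];
    try {
        points.intersect(new PointValues.IntersectVisitor() {
            @Override
            public void visit(int docID) {
                // never called: compare() below always asks for the values
            }

            @Override
            public void visit(int docID, byte[] packedValue) {
                if (liveDocs == null || liveDocs.get(docID)) {
                    result[0] = converter.apply(packedValue);
                    // first live doc in ascending value order => segment minimum
                    throw new CollectionTerminatedException();
                }
            }

            @Override
            public PointValues.Relation compare(byte[] minPackedValue, byte[] maxPackedValue) {
                // visit every cell so deleted documents can be skipped one by one
                return PointValues.Relation.CELL_CROSSES_QUERY;
            }
        });
    } catch (CollectionTerminatedException e) {
        // expected: intersect was aborted once the minimum live value was found
    }
    return result[0];
}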

@jpountz
Contributor

jpountz commented Sep 7, 2018

You can ignore my comment about testing, I just saw there was one already.

Contributor

@colings86 colings86 left a comment

I left two minor comments but LGTM

@jimczi jimczi merged commit ee21067 into elastic:master Oct 3, 2018
@jimczi jimczi deleted the early_terminate_min_max branch October 3, 2018 16:33
jimczi added a commit that referenced this pull request Oct 3, 2018
kcm pushed a commit that referenced this pull request Oct 30, 2018