Histogram aggs: add empty buckets only in the final reduce step #35921
Merged: javanna merged 2 commits into elastic:master from javanna:enhancement/histo_empty_buckets_final_reduce on Nov 30, 2018.
Changes from all commits (the hunks below are from `InternalAggregationTestCase`, the shared base test class for internal aggregations):

```diff
@@ -119,19 +119,19 @@
 import org.elasticsearch.search.aggregations.metrics.SumAggregationBuilder;
 import org.elasticsearch.search.aggregations.metrics.TopHitsAggregationBuilder;
 import org.elasticsearch.search.aggregations.metrics.ValueCountAggregationBuilder;
-import org.elasticsearch.search.aggregations.pipeline.InternalSimpleValue;
-import org.elasticsearch.search.aggregations.pipeline.ParsedSimpleValue;
-import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
+import org.elasticsearch.search.aggregations.pipeline.DerivativePipelineAggregationBuilder;
+import org.elasticsearch.search.aggregations.pipeline.ExtendedStatsBucketPipelineAggregationBuilder;
 import org.elasticsearch.search.aggregations.pipeline.InternalBucketMetricValue;
+import org.elasticsearch.search.aggregations.pipeline.InternalSimpleValue;
 import org.elasticsearch.search.aggregations.pipeline.ParsedBucketMetricValue;
+import org.elasticsearch.search.aggregations.pipeline.ParsedDerivative;
+import org.elasticsearch.search.aggregations.pipeline.ParsedExtendedStatsBucket;
 import org.elasticsearch.search.aggregations.pipeline.ParsedPercentilesBucket;
-import org.elasticsearch.search.aggregations.pipeline.PercentilesBucketPipelineAggregationBuilder;
+import org.elasticsearch.search.aggregations.pipeline.ParsedSimpleValue;
 import org.elasticsearch.search.aggregations.pipeline.ParsedStatsBucket;
+import org.elasticsearch.search.aggregations.pipeline.PercentilesBucketPipelineAggregationBuilder;
+import org.elasticsearch.search.aggregations.pipeline.PipelineAggregator;
 import org.elasticsearch.search.aggregations.pipeline.StatsBucketPipelineAggregationBuilder;
-import org.elasticsearch.search.aggregations.pipeline.ExtendedStatsBucketPipelineAggregationBuilder;
-import org.elasticsearch.search.aggregations.pipeline.ParsedExtendedStatsBucket;
-import org.elasticsearch.search.aggregations.pipeline.DerivativePipelineAggregationBuilder;
-import org.elasticsearch.search.aggregations.pipeline.ParsedDerivative;

 import java.io.IOException;
 import java.util.ArrayList;
@@ -151,6 +151,7 @@
 import static org.elasticsearch.test.XContentTestUtils.insertRandomFields;
 import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertToXContentEquivalent;
 import static org.hamcrest.Matchers.equalTo;
+import static org.hamcrest.Matchers.lessThanOrEqualTo;

 public abstract class InternalAggregationTestCase<T extends InternalAggregation> extends AbstractWireSerializingTestCase<T> {
     public static final int DEFAULT_MAX_BUCKETS = 100000;
@@ -267,7 +268,14 @@ public void testReduceRandom() {
                     new InternalAggregation.ReduceContext(bigArrays, mockScriptService, bucketConsumer, false);
             @SuppressWarnings("unchecked")
             T reduced = (T) inputs.get(0).reduce(internalAggregations, context);
-            assertMultiBucketConsumer(reduced, bucketConsumer);
+            int initialBucketCount = 0;
+            for (InternalAggregation internalAggregation : internalAggregations) {
+                initialBucketCount += countInnerBucket(internalAggregation);
+            }
+            int reducedBucketCount = countInnerBucket(reduced);
+            //check that non final reduction never adds buckets
+            assertThat(reducedBucketCount, lessThanOrEqualTo(initialBucketCount));
+            assertMultiBucketConsumer(reducedBucketCount, bucketConsumer);
             toReduce = new ArrayList<>(toReduce.subList(r, toReduceSize));
             toReduce.add(reduced);
         }
@@ -332,14 +340,14 @@ protected NamedXContentRegistry xContentRegistry() {

     public final void testFromXContent() throws IOException {
         final T aggregation = createTestInstance();
-        final Aggregation parsedAggregation = parseAndAssert(aggregation, randomBoolean(), false);
-        assertFromXContent(aggregation, (ParsedAggregation) parsedAggregation);
+        final ParsedAggregation parsedAggregation = parseAndAssert(aggregation, randomBoolean(), false);
+        assertFromXContent(aggregation, parsedAggregation);
     }

     public final void testFromXContentWithRandomFields() throws IOException {
         final T aggregation = createTestInstance();
-        final Aggregation parsedAggregation = parseAndAssert(aggregation, randomBoolean(), true);
-        assertFromXContent(aggregation, (ParsedAggregation) parsedAggregation);
+        final ParsedAggregation parsedAggregation = parseAndAssert(aggregation, randomBoolean(), true);
+        assertFromXContent(aggregation, parsedAggregation);
     }

     protected abstract void assertFromXContent(T aggregation, ParsedAggregation parsedAggregation) throws IOException;
@@ -423,6 +431,10 @@ protected static DocValueFormat randomNumericDocValueFormat() {
     }

     public static void assertMultiBucketConsumer(Aggregation agg, MultiBucketConsumer bucketConsumer) {
-        assertThat(bucketConsumer.getCount(), equalTo(countInnerBucket(agg)));
+        assertMultiBucketConsumer(countInnerBucket(agg), bucketConsumer);
     }
+
+    private static void assertMultiBucketConsumer(int innerBucketCount, MultiBucketConsumer bucketConsumer) {
+        assertThat(bucketConsumer.getCount(), equalTo(innerBucketCount));
+    }
 }
```

An inline review comment from a Contributor on the new `lessThanOrEqualTo` assertion reads: ++
Reviewer:

It took me a few minutes to convince myself that this works. :)

On first glance the behavior changes because sorting is now only done on the final reduction (instead of on every reduction), which I thought might break `reduceBuckets()`, as that relies on consistent ordering. But histos/date_histos sort their shard results by `key: asc`, so this isn't actually a problem.

Could we add a test to `DateHistogramAggregatorTests`/`HistogramAggregatorTests` that randomly chooses a sort order + `min_doc_count: 0`, to ensure this internal contract never changes in the future? It looks like none of those tests set an order, so we might not notice otherwise.
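As a rough illustration of the kind of test being asked for, the sketch below would sit inside `HistogramAggregatorTests` (which extends `AggregatorTestCase`). It assumes the test-framework helpers of that era (`searchAndReduce`, `RandomIndexWriter`, `randomFrom`, the `NumberFieldType` setup) and is only a sketch under those assumptions, not the test that was eventually committed; the method name and the sample values are made up.

```java
// Hypothetical test method: random bucket order + min_doc_count: 0, so empty buckets
// must be inserted and correctly ordered during the final reduce.
public void testRandomOrderWithEmptyBuckets() throws Exception {
    try (Directory dir = newDirectory();
            RandomIndexWriter indexWriter = new RandomIndexWriter(random(), dir)) {
        for (double value : new double[] {1, 3, 9, 21}) {
            Document doc = new Document();
            doc.add(new SortedNumericDocValuesField("field", NumericUtils.doubleToSortableLong(value)));
            indexWriter.addDocument(doc);
        }
        // randomly pick one of the supported orders instead of always relying on the default
        BucketOrder order = randomFrom(BucketOrder.key(randomBoolean()), BucketOrder.count(randomBoolean()));
        HistogramAggregationBuilder aggBuilder = new HistogramAggregationBuilder("histo")
                .field("field")
                .interval(5)
                .minDocCount(0)   // forces empty buckets to be added at reduce time
                .order(order);
        MappedFieldType fieldType = new NumberFieldMapper.NumberFieldType(NumberFieldMapper.NumberType.DOUBLE);
        fieldType.setName("field");
        try (IndexReader reader = indexWriter.getReader()) {
            InternalHistogram histogram = searchAndReduce(new IndexSearcher(reader),
                    new MatchAllDocsQuery(), aggBuilder, fieldType);
            // keys 0, 5, 10, 15 and 20 must all be present, including the empty 10 and 15 buckets
            assertEquals(5, histogram.getBuckets().size());
        }
    }
}
```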
Author (javanna):

I don't think I changed anything around sorting. That `if` was hard to read, so I rewrote it, but the only actual change is that empty buckets are now added only in the final reduction phase. I am working on tests and will add what you suggest; I had other additions in mind as well, but I hit some roadblocks (see #36004). Basically, empty buckets were not tested in our unit tests.
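For readers without the histogram diff at hand, the shape of the change being described is roughly the following. This is a simplified sketch, not the actual `InternalHistogram` code; `reduceBuckets`, `addEmptyBuckets` and `sortBuckets` stand in for the real private helpers.

```java
@Override
public InternalAggregation doReduce(List<InternalAggregation> aggregations, ReduceContext reduceContext) {
    // merge the buckets coming from the individual shard / partial results
    List<Bucket> reducedBuckets = reduceBuckets(aggregations, reduceContext);
    if (reduceContext.isFinalReduce()) {
        // only the final reduction fills the gaps with empty buckets (min_doc_count: 0)
        // and applies the requested order; partial reductions leave the buckets as they
        // came in, sorted by key ascending, and therefore can never add buckets
        if (minDocCount == 0) {
            addEmptyBuckets(reducedBuckets, reduceContext);
        }
        // any non key-asc order is also applied only here (stand-in helper)
        sortBuckets(reducedBuckets, order);
    }
    return new InternalHistogram(getName(), reducedBuckets, order, minDocCount, format,
            pipelineAggregators(), getMetaData());
}
```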
Reviewer:

We chatted about this in slack, and I'm just bad at boolean logic. All good here :)
Author (javanna):

I added a check to the base class which verifies that no buckets are added in non-final reduce phases. Now that we test adding empty buckets for the histogram and date histogram aggs, this check makes some sense, I think. These are the only aggs that add buckets as part of reduce, right? I wonder if I need to check whether there is some other missing test coverage somewhere.

Also, I worked a bit on increasing test coverage in DateHistogramAggregatorTests like you suggested, and I think I'd prefer to do it in a follow-up, if still necessary.
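To make the contract behind that base-class check concrete, here is a minimal sketch (not part of the PR) of a partial reduce followed by a final reduce, using the `ReduceContext` constructor visible in the diff above, where the last boolean marks the final reduction; `shardResults`, `bigArrays`, `mockScriptService` and `bucketConsumer` are assumed to be set up the way `InternalAggregationTestCase` sets them up.

```java
// partial (non-final) reduce, e.g. incremental merging of batches of shard results
InternalAggregation.ReduceContext partialContext =
        new InternalAggregation.ReduceContext(bigArrays, mockScriptService, bucketConsumer, false);
InternalAggregation partiallyReduced = shardResults.get(0).reduce(shardResults, partialContext);
// with this change, a partial reduce never adds buckets, so the new check holds:
// countInnerBucket(partiallyReduced) <= sum of countInnerBucket over the shardResults inputs

// final reduce, performed once on the coordinating node
InternalAggregation.ReduceContext finalContext =
        new InternalAggregation.ReduceContext(bigArrays, mockScriptService, bucketConsumer, true);
InternalAggregation finalResult = partiallyReduced.reduce(
        Collections.singletonList(partiallyReduced), finalContext);
// only here do histogram/date_histogram insert the min_doc_count: 0 empty buckets
```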