Stop using round-tripped PipelineAggregators #53423
Conversation
This begins to clean up how `PipelineAggregator`s are executed. Previously, we would create the `PipelineAggregator`s on the data nodes and embed them in the aggregation tree. When it came time to execute the pipeline aggregation we'd use the `PipelineAggregator`s that were on the first shard's results. This is inefficient because:

1. The data node needs to make the `PipelineAggregator` only to serialize it and then throw it away.
2. The coordinating node needs to deserialize all of the `PipelineAggregator`s even though it only needs one of them.
3. You end up with many `PipelineAggregator` instances when you only really *need* one per pipeline.
4. `PipelineAggregator` needs to implement serialization.

This begins to undo that by building the `PipelineAggregator`s directly on the coordinating node and using those instead of the `PipelineAggregator`s in the aggregation tree. In a follow-up change we'll stop serializing the `PipelineAggregator`s to node versions that support this behavior. And, one day, we'll be able to remove `PipelineAggregator` from the aggregation result tree entirely.

Importantly, this doesn't change how pipeline aggregations are declared, parsed, or requested. They are still part of the `AggregationBuilder` tree because *that* makes sense.
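To make the intent concrete, here is a minimal, hedged sketch of the coordinator-side shape this change is aiming for. The class and method names below are illustrative stand-ins, not the actual Elasticsearch API: the only point is that pipeline aggregators are built once from the request's builder tree and applied during the final reduction, instead of being deserialized out of each shard's results.

```java
import java.util.List;
import java.util.function.Supplier;

// Illustrative sketch only; all names here are hypothetical stand-ins for the real classes.
final class CoordinatorPipelineSketch {

    /** Stand-in for a pipeline aggregator that rewrites the reduced aggregation tree. */
    interface PipelineAggregator {
        Object reduce(Object reducedAggs);
    }

    /**
     * The coordinating node builds the pipelines once, from the original request's
     * AggregationBuilder tree, and runs them over the already-reduced shard results.
     * Nothing here had to come over the wire from the data nodes.
     */
    static Object finalReduce(Object reducedShardAggs, Supplier<List<PipelineAggregator>> pipelinesFromRequest) {
        Object current = reducedShardAggs;
        for (PipelineAggregator pipeline : pipelinesFromRequest.get()) {
            current = pipeline.reduce(current);
        }
        return current;
    }
}
```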
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)
Wow! The docs tests failed. Interesting that we didn't have anything else covering this case. I'll dig.
That was surprisingly tricky to track down. And it leads to another thing that I'll need to do - shift validation from being based on …
In a follow-up. Preserving the old "build the pipelines in the tree too" behavior will allow me to work around this.
Haven't had a chance to look at the PR yet (hopefully soon!) but yeah, I think this makes sense for a future PR. Aggs can't get away with this because they need to resolve fields, but pipelines only care about an agg being at the right location (and sometimes the right kind of agg), so the Builder tree should be sufficient 👍
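As a rough illustration of why the builder tree alone can be enough for pipeline validation: the check only needs names and positions in the tree, never a concrete `PipelineAggregator` instance. This is a hypothetical sketch, not the actual `PipelineAggregationBuilder` validation contract.

```java
import java.util.Set;

// Hypothetical sketch: validate a sibling pipeline's buckets_path using only builder names.
final class BuilderTreeValidationSketch {

    static void validateSiblingPath(Set<String> siblingAggNames, String bucketsPath) {
        // Only the first path element has to name a sibling aggregation; no
        // PipelineAggregator instance is needed for this check.
        String first = bucketsPath.split(">")[0];
        if (siblingAggNames.contains(first) == false) {
            throw new IllegalArgumentException(
                "buckets_path [" + bucketsPath + "] must reference a sibling aggregation");
        }
    }
}
```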
```java
/**
 * Returns a builder for {@link InternalAggregation.ReduceContext}. This
 * builder retains a reference to the provided {@link SearchRequest}.
 */
```
@jimczi, I remember us trying not to hold on to references to the SearchRequest because it could be big. Or something like that. Is that still a thing? It looks like we keep the SearchRequest around for a while during the search right now.
That's totally ok since you refer to the original request which is unique per search. You also build the pipeline tree lazily which seems like a nice win to me.
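For readers wondering what "build the pipeline tree lazily" can look like in plain Java, here is a generic memoizing supplier; it is not the helper used in this PR, just the general build-once-on-first-use pattern the comment is referring to.

```java
import java.util.function.Supplier;

// Generic lazy/memoized supplier; illustrative, not the PR's actual helper.
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> builder;
    private volatile T value;

    Lazy(Supplier<T> builder) {
        this.builder = builder;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    // Built at most once, and only if something actually asks for it,
                    // e.g. the final reduction needing the pipeline tree.
                    value = result = builder.get();
                }
            }
        }
        return result;
    }
}
```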
jimczi left a comment
This is amazing :). I left minor comments but I think this will tremendously help the usage and extensibility of pipeline aggregators. They should be used and known by coordinating nodes only, and this PR is a giant step in that direction.
```java
InternalAggregation.ReduceContextBuilder aggReduceContextBuilder = new InternalAggregation.ReduceContextBuilder() {
    @Override
    public ReduceContext forPartialReduction() {
        throw new UnsupportedOperationException("Scroll requests don't have aggs");
```
❤️
```java
    new InternalAggregation.ReduceContext(reduceContext.bigArrays(), reduceContext.scriptService(), true));
// TODO it looks like this passes the "final" reduce context more than once.
// Once here and once in the for above. That is bound to cause trouble.
currentTree = InternalAggregations.reduce(Arrays.asList(currentTree, liveAggs), finalReduceContext);
```
Good catch, let's open an issue since it seems easy to fix rather than a TODO?
++
👍
👍 indeed. Rollup doesn't work with pipelines anyhow (mostly due to the serialization issue: with different aggs being sent to rollup vs live indices, it messes up how pipelines operate)... but I could see multiple final reductions potentially hurting accuracy on certain aggs that care, like terms.
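To see why running the "final" reduction more than once can hurt, here is a small, self-contained sketch; it is deliberately simplified and not the real terms reduction. The final pass applies a size cutoff, so applying it to an intermediate merge and then again to the combined result can discard buckets that would have survived a single final pass over all inputs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model: buckets are just counts; the "final" pass keeps only the top `size`.
final class ReduceOnceSketch {

    static List<Integer> reduce(List<List<Integer>> partialResults, boolean finalReduce, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> buckets : partialResults) {
            merged.addAll(buckets);
        }
        merged.sort(Comparator.reverseOrder());
        if (finalReduce && merged.size() > size) {
            // The cutoff belongs to the final reduction; doing it on an intermediate
            // merge throws buckets away before all inputs have been seen.
            merged = new ArrayList<>(merged.subList(0, size));
        }
        return merged;
    }
}
```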
polyfractal left a comment
I like this a lot :) Definitely a good step towards untangling pipelines!
```java
    return EMPTY;
}
List<PipelineAggregationBuilder> orderedpipelineAggregators = null;
if (skipResolveOrder) {
```
While we're here, is it possible to nuke `skipResolveOrder` too? I believe it's only used by `BasePipelineAggregationTestCase`, and that doesn't even invoke `build()`, so this is basically "dead" testing code. I think.
Not a problem to leave it if there's a complication... I just particularly dislike this little tidbit and wouldn't mind seeing it go if we're already touching this :)