Skip to content

Conversation

@nik9000
Copy link
Member

@nik9000 nik9000 commented Mar 16, 2020

This begins to clean up how PipelineAggregators and executed.
Previously, we would create the PipelineAggregators on the data nodes
and embed them in the aggregation tree. When it came time to execute the
pipeline aggregation we'd use the PipelineAggregators that were on the
first shard's results. This is inefficient because:

  1. The data node needs to make the PipelineAggregator only to
    serialize it and then throw it away.
  2. The coordinating node needs to deserialize all of the
    PipelineAggregators even though it only needs one of them.
  3. You end up with many PipelineAggregator instances when you only
    really need one per pipeline.
  4. PipelineAggregator needs to implement serialization.

This begins to undo these by building the PipelineAggregators directly
on the coordinating node and using those instead of the
PipelineAggregators in the aggregtion tree. In a follow up change
we'll stop serializing the PipelineAggregators to node versions that
support this behavior. And, one day, we'll be able to remove
PipelineAggregator from the aggregation result tree entirely.

Importantly, this doesn't change how pipeline aggregations are declared
or parsed or requested. They are still part of the AggregationBuilder
tree because that makes sense.

nik9000 added 2 commits March 16, 2020 15:01
This begins to clean up how `PipelineAggregator`s and executed.
Previously, we would create the `PipelineAggregator`s on the data nodes
and embed them in the aggregation tree. When it came time to execute the
pipeline aggregation we'd use the `PipelineAggregator`s that were on the
first shard's results. This is inefficient because:
1. The data node needs to make the `PipelineAggregator` only to
   serialize it and then throw it away.
2. The coordinating node needs to deserialize all of the
   `PipelineAggregator`s even though it only needs one of them.
3. You end up with many `PipelineAggregator` instances when you only
   really *need* one per pipeline.
4. `PipelineAggregator` needs to implement serialization.

This begins to undo these by building the `PipelineAggregator`s directly
on the coordinating node and using those instead of the
`PipelineAggregator`s in the aggregtion tree. In a follow up change
we'll stop serializing the `PipelineAggregator`s to node versions that
support this behavior. And, one day, we'll be able to remove
`PipelineAggregator` from the aggregation result tree entirely.

Importantly, this doesn't change how pipeline aggregations are declared
or parsed or requested. They are still part of the `AggregationBuilder`
tree because *that* makes sense.
@nik9000 nik9000 merged commit f0beab4 into elastic:7.x Mar 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant