
Conversation

@nik9000
Member

@nik9000 nik9000 commented May 28, 2020

This builds an `auto_date_histogram` aggregator that natively aggregates
from many buckets and uses it in the cases where `auto_date_histogram`
previously had to fall back to `asMultiBucketAggregator`, which should save
a significant amount of memory in those cases. In particular, this happens
when `auto_date_histogram` is a sub-aggregator of a multi-bucketing aggregator
like `terms` or `histogram` or `filters`. For the most part we preserve
the original implementation when `auto_date_histogram` only collects from
a single bucket.

It isn't possible to "just port the aggregator" without taking a pretty
significant performance hit because we used to rewrite all of the
buckets every time we switched to a coarser rounding configuration.
Without some major surgery to how we delay sub-aggs we'd end up
rewriting the delay list zillions of times if there are many buckets.

The multi-bucket version of the aggregator has a "budget" of "wasted"
buckets and only rewrites all of the buckets when we exceed that budget.
Now that we don't rebucket every time we increase the rounding we can no
longer get an accurate count of the number of buckets! So instead the
aggregator uses an estimate of the number of buckets to trigger switching
to a coarser rounding. This estimate is likely to be terrible when
buckets are far apart compared to the rounding. So it also uses the
difference between the first and last bucket to trigger switching to a
coarser rounding. Which covers for the shortcomings of the bucket
estimation technique pretty well. It also causes the aggregator to emit
fewer buckets in cases where they'd be reduced together on the
coordinating node. This is wonderful! But probably fairly rare.
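
For reference, the promotion check roughly combines both heuristics like the sketch below. This is illustrative only and simplified from the change (the real code also rescales the bucket-count estimate as the rounding coarsens); the array names here stand in for the `roundingInfos` getters quoted later in the review.

```java
// Illustrative sketch of the two promotion heuristics, not the aggregator's
// actual code. maxInnerInterval[i] and maxDurationMillis[i] stand in for
// roundingInfos[i].getMaximumInnerInterval() and
// roundingInfos[i].getMaximumRoughEstimateDurationMillis() from the change.
static int promoteRounding(int rounding, long estimatedBucketCount, long min, long max,
                           int targetBuckets, int[] maxInnerInterval, long[] maxDurationMillis) {
    while (rounding < maxInnerInterval.length - 1
            // Heuristic 1: the (possibly terrible) estimate of live buckets is too high
            && (estimatedBucketCount > (long) targetBuckets * maxInnerInterval[rounding]
            // Heuristic 2: the min-to-max spread is wider than this rounding is meant to cover
                || max - min > targetBuckets * maxDurationMillis[rounding])) {
        rounding++;
    }
    return rounding;
}
```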

All of that does buy us some speed improvements when the aggregator is
a child of a multi-bucket aggregator:
Without metrics or time zone: 25% faster
With metrics: 15% faster
With time zone: 22% faster

Relates to #56487

EDIT: This used to say:

This rebuilds auto_date_histogram's aggregator to function without
asMultiBucketAggregator which should save a significant amount of
memory when it is not the top most aggregator.

It isn't possible to "just port the aggregator" without taking a pretty
significant performance hit because we used to rewrite all of the
buckets every time we switched to a coarser and coarser rounding
configuration. Without some major surgery to how to delay sub-aggs
we'd end up rewriting the delay list zillions of times if there are many
buckets.

This change replaces the constant rewrites with a "budget" of "wasted"
buckets and only rewrites all of the buckets when we exceed that budget.
Now that we don't rebucket every time we increase the rounding we can no
longer get an accurate count of the number of buckets! So instead the
aggregator uses an estimate of the number of buckets to trigger switching
to a coarser rounding. This estimate is likely to be terrible when
buckets are far apart compared to the rounding. So it also uses the
difference between the first and last bucket to trigger switching to a
coarser rounding. Which covers for the shortcomings of the bucket
estimation technique pretty well. It also causes the aggregator to emit
fewer buckets in cases where they'd be reduced together on the
coordinating node. This is wonderful! But probably fairly rare.

After all that, it amounts to about the same performance, in the
benchmarks that I've run. But the memory savings is totally still a
thing!

Relates to #56487

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 28, 2020
@talevy
Contributor

talevy commented May 29, 2020

Hey Nik, do you mind sharing some of the results from the benchmarks here for added context?

Contributor

@talevy talevy left a comment

Overall looks reasonable to me. Left some comments.

* <p>
* Initially it uses the most fine grained rounding configuration possible but
* as more data arrives it uses two heuristics to shift to coarser and coarser
* rounding. The first heuristic is the number of buckets, specifically, it
Contributor

remove the ending "it"?

* <strong>that</strong> as the initial value for the bucket count that it
* increments in order to trigger another promotion to another coarser
* rounding. This works fairly well at containing the number of buckets, but
* it the estimate of the number of buckets will be wrong if the buckets are
Contributor

remove the leading "it"?

* <p>
* The second heuristic it uses to trigger promotion to a coarser rounding is
* the distance between the min and max bucket. When that distance is greater
* than what the current rounding supports it promotes. This is heuristic
Contributor

drop the "is"?

private IntArray liveBucketCountUnderestimate;
/**
* An over estimate of the number of wasted buckets. When this gets
* too high we {@link #rebucket()} which sets it to 0.
Contributor

should this be AutoDateHistogramAggregator#rebucket? Intellij no-likey

newEstimatedBucketCount = (int) Math.ceil(oldEstimatedBucketCount * ratio);
} while (newRounding < roundingInfos.length - 1 && (
newEstimatedBucketCount > targetBuckets * roundingInfos[newRounding].getMaximumInnerInterval()
|| max - min > targetBuckets * roundingInfos[newRounding].getMaximumRoughEstimateDurationMillis()));
Contributor

should there be unit tests for when:

(max - min > targetBuckets * roundingInfos[newRounding].getMaximumRoughEstimateDurationMillis()) == false
&& (newEstimatedBucketCount > targetBuckets * roundingInfos[newRounding].getMaximumInnerInterval()))

?

Member Author

We do cover this case in the test case, but there isn't a unit test for this exactly. Do you think it is worth pulling this out somehow?

@nik9000
Member Author

nik9000 commented May 29, 2020

Hey Nik, do you mind sharing some of the results from the benchmarks here for added context?

When the auto_date_histogram is at the top level I got these before and after numbers, which show a little drop in performance:

| before | 90th percentile service time |                  auto_date_histogram | 8599.79 | ms |
|  after | 90th percentile service time |                  auto_date_histogram | 9285.28 | ms |
| before | 90th percentile service time |          auto_date_histogram_with_tz |   13436 | ms |
|  after | 90th percentile service time |          auto_date_histogram_with_tz | 16364.3 | ms |
| before | 90th percentile service time |     auto_date_histogram_with_metrics | 45658.1 | ms |
|  after | 90th percentile service time |     auto_date_histogram_with_metrics | 44492.2 | ms |

Some of that seems a bit fuzzy, but I think the performance hit is real.

When the agg is below a fairly low-cardinality range aggregation the before and after numbers look like:

| before | 90th percentile service time |                range_auto_date_histo | 2717.14 | ms |
|  after | 90th percentile service time |                range_auto_date_histo | 2357.22 | ms |
| before | 90th percentile service time |   range_auto_date_histo_with_metrics | 5593.12 | ms |
|  after | 90th percentile service time |   range_auto_date_histo_with_metrics | 5372.19 | ms |
| before | 90th percentile service time | range_auto_date_histo_with_time_zone | 3082.31 | ms |
|  after | 90th percentile service time | range_auto_date_histo_with_time_zone | 3040.94 | ms |

These amount to pretty much the same performance, though I have a suspicion that if I had something with a higher cardinality above the agg you'd see more benefit. The thing is, our high cardinality aggs are almost all date-based. The terms and range and the like are usually lower cardinality. Like, it is a little faster, but not a ton.

Having written it all down and formatted it well I'm less and less comfortable with the top level performance difference. Those tests just take so long to run! They use the NYC taxis data set, which takes about five hours to get through. The non-top-level tests use the NOAA data set, which is comparatively smaller and faster to work with.

The memory savings when the agg isn't at the top is super real, but I wonder if I can claw back some of that top level performance. The time_zone case is already going to be much faster because of #56384. But the non-time_zone case is totally a thing.

Contributor

@polyfractal polyfractal left a comment

I left some questions and a few comments :)

}
bucketOrds = newBucketOrds;
}
long min = Math.min(bounds.get(owningBucketOrd * 2), newKey);
Contributor

owningBucketOrd * 2 and owningBucketOrd * 2 + 1 are used a lot here and make it a bit difficult to read... could we just set some lowerBound/upperBound (or whatever name makes sense) variables to make it easier to read?

long oldSize = bounds.size();
bounds = context.bigArrays().grow(bounds, owningBucketOrd * 2 + 2);
for (long b = oldSize; b < bounds.size(); b++) {
bounds.set(b, (b & 1L) == 0L ? Long.MAX_VALUE : Long.MIN_VALUE);
Contributor

I'm not really sure I understand what this is doing?

Member Author

Setting the min and max properly. It'd all go away if I make two arrays like you asked above. Which I'll do.
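
For readers following along, the interleaved layout being discussed keeps the running min at slot `2 * owningBucketOrd` and the running max at slot `2 * owningBucketOrd + 1`, roughly like this sketch (plain arrays instead of BigArrays, hypothetical helper name; the final code moved to two separate arrays as suggested above):

```java
// Sketch of the interleaved min/max layout under discussion, illustrative only.
// New slots start at MAX_VALUE (mins) / MIN_VALUE (maxes) so the first real key wins.
long[] bounds = new long[0];

void recordKey(long owningBucketOrd, long key) {   // hypothetical helper
    int needed = (int) (owningBucketOrd * 2 + 2);
    if (bounds.length < needed) {
        int oldSize = bounds.length;
        bounds = java.util.Arrays.copyOf(bounds, needed);
        for (int b = oldSize; b < bounds.length; b++) {
            bounds[b] = (b & 1) == 0 ? Long.MAX_VALUE : Long.MIN_VALUE;  // even = min, odd = max
        }
    }
    int minSlot = (int) (owningBucketOrd * 2);
    bounds[minSlot] = Math.min(bounds[minSlot], key);
    bounds[minSlot + 1] = Math.max(bounds[minSlot + 1], key);
}
```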

bucketOrds = newBucketOrds;
}
long min = Math.min(bounds.get(owningBucketOrd * 2), newKey);
bounds.set(owningBucketOrd * 2, min);
Contributor

I think we can move these two bounds.set() inside the if() { ... } down below, to avoid unnecessary setting when we have to fall through and increase the rounding. Not that it's terribly expensive, but if we can avoid it, always good :)

rebucket();
// Bump the threshold for the next rebucketing
wastedBucketsOverestimate = 0;
nextRebucketAt *= 2;
Contributor

What's the rationale behind doubling the rebucketing threshold? Shouldn't it stay constant since the goal of the agg is to produce a relatively constant number of buckets? As we rebucket and move to coarser intervals, the number of buckets will reduce (and wasted buckets should get cleaned up over time)?

Member Author

My idea was to always produce good buckets in the end but to allow more and more "wasted" buckets as we end up cleaning up more and more. With the default as it stands when the agg is at the top level we don't clean up any of the "wasted" buckets until right before we return the aggs.

This bumping is more to defend against having large numbers of owningBucketOrds that cause us to have to rebucket many many times. To sort of blunt the O(n^2) nature of it all. We'd still do the O(n) rebucketing many times, just fewer times as you accumulate more and more buckets.

I did think about making it a constant budget. And that works pretty well too because we naturally rebucket less and less as we promote rounding because the buckets are bigger. But if we have a ton of owningBucketOrds I'm kind of worried.
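
A minimal sketch of the budget-and-doubling idea being described (the field names `wastedBucketsOverestimate` and `nextRebucketAt` are from the diff above; the starting budget and the helper are illustrative assumptions):

```java
// Illustrative only: rebucket when the overestimate of wasted buckets exceeds
// the current budget, then double the budget so huge numbers of owningBucketOrds
// can't force the O(n) rebucket to run over and over again.
private long wastedBucketsOverestimate = 0;
private long nextRebucketAt = 1000; // hypothetical starting budget

private void maybeRebucket(long newlyWastedBuckets) {
    wastedBucketsOverestimate += newlyWastedBuckets;
    if (wastedBucketsOverestimate > nextRebucketAt) {
        rebucket();                    // collapse wasted buckets into the current rounding
        wastedBucketsOverestimate = 0; // the budget was spent
        nextRebucketAt *= 2;           // tolerate more waste before the next rebucket
    }
}

private void rebucket() {
    // merges every bucket into the current (coarser) rounding; see the PR
}
```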

long count;
long min = Long.MAX_VALUE;
long max = Long.MIN_VALUE;
do {
Contributor

Don't we have the min/max in bounds at this point? Could we use that to help skip to the correct rounding, so we don't have to potentially re-build the ords several times if max-min > the threshold?

Member Author

We do. Let me think about that. My instinct is that this check here is largely symbolic because if the min and max would have pushed us up then we'd, well, already be on a bigger rounding anyway.

@nik9000
Member Author

nik9000 commented May 29, 2020

After dropping correctRounding the performance looks much better in the multi-parent-bucket case. I'll have to check the top level case.

Without metrics or time zone: 25% faster
With metrics: 15% faster
With time zone: 22% faster

@nik9000
Member Author

nik9000 commented May 30, 2020

The performance when the agg is at the top level improved some as well after I dropped correctRounding, but it is still slower than it used to be. I'll see if I can track it down early next week.

@nik9000
Member Author

nik9000 commented Jun 4, 2020

@talevy and @polyfractal as you can see from my commit history I had to fight this one quite a bit. I managed to keep the performance in the single-bucket case, but it was hard fought. Please have a look

@nik9000
Member Author

nik9000 commented Jun 4, 2020

run elasticsearch-ci/1

@nik9000 nik9000 requested review from polyfractal and talevy June 11, 2020 17:31
@polyfractal
Contributor

👀

Contributor

@polyfractal polyfractal left a comment

Left a few comments but nothing major; I think this looks good. Big changes to existing behavior are always scary to merge, but I don't see anything obviously incorrect, so LGTM from my perspective! :)

* inner loop of {@link LeafBucketCollector#collect(int, long)}. You'd
* think that it wouldn't matter, but its seriously 7%-15% performance
* difference for the aggregation. Yikes.
*/
Contributor

Huh, TIL.

* but it does have the advantage of only touching the buckets that we
* want to collect.
*/
rebucket();
Contributor

Hmm, we might need to guard here in the situation where the agg doesn't actually generate any buckets (none of the collected docs end up having the field, no collected docs, etc), and return an empty agg instead.

Or alternatively, put the guard in rebucket().

I think BucketsAggregator#mergeBuckets() would be fine since doc counts would also be empty... but I'm less confident about MergingBucketsDeferringCollector#mergeBuckets(). Might be ok though, only skimmed it :) We've run into this kind of issue in the past though, where buildAgg broke when the agg runs and is effectively "unmapped" after running, so thought I'd mention :)
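
Something like the guard below is what's being suggested here (a sketch only; whether the emptiness check belongs at this call site or inside rebucket(), and the exact accessor on bucketOrds, are assumptions rather than the real API):

```java
// Sketch of the suggested guard: skip the final rebucket when nothing was
// collected so we never merge over empty ordinals. bucketOrds.size() is an
// assumed accessor here, not necessarily the real method name.
if (bucketOrds.size() > 0) {
    rebucket();
}
```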

* The number of times the aggregator had to {@link #rebucket()} the
* results. We keep this just to report to the profiler.
*/
private int rebucketCount = 0;
Contributor

Exposing rebucketing info (count, rebucketAt, wasted, etc) via collectDebugInfo would be super interesting from a profiling perspective. Especially if a user comes to us with performance issues, we'd have a better idea if they are hitting some kind of antagonistic scenario or something

Not super important, just a thought :)

Member Author

I already exposed it all! I made this in the first place because I wanted to see how the algorithm did with certain bits of data with the profiler.
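
For context, the profiler hook looks roughly like this sketch (rebucketCount, wastedBucketsOverestimate, and nextRebucketAt come from the diff above and collectDebugInfo is the method named in the comment; the debug key strings are illustrative, not necessarily what the PR uses):

```java
// BiConsumer is java.util.function.BiConsumer; key names below are illustrative.
@Override
public void collectDebugInfo(BiConsumer<String, Object> add) {
    super.collectDebugInfo(add);
    // Surfaced in the search profiler so antagonistic data sets are easy to spot.
    add.accept("rebucket_count", rebucketCount);
    add.accept("wasted_buckets_overestimate", wastedBucketsOverestimate);
    add.accept("next_rebucket_at", nextRebucketAt);
}
```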

Contributor

Wow I totally overlooked those. Ignore me! :)

@nik9000 nik9000 merged commit 7c7fe01 into elastic:master Jun 15, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 16, 2020
…ic#57304)

nik9000 added a commit that referenced this pull request Jun 17, 2020
… (#58190)

nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 26, 2020
I've always been confused by the strange behavior that I saw when
working on elastic#57304. Specifically, I saw switching from a bimorphic
invocation to a monomorphic invocation give us a 7%-15% performance
bump. This felt *bonkers* to me. And, it also made me wonder whether
it'd be worth looking into doing it everywhere.

It turns out that, no, it isn't needed everywhere. This benchmark shows
that a bimorphic invocation like:
```
LongKeyedBucketOrds ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

is 19% slower than a monomorphic invocation like:
```
LongKeyedBucketOrds.ForSingle ords = new LongKeyedBucketOrds.ForSingle();
ords.add(0, 0); <------ this line
```

But *only* when the reference is mutable. In the example above, if
`ords` is never changed then both perform the same. But if the `ords`
reference is assigned twice then we start to see the difference:
```
immutable bimorphic    avgt   10   6.468 ± 0.045  ns/op
immutable monomorphic  avgt   10   6.756 ± 0.026  ns/op
mutable   bimorphic    avgt   10   9.741 ± 0.073  ns/op
mutable   monomorphic  avgt   10   8.190 ± 0.016  ns/op
```

So the conclusion from all this is that we've done the right thing:
`auto_date_histogram` is the only aggregation in which `ords` isn't final
and it is the only aggregation that forces monomorphic invocations. All
other aggregations use an immutable bimorphic invocation. Which is fine.

Relates to elastic#56487
nik9000 added a commit that referenced this pull request Jul 13, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jul 13, 2020
nik9000 added a commit that referenced this pull request Jul 13, 2020

Labels

:Analytics/Aggregations Aggregations >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v7.9.0 v8.0.0-alpha1
