Description
This meta issue tracks the effort to extend circuit breaker memory tracking beyond the collect phase of aggregations. There are several existing issues related to this, which do a good job of describing the problem (and are linked below), but we need a place to track the tasks for fixing this. That's what this issue is for.
Plan
Currently (8.4), aggregations create an object per collected bucket, called InternalAggregation. These objects are stored in the QuerySearchResult, which is responsible for serializing them from the data nodes back to the coordinator and for deserializing them on the coordinator side. Managing these objects is quite tricky, and the current design does not provide good places to inject the circuit breaker logic.
Instead, we want to move to a dense representation, which would create one object per aggregator. These objects would be Releasable, and responsible for tracking both the post-collection memory usage on the data node and the reduce-time memory usage on the coordinating node. This involves a big change to the wire format used for QuerySearchResult, and doing it in a backwards-compatible way is nontrivial.
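As a rough illustration of the idea, here is a minimal sketch of one releasable object per aggregator that accounts for its own memory. Everything in it is hypothetical: `SketchBreaker` stands in for a real circuit breaker, `DenseMaxResult` is an invented name, and `AutoCloseable` stands in for `Releasable`. It is not the proposed implementation, only a picture of the lifecycle.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a circuit breaker: tracks reserved bytes against a fixed limit. */
final class SketchBreaker {
    private final long limitBytes;
    private long usedBytes;

    SketchBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves bytes, or throws if the reservation would exceed the limit. */
    void reserve(long bytes) {
        if (usedBytes + bytes > limitBytes) {
            throw new IllegalStateException(
                "would use " + (usedBytes + bytes) + " bytes, limit is " + limitBytes);
        }
        usedBytes += bytes;
    }

    void release(long bytes) {
        usedBytes -= bytes;
    }

    long usedBytes() {
        return usedBytes;
    }
}

/** Hypothetical dense result for a max aggregation: one object per aggregator, one slot per bucket ordinal. */
final class DenseMaxResult implements AutoCloseable {
    private final SketchBreaker breaker;
    private double[] maxByOrd;

    DenseMaxResult(SketchBreaker breaker, int bucketCount) {
        this.breaker = breaker;
        // Account for the memory before allocating it, so an oversized request fails early.
        breaker.reserve((long) bucketCount * Double.BYTES);
        this.maxByOrd = new double[bucketCount];
        Arrays.fill(maxByOrd, Double.NEGATIVE_INFINITY);
    }

    void collect(int ord, double value) {
        ensureOpen();
        maxByOrd[ord] = Math.max(maxByOrd[ord], value);
    }

    double max(int ord) {
        ensureOpen();
        return maxByOrd[ord];
    }

    private void ensureOpen() {
        if (maxByOrd == null) {
            throw new IllegalStateException("result accessed after release");
        }
    }

    /** Returns the reserved bytes to the breaker; safe to call more than once. */
    @Override
    public void close() {
        if (maxByOrd != null) {
            breaker.release((long) maxByOrd.length * Double.BYTES);
            maxByOrd = null;
        }
    }
}
```

Because the result owns its reservation, the same object can carry the accounting from the end of collection through the reduce, and a read after `close()` fails loudly instead of returning stale data.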
Tasks
- Improve testing for post collection memory access
- In particular, we need to be able to validate that we don't access something that was "released" at the end of the collect phase.
- Modernize cardinality agg tests #90114
- Refactor aggregator test case #90149
- Refactor aggs tests part 3 #90530
- Release aggregator context in tests #90540
- Use the AggTestConfig object in testCase #90699
- Release agg context in tests part 2 #90775
- Release agg context in tests part 3 #91598
- Netty Buffer Backed `BigArrays`
  - For `BigIntArray` (Big arrays sliced from nettey buffers (int) #89668)
  - For `BigLongArray` (Big arrays sliced from netty buffers (long) #91641)
  - For `BigByteArray` (Big arrays sliced from netty buffers (byte) #92706)
  - For `BigObjectArray`
  - For `BigDoubleArray` (Big arrays sliced from netty buffers (double) #90745)
  - For `BigFloatArray`
- Add in support for a new aggregations response format into `QuerySearchResult`. We can lay the ground work for this without defining too much about the format itself
  - Add support for releasing aggregation big arrays in `QuerySearchResult` (Expand the lifecycle of the AggregationContext #94023)
  - Write side mixed version support
- Data Node Local Reduce in Big Arrays
  - New reduction logic (prototype: Dense aggs reduce prototype #95346)
    - Since we don't have the object hierarchy of the `InternalAggregations` to work with, we need a pure ordinals based reduction path.
    - Sorted Key Iterator for `LongKeyedBucketOrds` (Long key bucket ords key iterator #95809)
    - Sorted Key Iterator for `BytesKeyedBucketOrds`
    - Migrate those iterators to be backed by BigArrays
    - Use Primitive Iterators
  - Wire up the reduce side circuit breakers, make sure new reduce time big arrays are correctly released
  - `AggregatorTestCase` coverage of the new reduce logic, somehow
  - Reduce phase circuit breaker tests, verify that `MockBigArrays` in `AggregatorTestCase` is correctly catching leaks in the new reduce path
  - Reduce with Cranky Circuit Breaker tests
- Dense Wire Format
  - `LongKeyedBucketOrds` and friends need to become `Writable`, same as we did with `BigArrays` (and probably leaning on the buffer backed big arrays)
  - Reduce side version detection (i.e. downgrade some results if we got mixed old and new format)
  - CCS version detection, re-writing as appropriate
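
To make the "pure ordinals based reduction path" under Data Node Local Reduce a bit more concrete, here is a minimal sketch of reducing two per-shard results by walking their keys in sorted order, which is what the sorted key iterators would enable. The names `ShardBuckets` and `OrdinalReduceSketch` are hypothetical, and plain `long[]` arrays stand in for big arrays and for the key iterators. The real reduce would work against `LongKeyedBucketOrds` and is not shown here.

```java
import java.util.Arrays;

/** Hypothetical per-shard dense result: keys in ascending order, doc counts at the same index. */
final class ShardBuckets {
    final long[] sortedKeys;
    final long[] docCounts;

    ShardBuckets(long[] sortedKeys, long[] docCounts) {
        if (sortedKeys.length != docCounts.length) {
            throw new IllegalArgumentException("keys and counts must have the same length");
        }
        this.sortedKeys = sortedKeys;
        this.docCounts = docCounts;
    }
}

final class OrdinalReduceSketch {
    /** Merges two shards in a single pass over both sorted key sequences, summing counts for equal keys. */
    static ShardBuckets reduce(ShardBuckets a, ShardBuckets b) {
        long[] keys = new long[a.sortedKeys.length + b.sortedKeys.length];
        long[] counts = new long[keys.length];
        int i = 0, j = 0, n = 0;
        while (i < a.sortedKeys.length || j < b.sortedKeys.length) {
            boolean aDone = i == a.sortedKeys.length;
            boolean bDone = j == b.sortedKeys.length;
            if (bDone || (!aDone && a.sortedKeys[i] < b.sortedKeys[j])) {
                // Key only present (so far) in shard a.
                keys[n] = a.sortedKeys[i];
                counts[n] = a.docCounts[i];
                i++;
            } else if (aDone || b.sortedKeys[j] < a.sortedKeys[i]) {
                // Key only present (so far) in shard b.
                keys[n] = b.sortedKeys[j];
                counts[n] = b.docCounts[j];
                j++;
            } else {
                // Same key on both shards: combine the bucket state.
                keys[n] = a.sortedKeys[i];
                counts[n] = a.docCounts[i] + b.docCounts[j];
                i++;
                j++;
            }
            n++;
        }
        // Trim to the number of distinct keys actually produced.
        return new ShardBuckets(Arrays.copyOf(keys, n), Arrays.copyOf(counts, n));
    }
}
```

No per-bucket objects are created during the merge, so the only allocations are the output arrays, which is the property that would let the reduce be tracked by a circuit breaker.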
Vague Tasks
- Prototype a dense representation that addresses the memory concerns. Initially this can be done with the non-recycling big arrays instance, but long term needs to be wired up to a real circuit breaker.
  - Works for `Max`
  - Works for `Range`
  - Works for `Cardinality`
  - Works for `Terms`
- Make sure the prototype dense representation also solves the normalization concerns within terms
- Validate prototype (many reviews)
- BWC, Mixed Mode, CCS testing, testing, testing
- Convert the rest of the aggs