Big arrays sliced from netty buffers (int) #89668
Conversation
This adds `writeBigIntArray` and `readBigIntArray` to our serialization. The interesting bit here is that reading slices the reference to the underlying buffer rather than copying. That reference can be retained as long as it's needed, holding the underlying buffer open until the `IntArray` is `close`d. This should allow aggregations to send dense representations between nodes with one fewer copy operation.
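For illustration only, here is a minimal sketch of the lifecycle described above; all names are invented for the sketch and none of this is the PR's actual code. The deserialized array pins the incoming buffer instead of copying the ints out of it, and the buffer can only be recycled once the array is closed.

```java
// Sketch: a read-only int view that pins a slice of the incoming buffer.
class SlicedIntArray implements AutoCloseable {

    /** Stand-in for the retained netty/BytesReference slice. */
    interface RefCountedBuffer {
        int getIntLE(int byteOffset); // decode in place, no copy
        void release();               // decrement the buffer's ref count
    }

    private final RefCountedBuffer buffer;
    private final long size;

    SlicedIntArray(RefCountedBuffer buffer, long size) {
        this.buffer = buffer;
        this.size = size;
    }

    long size() {
        return size;
    }

    int get(long index) {
        // read straight out of the network buffer instead of a copied int[]
        return buffer.getIntLE(Math.toIntExact(index * Integer.BYTES));
    }

    @Override
    public void close() {
        buffer.release(); // only now can the transport layer reuse the bytes
    }
}
```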
nik9000 left a comment
I picked IntArray here because BytesReference has a getInt method. It looks like that method does some bit shifting. We could probably get away with avoiding the bit shifting most of the time, but that seems like a problem for another day.
Hi @nik9000, I've created a changelog YAML for you.

Pinging @elastic/es-analytics-geo (Team:Analytics)

Pinging @elastic/es-distributed (Team:Distributed)
not-napoleon left a comment
For context, this is a step in the larger aggregations memory story we're working on currently. We are migrating away from large collections of small InternalAggregation objects, which each managed a single bucket, in favor of fewer, larger objects which each manage an entire node's result set for a given aggregation. These objects will be backed by BigArrays, which will be built on the data nodes and deserialized on the coordinating nodes. Something along the lines of this PR will let us avoid copying huge blocks of data when doing that deserialization.
We need to think about size here - the arrays are often oversized but we will have a precise size when we're ready to write. If we write the oversized array it could be twice as big.
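As a sketch of that concern (plain `DataOutput` standing in for `StreamOutput`, and the method name invented here): serialization should take the number of ints actually in use, not the capacity of the possibly over-allocated backing array.

```java
import java.io.DataOutput;
import java.io.IOException;

// Write only the used prefix; backing.length may be ~2x usedSize after growth.
static void writeUsedPrefix(DataOutput out, int[] backing, int usedSize) throws IOException {
    out.writeInt(usedSize);          // the logical size, not backing.length
    for (int i = 0; i < usedSize; i++) {
        out.writeInt(backing[i]);
    }
}
```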
original-brownbear left a comment
Thanks Nik, this is a really cool idea. Added a couple of comments :)
There's no particular reason really, and I agree that this might not be optimal. We could make it increment the ref (and decrement it on close), and I bet we could use that to clean up quite a bit of code. But all the existing code is written under the assumption that the lifecycle lives with the bytes reference, and that's a bigger task to change without introducing leaks, I think.
Yeah - I don't want to change this as part of this PR, but it sure confused the hell out of me.
This seems needlessly expensive when we already have the byte representation for the implementation above. Should we maybe add a `writeTo` to `IntArray` (i.e. make it writable) instead, to be able to leverage those bytes?
Yeah!
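A minimal sketch of the "make `IntArray` writable" idea, assuming the array is backed by `byte[]` pages holding a fixed number of ints each (as in the snippet quoted further down). This is not the PR's exact code and the names are assumptions.

```java
import java.io.IOException;
import java.io.OutputStream;

// Stream the backing byte pages directly rather than re-encoding each int.
static void writePages(OutputStream out, byte[][] pages, int intSize, int intsPerPage) throws IOException {
    int fullPages = intSize / intsPerPage;
    for (int p = 0; p < fullPages; p++) {
        out.write(pages[p], 0, intsPerPage * Integer.BYTES);
    }
    int remainder = intSize % intsPerPage;
    if (remainder > 0) {
        // only a prefix of the last page is in use
        out.write(pages[fullPages], 0, remainder * Integer.BYTES);
    }
}
```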
nik9000 left a comment
So I've had a bit of a think about this and wonder if we're not better off making a new interface - one that just reads - with two implementations of that interface: one that wraps our IntArray from BigArrays and one that wraps the netty buffer. But, like, that's around the edges. It sounds like folks think this is generally a good idea. I'll try and iterate a bit more.
Yeah!
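Roughly the shape of the read-only interface being floated here; the name and methods are invented for the sketch, not the PR's actual types. One implementation would wrap the existing `IntArray` from `BigArrays`, the other would decode directly out of the retained netty buffer slice.

```java
// Sketch: a read-only view over ints, regardless of what backs it.
interface IntReader extends AutoCloseable {
    long size();
    int get(long index);

    @Override
    void close(); // releases the backing pages or the retained network buffer
}
```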
Force-pushed a97eea9 to a281c9d
@original-brownbear could you have another look at this? One thing that's really interesting is endianness. BigArrays always runs in platform-native endianness which is usually little endian. I don't think we test on any big endian platforms though.
original-brownbear left a comment
Thanks Nik, I added a few comments but this looks really nice. I commented on endianness inline. The size limitation to 2G is a non-issue at the moment since we don't allow transport messages larger than 2G anyway, IMO.
```java
}
int end = intSize % INT_PAGE_SIZE;
out.write(pages[pages.length - 1], 0, end * Integer.BYTES);
// NOCOMMIT endian-ness
```
++ to your suggestion of adding a fallback for when the system byte order isn't LE if we have to.
But you do make a good point here: we aren't testing or supporting any BE platforms right now, as far as I understand. Couldn't we just add a check at node startup that we're not on a BE platform and deal with the issue that way, saving some complexity?
I've asked around and we don't support any big endian platforms at all. I think it's enough to fail to start if we are on such a system and to comment about it.
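A minimal sketch of that startup check, assuming it would live in some bootstrap hook; the method name and message are invented here.

```java
import java.nio.ByteOrder;

// Fail fast on big-endian systems, since the serialized big arrays assume
// little-endian, platform-native byte order.
static void ensureLittleEndian() {
    if (ByteOrder.nativeOrder() != ByteOrder.LITTLE_ENDIAN) {
        throw new IllegalStateException(
            "big-endian platforms are not supported: big array serialization assumes little-endian byte order"
        );
    }
}
```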
```java
@Override
public int get(long index) {
    if (index > Integer.MAX_VALUE / 4) {
        throw new UnsupportedOperationException(); // NOCOMMIT Oh god, what do we do here?
```
Here in this particular line of code it's probably a non-issue, since we only read this from the wire and an index that large would be out of bounds anyway. Maybe just throw an out-of-bounds exception here?
```java
}

@Override
public int getIntLE(int index) {
```
We should add an implementation (using var handles as we did for other stuff in Numbers) of this to BytesArray, otherwise this might be quite slow if it's not getting inlined properly (which it probably won't be).
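Roughly what a VarHandle-based little-endian read looks like; this is a generic sketch, not the actual `BytesArray` change or the existing helpers in `Numbers`/`ByteUtils`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

class LittleEndianInts {
    private static final VarHandle INT_LE =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Reads four bytes starting at offset and assembles them as a little-endian int.
    static int getIntLE(byte[] bytes, int offset) {
        return (int) INT_LE.get(bytes, offset);
    }
}
```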
Yeah. We have stuff in ByteUtils for this sort of thing I think. I figured I'd do it in a followup and add a microbenchmark. I could do it now if you'd prefer too.
Follow-up seems fine by me, especially if it gets a benchmark, thanks! :)
I've brought this locally and fixed some things up. Is it ready to get in so we can build on it?
original-brownbear left a comment
LGTM, thanks Nik!
This fixed a bug with the `BigIntArray` serialization that I wrote in elastic#89668 where it'd skip the entire final block if we used all of it.
Based on elastic#89668 but for doubles. This should allow aggregations down the road to read double values directly from the netty buffer, rather than copying them out of it. Relates to elastic#89437
This teaches `IntArray` about our serialization. The interesting bit here is that reading slices the reference to the underlying buffer rather than copying. That reference can be retained as long as it's needed, holding the underlying buffer open until the `IntArray` is `close`d. This should allow aggregations to send dense representations between nodes with one fewer copy operation.