-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Big arrays sliced from netty buffers (double) #90745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Big arrays sliced from netty buffers (double) #90745
Conversation
Based on elastic#89668 but for doubles. This should allow aggregations down the road to read doubles values directly from netty buffer, rather than copying it from the netty buffer. Relates to elastic#89437
|
|
||
| static { | ||
| if (ByteOrder.nativeOrder() != ByteOrder.LITTLE_ENDIAN) { | ||
| throw new Error("The deserialization assumes this class is written with little-endian ints."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/ints/numbers/ I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this was a copy paste error...
| for (int i = 0; i < pages.length - 1; i++) { | ||
| out.write(pages[i]); | ||
| } | ||
| out.write(pages[pages.length - 1], 0, lastPageEnd * Double.BYTES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should share this code with the int/long/whatever other types. It's nearly the same code. 🤷 sounds like a good follow up change.
| public void testGetDoubleLE() { | ||
| // first bytes array = 1.2, second bytes array = 1.4, third bytes array = 1.6 | ||
| BytesReference[] refs = new BytesReference[] { | ||
| new BytesArray(new byte[] { 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, -0xD, 0x3F }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should make one big byte array and then randomly break it into smaller arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed: 7f9f084
… to reuse ArrayIndexOutOfBoundsException exception.
| // The jvm can optimize throwing ArrayIndexOutOfBoundsException by reusing the same exception, | ||
| // but these reused exceptions have no message or stack trace. This sometimes happens when running this test case. | ||
| // We can assert the exception message if -XX:-OmitStackTraceInFastThrow is set in gradle test task. | ||
| expectThrows(ArrayIndexOutOfBoundsException.class, () -> comp.getIntLE(5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The additional the testGetDoubleLE() test sometimes causes the jvm to reuse the same AIOOB exception. These reused exceptions have no message and no stacktrace.
This reproduces when running:
./gradlew ':server:test' --tests "org.elasticsearch.common.bytes.CompositeBytesReferenceTests" -Dtests.iters=8
And also failed in PR CI.
Running with OmitStackTraceInFastThrow disabled (is enabled by default) stops the exception reuse and test case doesn't fail without this modification:
./gradlew ':server:test' --tests "org.elasticsearch.common.bytes.CompositeBytesReferenceTests" -Dtests.iters=8 -Dtests.jvm.argline="-XX:-OmitStackTraceInFastThrow"
We either need to ensure -XX:-OmitStackTraceInFastThrow is set when running gradle test task or adjust the test, which I have done now. I don't think asserting the exception message is that important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is, nah. For what it's worth, painless does set this because it really does want to assert messages. And we set it in production because it can make debugging some issues impossible.
| -0x67, | ||
| -0x67, | ||
| -0x7, | ||
| 0x3F }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if you could do:
byte[] data = new byte[3 * Double.BYTES];
ByteUtils.writeDoubleLE(data, 1.2, 0);
ByteUtils.writeDoubleLE(data, 1.4, Double.BYTES);
ByteUtils.writeDoubleLE(data, 1.6, 2 * Double.BYTES);
Would that be more readable? I'm not really sure.
| int length = Math.min(bytesPerChunk, data.length - offset); | ||
| refs.add(new BytesArray(data, offset, length)); | ||
| } | ||
| BytesReference comp = CompositeBytesReference.of(refs.toArray(BytesReference[]::new)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| // The jvm can optimize throwing ArrayIndexOutOfBoundsException by reusing the same exception, | ||
| // but these reused exceptions have no message or stack trace. This sometimes happens when running this test case. | ||
| // We can assert the exception message if -XX:-OmitStackTraceInFastThrow is set in gradle test task. | ||
| expectThrows(ArrayIndexOutOfBoundsException.class, () -> comp.getIntLE(5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is, nah. For what it's worth, painless does set this because it really does want to assert messages. And we set it in production because it can make debugging some issues impossible.
| public double getDoubleLE(int index) { | ||
| long bits = (long) (get(index + 7) & 0xFF) << 56 | (long) (get(index + 6) & 0xFF) << 48 | (long) (get(index + 5) & 0xFF) << 40 | ||
| | (long) (get(index + 4) & 0xFF) << 32 | (long) (get(index + 3) & 0xFF) << 24 | (get(index + 2) & 0xFF) << 16 | (get(index + 1) | ||
| & 0xFF) << 8 | get(index) & 0xFF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we want to do it in this PR or in a follow up, but we'll want this same bit twiddling for the long version, and might as well make this reusable for that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I will add getLongLE method to this class and the interface and a unit test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed: 94c75f3
| -0x67, | ||
| -0x67, | ||
| -0x7, | ||
| 0x3F }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good use case for the // tag::noformat - // end::noformat syntax to disable automatic formatting for a section. See the note in CONTRIBUTING.md. Personally, I would put the bytes for each double on one line.
|
|
||
| @Override | ||
| public double get(long index) { | ||
| if (index > Integer.MAX_VALUE / 8) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but...
| if (index > Integer.MAX_VALUE / 8) { | |
| if (index > Integer.MAX_VALUE / Long.BYTES) { |
|
|
||
| @Override | ||
| public long size() { | ||
| return ref.length() / 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return ref.length() / 8; | |
| return ref.length() / Long.BYTES; |
| // We can't serialize messages longer than 2gb anyway | ||
| throw new ArrayIndexOutOfBoundsException(); | ||
| } | ||
| return ref.getDoubleLE((int) index * 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return ref.getDoubleLE((int) index * 8); | |
| return ref.getDoubleLE((int) index * Long.BYTES); |
|
|
||
| @Override | ||
| public void close() { | ||
| ref.decRef(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is obvious, but I'm not used to looking at this part of the code - where's the corresponding incRef for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll love this. It's in the ctor - in readReleasableBytesReference. The way this links into the rest of the world is that ReleasableBytesReference#streamInput returns a specialized input that incs the ref when you call readReleasableBytesReference.
|
Pinging @elastic/es-analytics-geo (Team:Analytics) |
original-brownbear
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM :) thanks Martijn! Just one nit about the endianness check ... but we can look into this whenever, just figured I'd point it out.
| */ | ||
| final class BigDoubleArray extends AbstractBigArray implements DoubleArray { | ||
|
|
||
| static { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need this here as well? Maybe we can just make a static method for this somewhere at least since we really only use it once? Or put this in a bootstrap check? Seems strange to duplicate this check doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or put this in a bootstrap check?
I like this idea. I will attempt this in a followup pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened this pr for this change: #91801
not-napoleon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for taking this!
Based on elastic#90745 but for longs. This should allow aggregations down the road to read long values directly from netty buffer, rather than copying it from the netty buffer. Relates to elastic#89437
Move little endian byte order checks to a single bootstrap check. Originated from elastic#90745
Move little endian byte order checks to a single bootstrap check. Originated from #90745
Based on elastic#90745 but for longs. This should allow aggregations down the road to read long values directly from netty buffer, rather than copying it from the netty buffer. Relates to elastic#89437
Based on #89668 but for doubles. This should allow aggregations down the road to read doubles values
directly from netty buffer, rather than copying it from the netty buffer.
Relates to #89437