Skip to content

Conversation

@martijnvg
Copy link
Member

Based on #89668 but for doubles. This should allow aggregations down the road to read doubles values
directly from netty buffer, rather than copying it from the netty buffer.

Relates to #89437

Based on elastic#89668 but for doubles. This should allow
aggregations down the road to read doubles values
directly from netty buffer, rather than copying
it from the netty buffer.

Relates to elastic#89437

static {
if (ByteOrder.nativeOrder() != ByteOrder.LITTLE_ENDIAN) {
throw new Error("The deserialization assumes this class is written with little-endian ints.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/ints/numbers/ I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this was a copy paste error...

for (int i = 0; i < pages.length - 1; i++) {
out.write(pages[i]);
}
out.write(pages[pages.length - 1], 0, lastPageEnd * Double.BYTES);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should share this code with the int/long/whatever other types. It's nearly the same code. 🤷 sounds like a good follow up change.

public void testGetDoubleLE() {
// first bytes array = 1.2, second bytes array = 1.4, third bytes array = 1.6
BytesReference[] refs = new BytesReference[] {
new BytesArray(new byte[] { 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, -0xD, 0x3F }),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should make one big byte array and then randomly break it into smaller arrays?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed: 7f9f084

// The jvm can optimize throwing ArrayIndexOutOfBoundsException by reusing the same exception,
// but these reused exceptions have no message or stack trace. This sometimes happens when running this test case.
// We can assert the exception message if -XX:-OmitStackTraceInFastThrow is set in gradle test task.
expectThrows(ArrayIndexOutOfBoundsException.class, () -> comp.getIntLE(5));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional the testGetDoubleLE() test sometimes causes the jvm to reuse the same AIOOB exception. These reused exceptions have no message and no stacktrace.

This reproduces when running:

./gradlew ':server:test' --tests "org.elasticsearch.common.bytes.CompositeBytesReferenceTests" -Dtests.iters=8

And also failed in PR CI.

Running with OmitStackTraceInFastThrow disabled (is enabled by default) stops the exception reuse and test case doesn't fail without this modification:

./gradlew ':server:test' --tests "org.elasticsearch.common.bytes.CompositeBytesReferenceTests" -Dtests.iters=8 -Dtests.jvm.argline="-XX:-OmitStackTraceInFastThrow"

We either need to ensure -XX:-OmitStackTraceInFastThrow is set when running gradle test task or adjust the test, which I have done now. I don't think asserting the exception message is that important?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is, nah. For what it's worth, painless does set this because it really does want to assert messages. And we set it in production because it can make debugging some issues impossible.

-0x67,
-0x67,
-0x7,
0x3F };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you could do:

byte[] data = new byte[3 * Double.BYTES];
ByteUtils.writeDoubleLE(data, 1.2, 0);
ByteUtils.writeDoubleLE(data, 1.4, Double.BYTES);
ByteUtils.writeDoubleLE(data, 1.6, 2 * Double.BYTES);

Would that be more readable? I'm not really sure.

int length = Math.min(bytesPerChunk, data.length - offset);
refs.add(new BytesArray(data, offset, length));
}
BytesReference comp = CompositeBytesReference.of(refs.toArray(BytesReference[]::new));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

// The jvm can optimize throwing ArrayIndexOutOfBoundsException by reusing the same exception,
// but these reused exceptions have no message or stack trace. This sometimes happens when running this test case.
// We can assert the exception message if -XX:-OmitStackTraceInFastThrow is set in gradle test task.
expectThrows(ArrayIndexOutOfBoundsException.class, () -> comp.getIntLE(5));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is, nah. For what it's worth, painless does set this because it really does want to assert messages. And we set it in production because it can make debugging some issues impossible.

public double getDoubleLE(int index) {
long bits = (long) (get(index + 7) & 0xFF) << 56 | (long) (get(index + 6) & 0xFF) << 48 | (long) (get(index + 5) & 0xFF) << 40
| (long) (get(index + 4) & 0xFF) << 32 | (long) (get(index + 3) & 0xFF) << 24 | (get(index + 2) & 0xFF) << 16 | (get(index + 1)
& 0xFF) << 8 | get(index) & 0xFF;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we want to do it in this PR or in a follow up, but we'll want this same bit twiddling for the long version, and might as well make this reusable for that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I will add getLongLE method to this class and the interface and a unit test for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed: 94c75f3

-0x67,
-0x67,
-0x7,
0x3F };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good use case for the // tag::noformat - // end::noformat syntax to disable automatic formatting for a section. See the note in CONTRIBUTING.md. Personally, I would put the bytes for each double on one line.


@Override
public double get(long index) {
if (index > Integer.MAX_VALUE / 8) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but...

Suggested change
if (index > Integer.MAX_VALUE / 8) {
if (index > Integer.MAX_VALUE / Long.BYTES) {


@Override
public long size() {
return ref.length() / 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return ref.length() / 8;
return ref.length() / Long.BYTES;

// We can't serialize messages longer than 2gb anyway
throw new ArrayIndexOutOfBoundsException();
}
return ref.getDoubleLE((int) index * 8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return ref.getDoubleLE((int) index * 8);
return ref.getDoubleLE((int) index * Long.BYTES);


@Override
public void close() {
ref.decRef();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is obvious, but I'm not used to looking at this part of the code - where's the corresponding incRef for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll love this. It's in the ctor - in readReleasableBytesReference. The way this links into the rest of the world is that ReleasableBytesReference#streamInput returns a specialized input that incs the ref when you call readReleasableBytesReference.

@martijnvg martijnvg marked this pull request as ready for review October 11, 2022 17:39
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 11, 2022
Copy link
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :) thanks Martijn! Just one nit about the endianness check ... but we can look into this whenever, just figured I'd point it out.

*/
final class BigDoubleArray extends AbstractBigArray implements DoubleArray {

static {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need this here as well? Maybe we can just make a static method for this somewhere at least since we really only use it once? Or put this in a bootstrap check? Seems strange to duplicate this check doesn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or put this in a bootstrap check?

I like this idea. I will attempt this in a followup pr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened this pr for this change: #91801

Copy link
Member

@not-napoleon not-napoleon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for taking this!

@martijnvg martijnvg added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Oct 14, 2022
@elasticsearchmachine elasticsearchmachine merged commit d19603d into elastic:main Oct 14, 2022
@martijnvg martijnvg deleted the netty_double_array branch October 14, 2022 15:30
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Nov 17, 2022
Based on elastic#90745 but for longs. This should allow aggregations down the road to read long values
directly from netty buffer, rather than copying it from the netty buffer.

Relates to elastic#89437
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Nov 22, 2022
Move little endian byte order checks to a single bootstrap check.

Originated from elastic#90745
martijnvg added a commit that referenced this pull request Nov 22, 2022
Move little endian byte order checks to a single bootstrap check.

Originated from #90745
elasticsearchmachine pushed a commit that referenced this pull request Nov 23, 2022
Based on #90745 but for longs. This should allow aggregations down the
road to read long values directly from netty buffer, rather than copying
it from the netty buffer.

Relates to #89437
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Jan 5, 2023
Based on elastic#90745 but for longs.
This should allow aggregations down the road to read long values
directly from netty buffer, rather than copying it from the netty buffer.

Relates to elastic#89437
elasticsearchmachine pushed a commit that referenced this pull request Jan 16, 2023
Based on #90745 but for longs. This should allow aggregations down the
road to read long values directly from netty buffer, rather than copying
it from the netty buffer.

Relates to #89437
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/Aggregations Aggregations auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.6.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants