Big arrays sliced from netty buffers (double) #90745

martijnvg · 2022-10-10T07:07:35Z

Based on #89668 but for doubles. This should allow aggregations down the road to read double values
directly from the netty buffer, rather than copying them out of the netty buffer.

Relates to #89437


nik9000 · 2022-10-10T19:15:56Z

server/src/main/java/org/elasticsearch/common/util/BigDoubleArray.java


+    static {
+        if (ByteOrder.nativeOrder() != ByteOrder.LITTLE_ENDIAN) {
+            throw new Error("The deserialization assumes this class is written with little-endian ints.");


s/ints/numbers/ I guess.

yes, this was a copy paste error...

nik9000 · 2022-10-10T19:16:57Z

server/src/main/java/org/elasticsearch/common/util/BigDoubleArray.java

+        for (int i = 0; i < pages.length - 1; i++) {
+            out.write(pages[i]);
+        }
+        out.write(pages[pages.length - 1], 0, lastPageEnd * Double.BYTES);


I wonder if we should share this code with the int/long/whatever other types. It's nearly the same code. 🤷 sounds like a good follow up change.
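A minimal sketch of what such shared code could look like, assuming pages are plain byte arrays and the element width is passed in. The class and method names here (`PageWriterSketch`, `writePages`) are hypothetical and are not part of this PR:

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not from the PR: one page-writing routine for every element width.
final class PageWriterSketch {
    private PageWriterSketch() {}

    /**
     * Writes every full page, then only the used prefix of the last page.
     * Assumes pages holds at least one page. lastPageEnd is the number of
     * elements used in the last page; bytesPerElement would be Double.BYTES,
     * Long.BYTES, Integer.BYTES and so on.
     */
    static void writePages(OutputStream out, byte[][] pages, int lastPageEnd, int bytesPerElement) throws IOException {
        for (int i = 0; i < pages.length - 1; i++) {
            out.write(pages[i]);
        }
        out.write(pages[pages.length - 1], 0, lastPageEnd * bytesPerElement);
    }
}
```

Each typed array would then only need to supply its own element width.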

nik9000 · 2022-10-10T19:18:47Z

server/src/test/java/org/elasticsearch/common/bytes/CompositeBytesReferenceTests.java

+    public void testGetDoubleLE() {
+        // first bytes array = 1.2, second bytes array = 1.4, third bytes array = 1.6
+        BytesReference[] refs = new BytesReference[] {
+            new BytesArray(new byte[] { 0x33, 0x33, 0x33, 0x33, 0x33, 0x33, -0xD, 0x3F }),


Maybe we should make one big byte array and then randomly break it into smaller arrays?

pushed: 7f9f084
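The idea of splitting one big array at random points can be sketched with plain JDK types. This is only an illustration under assumptions: it uses `ByteBuffer` slices instead of `BytesArray`, and `RandomChunksSketch` is a hypothetical name, not code from the pushed commit:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration: split one backing array into randomly sized views.
final class RandomChunksSketch {
    private RandomChunksSketch() {}

    static List<ByteBuffer> randomChunks(byte[] data, Random random) {
        List<ByteBuffer> chunks = new ArrayList<>();
        int offset = 0;
        while (offset < data.length) {
            // At least one byte per chunk, so the loop always makes progress.
            int length = 1 + random.nextInt(data.length - offset);
            // slice() gives a view over exactly [offset, offset + length) without copying.
            chunks.add(ByteBuffer.wrap(data, offset, length).slice());
            offset += length;
        }
        return chunks;
    }
}
```

Because the chunk boundaries no longer line up with value boundaries, a multi-byte read is likely to straddle two chunks, which is the case the composite reference has to handle.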


martijnvg · 2022-10-11T09:12:03Z

server/src/test/java/org/elasticsearch/common/bytes/CompositeBytesReferenceTests.java

+        // The jvm can optimize throwing ArrayIndexOutOfBoundsException by reusing the same exception,
+        // but these reused exceptions have no message or stack trace. This sometimes happens when running this test case.
+        // We can assert the exception message if -XX:-OmitStackTraceInFastThrow is set in gradle test task.
+        expectThrows(ArrayIndexOutOfBoundsException.class, () -> comp.getIntLE(5));


The addition of the testGetDoubleLE() test sometimes causes the JVM to reuse the same AIOOB exception. These reused exceptions have no message and no stack trace.

This reproduces when running:

./gradlew ':server:test' --tests "org.elasticsearch.common.bytes.CompositeBytesReferenceTests" -Dtests.iters=8

And also failed in PR CI.

Running with OmitStackTraceInFastThrow disabled (it is enabled by default) stops the exception reuse, and the test case no longer fails even without this modification:

./gradlew ':server:test' --tests "org.elasticsearch.common.bytes.CompositeBytesReferenceTests" -Dtests.iters=8 -Dtests.jvm.argline="-XX:-OmitStackTraceInFastThrow"

We either need to ensure -XX:-OmitStackTraceInFastThrow is set when running the gradle test task, or adjust the test, which I have now done. I don't think asserting the exception message is that important?

I don't think it is, nah. For what it's worth, painless does set this because it really does want to assert messages. And we set it in production because it can make debugging some issues impossible.
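A minimal sketch of the adjusted assertion style, assuming a stand-in for the test framework's `expectThrows`. The `ExpectThrowsSketch` class below is hypothetical and only checks the exception type, so it keeps passing when the JVM hands back a preallocated exception with no message:

```java
// Hypothetical stand-in for a test framework's expectThrows: asserts the type only.
final class ExpectThrowsSketch {
    private ExpectThrowsSketch() {}

    static <T extends Throwable> T expectThrows(Class<T> expected, Runnable code) {
        try {
            code.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                // Deliberately no check on getMessage(): a reused exception may return null here.
                return expected.cast(t);
            }
            throw new AssertionError("unexpected exception type " + t.getClass().getName(), t);
        }
        throw new AssertionError("expected " + expected.getSimpleName() + " but nothing was thrown");
    }
}
```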

nik9000 · 2022-10-11T14:26:59Z

server/src/test/java/org/elasticsearch/common/bytes/CompositeBytesReferenceTests.java

+            -0x67,
+            -0x67,
+            -0x7,
+            0x3F };


I wonder if you could do:

byte[] data = new byte[3 * Double.BYTES];
ByteUtils.writeDoubleLE(data, 1.2, 0);
ByteUtils.writeDoubleLE(data, 1.4, Double.BYTES);
ByteUtils.writeDoubleLE(data, 1.6, 2 * Double.BYTES);

Would that be more readable? I'm not really sure.
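For comparison, the same bytes can be produced with the JDK alone. This sketch assumes nothing from Elasticsearch and uses `java.nio.ByteBuffer` rather than `ByteUtils`; `LittleEndianDoublesSketch` is a hypothetical name:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration: encode doubles as consecutive little-endian 8-byte values.
final class LittleEndianDoublesSketch {
    private LittleEndianDoublesSketch() {}

    static byte[] encode(double... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Double.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (double value : values) {
            buffer.putDouble(value);
        }
        return buffer.array();
    }
}
```

Encoding 1.2 this way yields the bytes `0x33` six times followed by `-0xD` and `0x3F`, which is the first literal array in the test under review.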

nik9000 · 2022-10-11T14:27:22Z

server/src/test/java/org/elasticsearch/common/bytes/CompositeBytesReferenceTests.java

+            int length = Math.min(bytesPerChunk, data.length - offset);
+            refs.add(new BytesArray(data, offset, length));
+        }
+        BytesReference comp = CompositeBytesReference.of(refs.toArray(BytesReference[]::new));


nik9000 · 2022-10-11T14:29:22Z

server/src/test/java/org/elasticsearch/common/bytes/CompositeBytesReferenceTests.java

+        // The jvm can optimize throwing ArrayIndexOutOfBoundsException by reusing the same exception,
+        // but these reused exceptions have no message or stack trace. This sometimes happens when running this test case.
+        // We can assert the exception message if -XX:-OmitStackTraceInFastThrow is set in gradle test task.
+        expectThrows(ArrayIndexOutOfBoundsException.class, () -> comp.getIntLE(5));


I don't think it is, nah. For what it's worth, painless does set this because it really does want to assert messages. And we set it in production because it can make debugging some issues impossible.

not-napoleon · 2022-10-11T14:03:51Z

server/src/main/java/org/elasticsearch/common/bytes/AbstractBytesReference.java

+    public double getDoubleLE(int index) {
+        long bits = (long) (get(index + 7) & 0xFF) << 56 | (long) (get(index + 6) & 0xFF) << 48 | (long) (get(index + 5) & 0xFF) << 40
+            | (long) (get(index + 4) & 0xFF) << 32 | (long) (get(index + 3) & 0xFF) << 24 | (get(index + 2) & 0xFF) << 16 | (get(index + 1)
+                & 0xFF) << 8 | get(index) & 0xFF;


I don't know if we want to do it in this PR or in a follow up, but we'll want this same bit twiddling for the long version, and might as well make this reusable for that case.

👍 I will add getLongLE method to this class and the interface and a unit test for it.

pushed: 94c75f3
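A sketch of the reuse being discussed, assuming a plain `byte[]` in place of a `BytesReference`. `LittleEndianReadSketch` is a hypothetical class and not the code pushed in the commit above; it only shows the double read delegating to the long read:

```java
// Hypothetical illustration: the double read is the long read plus a bit reinterpretation.
final class LittleEndianReadSketch {
    private LittleEndianReadSketch() {}

    static long getLongLE(byte[] bytes, int index) {
        // Every byte is masked to 0..255 and widened to long before shifting,
        // so neither sign extension nor 32-bit shift wraparound can leak in.
        return (long) (bytes[index + 7] & 0xFF) << 56
            | (long) (bytes[index + 6] & 0xFF) << 48
            | (long) (bytes[index + 5] & 0xFF) << 40
            | (long) (bytes[index + 4] & 0xFF) << 32
            | (long) (bytes[index + 3] & 0xFF) << 24
            | (long) (bytes[index + 2] & 0xFF) << 16
            | (long) (bytes[index + 1] & 0xFF) << 8
            | (long) (bytes[index] & 0xFF);
    }

    static double getDoubleLE(byte[] bytes, int index) {
        return Double.longBitsToDouble(getLongLE(bytes, index));
    }
}
```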

not-napoleon · 2022-10-11T14:28:43Z

server/src/test/java/org/elasticsearch/common/bytes/CompositeBytesReferenceTests.java

+            -0x67,
+            -0x67,
+            -0x7,
+            0x3F };


This seems like a good use case for the // tag::noformat - // end::noformat syntax to disable automatic formatting for a section. See the note in CONTRIBUTING.md. Personally, I would put the bytes for each double on one line.

not-napoleon · 2022-10-11T14:32:51Z

server/src/main/java/org/elasticsearch/common/util/ReleasableDoubleArray.java

+
+    @Override
+    public double get(long index) {
+        if (index > Integer.MAX_VALUE / 8) {


Nit, but...

Suggested change

-        if (index > Integer.MAX_VALUE / 8) {
+        if (index > Integer.MAX_VALUE / Long.BYTES) {

not-napoleon · 2022-10-11T14:33:41Z

server/src/main/java/org/elasticsearch/common/util/ReleasableDoubleArray.java

+
+    @Override
+    public long size() {
+        return ref.length() / 8;


Suggested change

-        return ref.length() / 8;
+        return ref.length() / Long.BYTES;

not-napoleon · 2022-10-11T14:33:55Z

server/src/main/java/org/elasticsearch/common/util/ReleasableDoubleArray.java

+            // We can't serialize messages longer than 2gb anyway
+            throw new ArrayIndexOutOfBoundsException();
+        }
+        return ref.getDoubleLE((int) index * 8);


Suggested change

-        return ref.getDoubleLE((int) index * 8);
+        return ref.getDoubleLE((int) index * Long.BYTES);
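The three suggestions above touch the same index arithmetic, which can be sketched in isolation. This assumes a `ByteBuffer` in place of the releasable bytes reference, and it uses `Double.BYTES` (the same value as `Long.BYTES`, namely 8) for the constant; `DoubleViewSketch` is a hypothetical name:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration: a read-only view of little-endian doubles over a byte buffer.
final class DoubleViewSketch {
    private final ByteBuffer bytes;

    DoubleViewSketch(byte[] littleEndianBytes) {
        this.bytes = ByteBuffer.wrap(littleEndianBytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    long size() {
        return bytes.capacity() / Double.BYTES;
    }

    double get(long index) {
        // Guard first: beyond this bound the byte offset no longer fits in an int.
        if (index < 0 || index > Integer.MAX_VALUE / Double.BYTES) {
            throw new ArrayIndexOutOfBoundsException();
        }
        // Safe to narrow to int now; the multiplication cannot overflow after the guard.
        return bytes.getDouble((int) index * Double.BYTES);
    }
}
```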

not-napoleon · 2022-10-11T14:39:29Z

server/src/main/java/org/elasticsearch/common/util/ReleasableDoubleArray.java

+
+    @Override
+    public void close() {
+        ref.decRef();


Maybe this is obvious, but I'm not used to looking at this part of the code - where's the corresponding incRef for this?

You'll love this. It's in the ctor - in readReleasableBytesReference. The way this links into the rest of the world is that ReleasableBytesReference#streamInput returns a specialized input that incs the ref when you call readReleasableBytesReference.
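The pairing described here, with the increment on construction and the decrement on close, can be sketched with hypothetical classes. Neither `RefCountedSketch` nor `ArrayViewSketch` is an Elasticsearch type; they only model the ownership pattern:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ref-counted resource: starts with one reference held by its creator.
final class RefCountedSketch {
    private final AtomicInteger refCount = new AtomicInteger(1);

    void incRef() {
        // Only increment while the resource is still alive.
        if (refCount.getAndUpdate(count -> count > 0 ? count + 1 : count) <= 0) {
            throw new IllegalStateException("already released");
        }
    }

    /** Returns true when this call released the last reference. */
    boolean decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released too many times");
        }
        return remaining == 0;
    }

    int refCount() {
        return refCount.get();
    }
}

// Hypothetical view: takes a reference when constructed and gives it back on close().
final class ArrayViewSketch implements AutoCloseable {
    private final RefCountedSketch ref;

    ArrayViewSketch(RefCountedSketch ref) {
        ref.incRef();
        this.ref = ref;
    }

    @Override
    public void close() {
        ref.decRef();
    }
}
```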

elasticsearchmachine · 2022-10-11T17:40:14Z

Pinging @elastic/es-analytics-geo (Team:Analytics)

original-brownbear

LGTM :) thanks Martijn! Just one nit about the endianness check ... but we can look into this whenever, just figured I'd point it out.

original-brownbear · 2022-10-12T10:46:06Z

server/src/main/java/org/elasticsearch/common/util/BigDoubleArray.java

 */
 final class BigDoubleArray extends AbstractBigArray implements DoubleArray {

+    static {


Do we actually need this here as well? Maybe we can just make a static method for this somewhere at least since we really only use it once? Or put this in a bootstrap check? Seems strange to duplicate this check, doesn't it?

Or put this in a bootstrap check?

I like this idea. I will attempt this in a followup pr.

I've opened this pr for this change: #91801
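A rough sketch of the "single static method" option, under the assumption that one shared helper replaces the per-class static blocks. `ByteOrderCheckSketch` and `ensureLittleEndian` are hypothetical names and do not describe what the follow-up PR implements:

```java
import java.nio.ByteOrder;

// Hypothetical illustration: one place that enforces the little-endian assumption.
final class ByteOrderCheckSketch {
    private ByteOrderCheckSketch() {}

    /** Fails fast when the given byte order is not little-endian. */
    static void ensureLittleEndian(ByteOrder order) {
        if (order != ByteOrder.LITTLE_ENDIAN) {
            throw new IllegalStateException("little-endian byte order is required but found " + order);
        }
    }

    /** Convenience overload that checks the platform's native order. */
    static void ensureNativeOrderIsLittleEndian() {
        ensureLittleEndian(ByteOrder.nativeOrder());
    }
}
```

Passing the byte order in as a parameter keeps the check testable on any machine.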

not-napoleon

LGTM, thank you for taking this!

Based on elastic#90745 but for longs. This should allow aggregations down the road to read long values directly from netty buffer, rather than copying it from the netty buffer. Relates to elastic#89437

Move little endian byte order checks to a single bootstrap check. Originated from elastic#90745


Big arrays sliced from netty buffers (double)

6ac0d35

Based on elastic#89668 but for doubles. This should allow aggregations down the road to read doubles values directly from netty buffer, rather than copying it from the netty buffer. Relates to elastic#89437

elasticsearchmachine added the v8.6.0 label Oct 10, 2022

martijnvg mentioned this pull request Oct 10, 2022

Enable Circuit Breaker tracking in more parts of the aggregations framework #89437

Open

34 tasks

nik9000 reviewed Oct 10, 2022


martijnvg added 2 commits October 11, 2022 00:05

iter

b5ecbe3

The addition of the testGetDoubleLE() test sometimes triggers the jvm…

9b07968

… to reuse ArrayIndexOutOfBoundsException exception.

martijnvg commented Oct 11, 2022


randomly break big byte array

7f9f084

nik9000 approved these changes Oct 11, 2022


not-napoleon reviewed Oct 11, 2022


iter

d77167e

martijnvg added >non-issue :Analytics/Aggregations Aggregations labels Oct 11, 2022

martijnvg added 2 commits October 11, 2022 19:38

added getLongLE() method

94c75f3

array formatting

8c41e92

martijnvg marked this pull request as ready for review October 11, 2022 17:39

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 11, 2022

martijnvg added 2 commits October 12, 2022 09:36

Merge remote-tracking branch 'es/main' into netty_double_array

ae57e3f

spotless

4c144f4

martijnvg requested review from not-napoleon and original-brownbear October 12, 2022 08:47

original-brownbear approved these changes Oct 12, 2022


Merge remote-tracking branch 'es/main' into netty_double_array

cb2887c

not-napoleon approved these changes Oct 14, 2022


Merge remote-tracking branch 'es/main' into netty_double_array

d54ee6b

martijnvg added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Oct 14, 2022

elasticsearchmachine merged commit d19603d into elastic:main Oct 14, 2022

martijnvg deleted the netty_double_array branch October 14, 2022 15:30

martijnvg mentioned this pull request Nov 17, 2022

Big arrays sliced from netty buffers (long) #91641

Merged

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Nov 22, 2022

Add byte order bootstrap check

8d19158

Move little endian byte order checks to a single bootstrap check. Originated from elastic#90745

martijnvg mentioned this pull request Nov 22, 2022

Add byte order bootstrap check #91801

Merged

martijnvg added a commit that referenced this pull request Nov 22, 2022

Add byte order bootstrap check (#91801)

5b62b1f

Move little endian byte order checks to a single bootstrap check. Originated from #90745

martijnvg mentioned this pull request Jan 5, 2023

Big arrays sliced from netty buffers (byte) #92706

Merged

5 participants