-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Use a buffer to do character to byte conversion in StreamOutput#writeString #21680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…String Today we call `writeByte` up to 3x per character in each string written via `StreamOutput#writeString` this can have quite some overhead when strings are long or many strings are written. This change adds a local buffer to convert chars to bytes into the local buffer. Converted bytes are then written via `writeBytes` instead reducing the overhead of this opertion. Closes elastic#21660
rjernst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm.
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Should we do something similar with the default impl of StreamInput.readString()? I see it calls readByte() while it could probably read larger chunks at once to avoid doing lots of bounds/eof checks. I don't mind doing it in this PR or a different one.
| final int bufferSize = Math.min(3 * charCount, 1024); // at most 3 bytes per character is needed here | ||
| if (convertStringBuffer.length < bufferSize) { | ||
| convertStringBuffer = new byte[ArrayUtil.oversize(bufferSize, Byte.BYTES)]; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can replace the three above lines with just convertStringBuffer = ArrayUtil.grow(convertStringBuffer, bufferSize);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I had this before but there is no need to copy the array since we are trashing it that's why I used oversize?
| // make sure any possible char can fit into the buffer in any possible iteration | ||
| // we need at most 3 bytes so we flush the buffer once we have less than 3 bytes | ||
| // left before we start another iteration | ||
| if (offset >= buffer.length-3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be a strict greater than?
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe also test an explicit big string that only contains chars that are stored on 3 bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
I don't think we can unless we change the way we write these strings to the wire. We only know how many characters we need to read but don't know how of how many bytes they are composed. I think we can down the road change the way we read strings but then we have to do it on both ends. makes sense? I mean we can read the min number of characters but this might make the loop quite complicated |
|
pushed some new commits @jpountz |
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. LGTM.
|
I'm wondering if we could make a similar change for the other methods that are often used like |
in those methods the number of writeByte is constant I think unless we inline all the complexity it' won't help nor be sustainable |
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Dears, I see, you react very fast on my ticket #21660. Thanks! Best regards, |
|
@s1monw: Thanks for the info. |
…String (#21680) Today we call `writeByte` up to 3x per character in each string written via `StreamOutput#writeString` this can have quite some overhead when strings are long or many strings are written. This change adds a local buffer to convert chars to bytes into the local buffer. Converted bytes are then written via `writeBytes` instead reducing the overhead of this opertion. Closes #21660
…String (#21680) Today we call `writeByte` up to 3x per character in each string written via `StreamOutput#writeString` this can have quite some overhead when strings are long or many strings are written. This change adds a local buffer to convert chars to bytes into the local buffer. Converted bytes are then written via `writeBytes` instead reducing the overhead of this opertion. Closes #21660
* master: (42 commits) Add support for merging custom meta data in tribe node (elastic#21552) [DOCS] Show EC2's auto attribute (elastic#21474) Add information about the removal of store throttling to the migration guide. Add a recommendation against large documents to the docs. (elastic#21652) Add indices options tests to search api REST tests (elastic#21701) Fixing indentation in geospatial querying example. (elastic#21682) Fix typo in filters aggregation docs (elastic#21690) Add BWC layer for Exceptions (elastic#21694) Add checkstyle rule to forbid empty javadoc comments (elastic#20881) Docs: Added offline install link for discovery-file plugin remove pointless catch exception in TransportSearchAction (elastic#21689) Rename ClusterState#lookupPrototypeSafe to `lookupPrototype` and remove previous "unsafe" unused variant (elastic#21686) Use a buffer to do character to byte conversion in StreamOutput#writeString (elastic#21680) Fix integer overflows when dealing with templates. (elastic#21628) Fix highlighting on a stored keyword field (elastic#21645) Set execute permissions for native plugin programs (elastic#21657) adjust visibility of DiscoveryNodes.Delta constructor Remove unused DiscoveryNodes.Delta constructor Remove unused DiscoveryNode#removeDeadMembers public method Remove minNodeVersion and corresponding public `getSmallestVersion` getter method from DiscoveryNodes ...
|
@s1monw backporting looks safe, +1 to do it |
…String (#21680) Today we call `writeByte` up to 3x per character in each string written via `StreamOutput#writeString` this can have quite some overhead when strings are long or many strings are written. This change adds a local buffer to convert chars to bytes into the local buffer. Converted bytes are then written via `writeBytes` instead reducing the overhead of this opertion. Closes #21660
|
@habdank I pushed this to 2.4 a second ago - this will be in |
|
Thanks for all the help! New patched 2.4.3 library works for us much better. |
|
@habdank happy to help |
Today we call
writeByteup to 3x per character in each string written viaStreamOutput#writeStringthis can have quite some overhead when stringsare long or many strings are written. This change adds a local buffer to
convert chars to bytes into the local buffer. Converted bytes are then
written via
writeBytesinstead reducing the overhead of this operation.Closes #21660