Skip to content

Conversation

@s1monw
Copy link
Contributor

@s1monw s1monw commented Nov 19, 2016

Today we call writeByte up to 3x per character in each string written via
StreamOutput#writeString this can have quite some overhead when strings
are long or many strings are written. This change adds a local buffer to
convert chars to bytes into the local buffer. Converted bytes are then
written via writeBytes instead reducing the overhead of this operation.

Closes #21660

…String

Today we call `writeByte` up to 3x per character in each string written via
`StreamOutput#writeString` this can have quite some overhead when strings
are long or many strings are written. This change adds a local buffer to
convert chars to bytes into the local buffer. Converted bytes are then
written via `writeBytes` instead reducing the overhead of this opertion.

Closes elastic#21660
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Should we do something similar with the default impl of StreamInput.readString()? I see it calls readByte() while it could probably read larger chunks at once to avoid doing lots of bounds/eof checks. I don't mind doing it in this PR or a different one.

final int bufferSize = Math.min(3 * charCount, 1024); // at most 3 bytes per character is needed here
if (convertStringBuffer.length < bufferSize) {
convertStringBuffer = new byte[ArrayUtil.oversize(bufferSize, Byte.BYTES)];
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can replace the three above lines with just convertStringBuffer = ArrayUtil.grow(convertStringBuffer, bufferSize);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I had this before but there is no need to copy the array since we are trashing it that's why I used oversize?

// make sure any possible char can fit into the buffer in any possible iteration
// we need at most 3 bytes so we flush the buffer once we have less than 3 bytes
// left before we start another iteration
if (offset >= buffer.length-3) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be a strict greater than?

}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also test an explicit big string that only contains chars that are stored on 3 bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@s1monw
Copy link
Contributor Author

s1monw commented Nov 21, 2016

Should we do something similar with the default impl of StreamInput.readString()? I see it calls readByte() while it could probably read larger chunks at once to avoid doing lots of bounds/eof checks.

I don't think we can unless we change the way we write these strings to the wire. We only know how many characters we need to read but don't know how of how many bytes they are composed. I think we can down the road change the way we read strings but then we have to do it on both ends. makes sense?

I mean we can read the min number of characters but this might make the loop quite complicated

@s1monw
Copy link
Contributor Author

s1monw commented Nov 21, 2016

pushed some new commits @jpountz

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. LGTM.

@tlrx
Copy link
Member

tlrx commented Nov 21, 2016

I'm wondering if we could make a similar change for the other methods that are often used like writeInt, writeVInt etc?

@s1monw
Copy link
Contributor Author

s1monw commented Nov 21, 2016

I'm wondering if we could make a similar change for the other methods that are often used like writeInt, writeVInt etc?

in those methods the number of writeByte is constant I think unless we inline all the complexity it' won't help nor be sustainable

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@habdank
Copy link

habdank commented Nov 21, 2016

Dears,

I see, you react very fast on my ticket #21660. Thanks!
I have one small question. Is it possible to get this fix as well in the 2.4.1 release?
If not may I get the patch separately, so I can patch the code my copy of the elasticsearch?
Thanks in advance.

Best regards,
Seweryn.

@s1monw
Copy link
Contributor Author

s1monw commented Nov 21, 2016

@jpountz I think we can backport this to 2.4.x WDYT?

@habdank we are discussing the backport, yet I don't know when the next release will happen though

@habdank
Copy link

habdank commented Nov 21, 2016

@s1monw: Thanks for the info.

@s1monw s1monw merged commit d913242 into elastic:master Nov 21, 2016
@s1monw s1monw deleted the issues/21660 branch November 21, 2016 09:47
s1monw added a commit that referenced this pull request Nov 21, 2016
…String (#21680)

Today we call `writeByte` up to 3x per character in each string written via
`StreamOutput#writeString` this can have quite some overhead when strings
are long or many strings are written. This change adds a local buffer to
convert chars to bytes into the local buffer. Converted bytes are then
written via `writeBytes` instead reducing the overhead of this opertion.

Closes #21660
s1monw added a commit that referenced this pull request Nov 21, 2016
…String (#21680)

Today we call `writeByte` up to 3x per character in each string written via
`StreamOutput#writeString` this can have quite some overhead when strings
are long or many strings are written. This change adds a local buffer to
convert chars to bytes into the local buffer. Converted bytes are then
written via `writeBytes` instead reducing the overhead of this opertion.

Closes #21660
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Nov 22, 2016
* master: (42 commits)
  Add support for merging custom meta data in tribe node (elastic#21552)
  [DOCS] Show EC2's auto attribute (elastic#21474)
  Add information about the removal of store throttling to the migration guide.
  Add a recommendation against large documents to the docs. (elastic#21652)
  Add indices options tests to search api REST tests (elastic#21701)
  Fixing indentation in geospatial querying example. (elastic#21682)
  Fix typo in filters aggregation docs (elastic#21690)
  Add BWC layer for Exceptions (elastic#21694)
  Add checkstyle rule to forbid empty javadoc comments (elastic#20881)
  Docs: Added offline install link for discovery-file plugin
  remove pointless catch exception in TransportSearchAction (elastic#21689)
  Rename ClusterState#lookupPrototypeSafe to `lookupPrototype` and remove previous "unsafe" unused variant (elastic#21686)
  Use a buffer to do character to byte conversion in StreamOutput#writeString (elastic#21680)
  Fix integer overflows when dealing with templates. (elastic#21628)
  Fix highlighting on a stored keyword field (elastic#21645)
  Set execute permissions for native plugin programs (elastic#21657)
  adjust visibility of DiscoveryNodes.Delta constructor
  Remove unused DiscoveryNodes.Delta constructor
  Remove unused DiscoveryNode#removeDeadMembers public method
  Remove minNodeVersion and corresponding public `getSmallestVersion` getter method from DiscoveryNodes
  ...
@jpountz
Copy link
Contributor

jpountz commented Nov 22, 2016

@s1monw backporting looks safe, +1 to do it

@habdank
Copy link

habdank commented Nov 22, 2016

@s1monw and @jpountz

May I ask (a bit directly), when it could be done?
I really appreciate it, because I tried to port it myself, but I failed, because I do not know well elasticsearch code. So when it is done I will take it and test in our environment.

Thanks in advance and best regards.

s1monw added a commit that referenced this pull request Nov 22, 2016
…String (#21680)

Today we call `writeByte` up to 3x per character in each string written via
`StreamOutput#writeString` this can have quite some overhead when strings
are long or many strings are written. This change adds a local buffer to
convert chars to bytes into the local buffer. Converted bytes are then
written via `writeBytes` instead reducing the overhead of this opertion.

Closes #21660
@s1monw s1monw added the v2.4.3 label Nov 22, 2016
@s1monw
Copy link
Contributor Author

s1monw commented Nov 22, 2016

@habdank I pushed this to 2.4 a second ago - this will be in 2.4.3

@habdank
Copy link

habdank commented Nov 24, 2016

Thanks for all the help! New patched 2.4.3 library works for us much better.

@s1monw
Copy link
Contributor Author

s1monw commented Nov 24, 2016

@habdank happy to help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants