Dear all,
We believe we have found a performance problem in the Elasticsearch code.
Elasticsearch version: 2.4.1 (the same code is also on the master branch)
Plugins installed: [Shield], which is unrelated to the problem
JVM version: 1.8.112
OS version: Windows, Linux
Description of the problem including expected versus actual behavior:
We have spent some time testing Elasticsearch performance.
We think we have found a problem in the Elasticsearch Java API (client).
We ran tests with regular logging (with timestamps) and also profiled the software with JProfiler.
The performance problem is located in the method public void writeString(String str) in elasticsearch/core/src/main/java/org/elasticsearch/common/io/stream/StreamOutput.java.
We believe the problem is that writeString calls the writeByte method (elasticsearch/core/src/main/java/org/elasticsearch/common/io/stream/BytesStreamOutput.java) inside a loop,
writeByte calls the ensureCapacity method (elasticsearch/core/src/main/java/org/elasticsearch/common/io/stream/BytesStreamOutput.java),
and ensureCapacity calls the grow method (elasticsearch/core/src/main/java/org/elasticsearch/common/util/BigArrays.java).
For ordinary uses of writeByte, the calls to ensureCapacity and grow are not a problem. However,
JProfiler shows that the call chain behind writeString is highly inefficient,
because for every character in the string, writeString goes through all of the methods above to write each single byte, every time ensuring capacity and growing the array, byte by byte.
This happens for every document sent to Elasticsearch, and documents are normally not short strings.
As a result, many objects are copied and many memory reallocations are executed inside the loop, one byte at a time.
We also reviewed the code on the master branch, and it confirms what JProfiler shows: the writeString call tree goes through these heavy methods for each and every byte.
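To make the per-byte path concrete, here is a simplified, self-contained Java model of it, for illustration only. PerByteOutput and its fields (including the ensureCapacityCalls counter) are hypothetical and only stand in for BytesStreamOutput and BigArrays. The 1-to-3-bytes-per-char encoding follows our reading of writeString, the char-count prefix that writeString writes first is left out, and the doubling policy in grow is an assumption of this sketch, not the real BigArrays behavior.

```java
import java.util.Arrays;

/** Hypothetical, simplified model of the current per-byte write path (not the real Elasticsearch classes). */
class PerByteOutput {
    private byte[] buffer = new byte[16];
    private int count = 0;
    int ensureCapacityCalls = 0; // counts how often the capacity check runs

    void writeByte(byte b) {
        ensureCapacity(count + 1); // a capacity check for every single byte
        buffer[count++] = b;
    }

    private void ensureCapacity(int required) {
        ensureCapacityCalls++;
        if (required > buffer.length) {
            grow(required);
        }
    }

    private void grow(int required) {
        // Assumption of this sketch: double the array; the real BigArrays.grow logic differs.
        buffer = Arrays.copyOf(buffer, Math.max(required, buffer.length * 2));
    }

    /** Encodes each char as 1 to 3 bytes, with one writeByte call per byte (length prefix omitted). */
    void writeString(String str) {
        for (int i = 0; i < str.length(); i++) {
            int c = str.charAt(i);
            if (c <= 0x007F) {
                writeByte((byte) c);
            } else if (c > 0x07FF) {
                writeByte((byte) (0xE0 | c >> 12 & 0x0F));
                writeByte((byte) (0x80 | c >> 6 & 0x3F));
                writeByte((byte) (0x80 | c & 0x3F));
            } else {
                writeByte((byte) (0xC0 | c >> 6 & 0x1F));
                writeByte((byte) (0x80 | c & 0x3F));
            }
        }
    }

    int size() { return count; }

    byte[] toByteArray() { return Arrays.copyOf(buffer, count); }
}
```

For the string "héllo€" (six chars, nine encoded bytes) the model makes nine writeByte calls and therefore nine capacity checks; the number of checks scales with the encoded size of every document.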
Describe the feature:
We propose changing the writeString method so that it no longer calls writeByte for each and every byte,
and instead does the following:
- Compute the required size of the string/array/buffer (once)
- Allocate memory for the required size (once)
- Perform an "unsafe" copy of the string (once); it is no longer unsafe, because the memory has already been allocated.
- Apply the special character transformations that writeString currently performs to the already copied string; this can be done in a loop.
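A minimal sketch of the proposal, again on a simplified model rather than the real Elasticsearch classes. BulkStringOutput, encodedLength and the ensureCapacityCalls counter are hypothetical names. The sketch computes the encoded size once, reserves it with a single capacity check, and then encodes directly into the reserved space. Note that it folds steps 3 and 4 (copy, then adjust characters) into one encoding pass instead of copying first and modifying afterwards; the growth policy is an assumption of the sketch.

```java
import java.util.Arrays;

/** Hypothetical sketch of the proposed write path: size once, allocate once, then fill. */
class BulkStringOutput {
    private byte[] buffer = new byte[16];
    private int count = 0;
    int ensureCapacityCalls = 0; // counts how often the capacity check runs

    private void ensureCapacity(int required) {
        ensureCapacityCalls++;
        if (required > buffer.length) {
            // Assumption of this sketch: grow by doubling or to the exact required size.
            buffer = Arrays.copyOf(buffer, Math.max(required, buffer.length * 2));
        }
    }

    /** Number of bytes the 1-to-3-bytes-per-char encoding needs for str. */
    static int encodedLength(String str) {
        int len = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            len += c <= 0x007F ? 1 : (c > 0x07FF ? 3 : 2);
        }
        return len;
    }

    void writeString(String str) {
        // Steps 1 and 2: compute the required size and reserve it once.
        ensureCapacity(count + encodedLength(str));
        // Steps 3 and 4 folded into one pass: encode straight into the reserved space.
        byte[] buf = buffer;
        int pos = count;
        for (int i = 0; i < str.length(); i++) {
            int c = str.charAt(i);
            if (c <= 0x007F) {
                buf[pos++] = (byte) c;
            } else if (c > 0x07FF) {
                buf[pos++] = (byte) (0xE0 | c >> 12 & 0x0F);
                buf[pos++] = (byte) (0x80 | c >> 6 & 0x3F);
                buf[pos++] = (byte) (0x80 | c & 0x3F);
            } else {
                buf[pos++] = (byte) (0xC0 | c >> 6 & 0x1F);
                buf[pos++] = (byte) (0x80 | c & 0x3F);
            }
        }
        count = pos;
    }

    int size() { return count; }

    byte[] toByteArray() { return Arrays.copyOf(buffer, count); }
}
```

For "héllo€" this writes nine bytes (the same as UTF-8 for these characters) with one capacity check instead of one per byte. The trade-off is an extra pass over the string to compute its length, which we expect to be cheap compared to a method call chain per byte, but that would have to be confirmed by measurement.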
What do you think of it?
We are currently running a simple test to check whether this proposal helps us,
but we do not know your codebase well enough to provide a complete,
widely tested fix.
Thanks in advance for any input on this problem.
Best regards,
Seweryn.
