Avoid BytesRef's copying in ScriptDocValues's Strings #29581
This commit refactors ScriptDocValues.Strings to create String objects directly instead of going through an intermediate BytesRef copy. ScriptDocValues.Binary is also changed to create a single BytesRef copy per consumed value. Relates elastic#29567
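As a rough illustration of the difference described above, the sketch below contrasts decoding through an intermediate copy with decoding straight from the shared buffer. It is not the PR's actual diff: `Slice` is a hypothetical stand-in for Lucene's `BytesRef` (a byte array plus offset and length), and the method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class DirectDecodeSketch {
    /** Hypothetical stand-in for a BytesRef: a view over a shared byte array. */
    static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Copies the slice into a fresh array first, then decodes the copy. */
    static String decodeViaCopy(Slice s) {
        byte[] copy = Arrays.copyOfRange(s.bytes, s.offset, s.offset + s.length);
        return new String(copy, StandardCharsets.UTF_8);
    }

    /** Decodes directly from the shared array, skipping the intermediate copy. */
    static String decodeDirect(Slice s) {
        return new String(s.bytes, s.offset, s.length, StandardCharsets.UTF_8);
    }
}
```

Both methods return the same string; the second simply avoids allocating and filling the temporary byte array.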
Pinging @elastic/es-search-aggs
We might want to benchmark this change. It does remove one memcpy, but it also makes the String object allocation impossible to skip with escape analysis. I don't know whether escape analysis succeeded in skipping the object allocation before, but if it did, this change would introduce a number of object allocations that is linear in the number of matches in the index. Should we also look into making the UTF-8 conversion lazy, so that scripts that only read the string value under some other condition pay for the UTF-8 conversion only when the string is actually used?
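A minimal sketch of what lazy UTF-8 conversion could look like, under stated assumptions: `LazyString` is a hypothetical holder, not an Elasticsearch class, and it assumes the underlying bytes stay valid until the value is read (with a reused doc-values buffer, that would require either a copy or decoding before the iterator advances). The `decodeCount` field exists only to make the laziness observable.

```java
import java.nio.charset.StandardCharsets;

final class LazyUtf8Sketch {
    /** Hypothetical holder that defers UTF-8 decoding until the value is read. */
    static final class LazyString {
        private final byte[] bytes;
        private final int offset;
        private final int length;
        private String decoded; // null until the first call to get()
        int decodeCount;        // demonstration only: number of decodes performed

        LazyString(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        /** Decodes on first use and caches the result for later calls. */
        String get() {
            if (decoded == null) {
                decoded = new String(bytes, offset, length, StandardCharsets.UTF_8);
                decodeCount++;
            }
            return decoded;
        }
    }
}
```

A script that never calls `get()` would pay neither the decoding cost nor the String allocation, which is the trade-off the comment above asks about.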
That sounds like the solution we have today, and maybe a good reason not to proceed with this PR?
This specific issue might be fixable differently, e.g., by reading doc values lazily, only if at least one value is requested?
I pushed a commit to copy the values lazily, there's an |
@jimczi was there any result from benchmarking yet? Is this still something we would like to get merged?
I don't have time to work on this currently, so I am closing this PR and will revisit it later.