-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Allow to sort by script value using SemVer semantics #85990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow to sort by script value using SemVer semantics #85990
Conversation
|
|
reduce the impact of the change
|
Further reduced the impact of the changes, using |
|
@elasticmachine update branch |
…ipt_sort' into poc/custom_bytesref_script_sort
This is a decision that was made a while ago, maybe moving some (or even all) of the type implementation to server could be re-considered if this would make things easier or allow for better re-use for this PR. I haven't looked closely at the code yet but if you think this would help we can certainly discuss what changes are possible here. |
|
Thank you very much for your quick feedback @cbuescher For now, the only blocker I have regarding x-pack is the following:
The solution with $ is enough for my immediate needs (ie. the SQL fix), but it seems a half-baked solution to me, and since many users are used to Please take your time to review the code and ping me if you have some time to discuss the solution Thanks Luigi |
server/src/main/java/org/elasticsearch/search/sort/ScriptSortBuilder.java
Outdated
Show resolved
Hide resolved
| public class Version implements ToXContent, BytesRefProducer, Comparable<Version> { | ||
| protected String version; | ||
|
|
||
| public Version(String version) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should have another constructor that takes the encoded version, this String version is mostly for users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean a protected Version(String version, BytesRef ref) for internal use only, so that we don't have to recalculate the BytesRef? I guess having a constructor with BytesRef would have the same problem, ie. we would have to decode it for the string value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a quick look I think we can have a ctor that only takes the BytesRef and internally decodes it to a String only once, then use one for the xContent/toString part and the other for comparison. Version only seems to be created from VersionStringDocValuesField where we can also directly get the bytes, in fact currently we do the decoding step there once in getInternal, that could happen in the ctor instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need the Version(String) constructor, which is whitelisted for painless users. Users have to have the ability to provide a Version as a default value when fetching version fields, as in this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm keeping both (see last commit)
and add test cases for script sort
|
With my last two commits, all the cases seem to be covered. I added some test cases, but I'm sure there is much more to test. @stu-elastic @cbuescher any pointers would be much appreciated, in particular about BWC, as the addition of the new I don't like very much the way I get the DocValueFormat for "version", but I didn't find a better way, without moving |
|
@elasticmachine update branch |
cbuescher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick pass on this and have to admit I don't fully understand the details of setting up and wiring a new ScriptSortType but if making "Version" comparable gets the deal done, I think that approach works well even without having to move the VersionEncoder or other stuff to the server module. If that would make things easier on your side that's certainly something we can consider, I'd have to find out who to talk to for a move like that though because it would probably involve some slight license change on the code involved.
I left a few minor comments. Maybe it would be possible to use some of the ordering tests from VersionEncoderTests to check that the Version classes compare method works as expected. Maybe just use the cases from testEncodingOrderingSemver() because I think they cover most of the interesting cases.
| }; | ||
| } | ||
|
|
||
| private Version getVersionFromInternal(int index) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed if we create the Version from a BytesRef directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it makes sense, we can do the decoding in the ctor directly. We won't need getVersionFromInternal.
Fixing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started to do this change and I realised one thing that is worth pointing out: Version(String) constructor is public API (see last Painless test in 20_scripts.yml) and it has to encode the string to get the BytesRef. So if we decide to move Version to Server module, we will have to move VersionEncoder as well.
|
|
||
| import org.apache.lucene.util.BytesRef; | ||
|
|
||
| public interface BytesRefProducer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any plans to use this interface on anything other than Version? Otherwise maybe the instance check this is used here could simple test for Version instead. Or is this here so you don't need to reference Version from within the server module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface is there only to decouple this fix from x-pack, I'd be happy to remove it if we move Verson to server module.
Just for completeness, we could use this interface (and the same fix) for at least another case, that is IP field type, but the interface would still be overkill (an if/else would be more than enough probably)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thats what I guessed. No worries, I think if moving complicates things right now the interface is fine. Maybe this could benefit from a small class comment why its there then, so the next person doesn't ask again.
- refactor Version ctor - add test cases for sorting - code cleanup
…ipt_sort' into poc/custom_bytesref_script_sort
We want to move users towards the fields API (the |
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
|
Hi @luigidellaquila, I've created a changelog YAML for you. |
|
Pinging @elastic/es-search (Team:Search) |
|
|
||
| // decouples this module from org.elasticsearch.xpack.versionfield.Version field type | ||
| // that is defined in x-pack | ||
| public interface BytesRefProducer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a line in the Javadoc about how this class is used.
|
Pinging @elastic/clients-team (Team:Clients) |
No problem for me, I can use the field API from SQL, so it's not a blocker. |
|
|
||
|
|
||
| --- | ||
| "Sort by Version script value (asc)": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fail on other versions, you'll have to move this into a new yaml test with the setup
- skip:
version: " - 8.2.99"
reason: "version script field sorting support was added in 8.3.0"
- remove support for sorting based on doc['versionfield'].value - disable version sort tests on previous ES versions - added comments
…ipt_sort' into poc/custom_bytesref_script_sort
|
@stu-elastic I removed the support for |
stu-elastic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the doc-values version back? After reviewing the removal, it doesn't clean much up and I'm sure users would appreciate it.
The changes on the scripting side look good.
| * See: https://github.com/elastic/elasticsearch/issues/82287 | ||
| */ | ||
| public class Version implements ToXContent { | ||
| public class Version implements ToXContent, BytesRefProducer, Comparable<Version> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit, unrelated to this change actually, but could you change this class to implement ToXContentFragment and remove the isFragment method? We use ToXContentFragment as a marker interface for things that don't output full json object, and in this case I think the class itself only outputs a value to the xcontent builder.
- re-enable support for doc[field].value syntax - let Version implement ToXContentFragment - add tests
|
@stu-elastic @cbuescher thank you very much for your support. @sethmlarson I see you labeled this PR for |
sethmlarson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good from a clients perspective, no comments from me. I have a script that adds Team:Clients to PRs that change API specs or YAML tests so we can be alerted of upcoming changes.
Context:
In SQL and EQL we are trying to add support for VERSION fields, see #85502
But
ORDER BY <a calculated value>does not work properly when the calculated value is of type Version, since the calculation is translated to a Painless script.We tracked down the problem to the following issue: #85989
We identified the following root problems:
sort _scriptonly acceptsstringandnumberas type; both are not a good fit for VERSION fieldsScriptSortTypeto handle generic BytesRef values (that we can obtain from VERSION, allowing us to have the right sorting) does not help, since Version BytesRef is not a valid UTF8 string, so it cannot be properly formatted in the resultDocValueFormat.RAWformatter, that cannot be configured in the querymapper-versionis an x-pack plugin, so we would probably not move the version parsing/formatting logic to ServerFixes #85989
Fixes #82287