Store the QueryBuilder's Writable representation instead of its XContent representation #25456
I think this is a better representation indeed. In terms of backward compat, I think we have several options:
@jpountz Thanks for looking! I'll add the discuss label to this PR so the change can be discussed with a wider audience.
My previous statement was incorrect. The binary format is backwards compatible with the latest minor release of the previous major release line, which differs from the backwards compatibility of the xcontent format. I've updated the PR to take this into account. Instead of adding a fallback that parses from _source, or continuing to store the xcontent next to the binary format, the PR now fails to execute the percolate query and to index percolator queries if the index that holds the percolator queries was created before the latest minor release of the previous release line. I believe this behavior is better than percolating silently becoming slower after an upgrade. If percolator queries are no longer compatible, or if upgrading to the next version would make them incompatible, then the percolator indices should be re-indexed using the reindex API. This shouldn't be a problem for indices that only contain percolator queries, because their number of documents tends to be much lower than in indices with actual data.
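The fail-fast behavior described in this comment could look roughly like the sketch below. It is only an illustration: the `Version` class, the method name, and the version numbers in the usage note are hypothetical and are not taken from the Elasticsearch code base.

```java
final class IndexVersionCheckSketch {

    /** Hypothetical, simplified version type (major.minor only). */
    static final class Version implements Comparable<Version> {
        final int major;
        final int minor;

        Version(int major, int minor) {
            this.major = major;
            this.minor = minor;
        }

        @Override
        public int compareTo(Version other) {
            return major != other.major
                ? Integer.compare(major, other.major)
                : Integer.compare(minor, other.minor);
        }

        @Override
        public String toString() {
            return major + "." + minor;
        }
    }

    /**
     * Rejects indices created before the oldest version whose binary
     * query format can still be read (an assumption of this sketch).
     */
    static void ensureBinaryFormatReadable(Version indexCreated, Version minimumCompatible) {
        if (indexCreated.compareTo(minimumCompatible) < 0) {
            throw new IllegalStateException(
                "index created on " + indexCreated + " predates " + minimumCompatible
                    + "; re-index the percolator queries");
        }
    }
}
```

With a hypothetical minimum of `new Version(5, 6)`, an index created on 6.0 passes the check while one created on 5.5 is rejected with an exception, rather than silently falling back to a slower path.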
Don't know too much about the internals but from a user perspective a performance improvement in percolators would be awesome! Just speaking from my use case: percolators are a projection from another source of truth so backwards compatibility is a non-issue for me. We would just remove the percolators index and re-create it. Thanks for looking into this!
I've updated the PR. Removed the checks that prohibit using the … To ensure this updated backwards compatibility of the query builders' binary representation, a qa test has been added that writes percolator queries with an older version and then reads them with the current version. Currently the qa test only tests against the current version, because there isn't a release yet that includes the changes in this PR. I'm going to test more query builders (currently 2 are tested), but I wanted to check what others think about the testing approach.
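As a rough illustration of the write-with-an-older-version, read-with-the-current-version idea, here is a self-contained sketch that uses plain `java.io` streams instead of Elasticsearch's stream classes. The class, its fields, and the two version constants are hypothetical; the point is only that the reader must branch on the version the bytes were written with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class VersionedQuerySketch {
    static final int V_OLD = 1;      // hypothetical older wire version
    static final int V_CURRENT = 2;  // hypothetical version that added "boost"

    final String field;
    final String value;
    final float boost;

    VersionedQuerySketch(String field, String value, float boost) {
        this.field = field;
        this.value = value;
        this.boost = boost;
    }

    /** Serializes the query in the layout understood by the given version. */
    byte[] toBytes(int version) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(field);
            out.writeUTF(value);
            if (version >= V_CURRENT) {
                out.writeFloat(boost); // field unknown to older versions
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads bytes using the version they were written with, not the current one. */
    static VersionedQuerySketch fromBytes(byte[] data, int writtenWith) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String field = in.readUTF();
            String value = in.readUTF();
            float boost = writtenWith >= V_CURRENT ? in.readFloat() : 1.0f;
            if (in.available() != 0) {
                throw new IllegalStateException("unread bytes left in the stream");
            }
            return new VersionedQuerySketch(field, value, boost);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test in this style would serialize with `V_OLD`, deserialize with `fromBytes(bytes, V_OLD)` on the current code, and check that defaults are filled in for fields the older version did not write.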
jpountz left a comment
I left some comments but it looks good overall.
not necessary anymore?
left over...
maybe we could try to reuse the inputs across calls since this class is not supposed to be used by multiple threads?
Maybe I'm not seeing it, but the input classes here don't have some kind of a reset method.
just checking: the percolator field type only supports a single value per doc? If yes I think it'd be nice to have a comment here
hmm actually why don't you pull binary doc values like we used to do before?
So this change is a side effect of making the query builder field inside the percolator field accessible via doc values fields. I needed that in order to make the query builder bwc qa test work.
When field values are fetched via doc values fields, ScriptDocValues is used to access the doc values (DocValueFieldsFetchSubPhase.java line 88). This uses a special encoding to deal with multi-valued binary values.
So I changed this in the latest commit to use BinaryDocValues directly instead of our field data APIs.
even if we don't need the values, can you leave comments about what these ints store?
I will add a comment. Two vints are read because of the format used for binary fields.
would be nice to have an assert after this line that we consumed all bytes
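To make the two-vint discussion and the "consumed all bytes" check concrete, here is a minimal, self-contained sketch. It assumes the first vint is the number of values and the second is the byte length of the single value; that layout is an assumption for illustration, and the class and method names are hypothetical rather than the PR's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class BinaryValueSketch {

    /** Writes a non-negative int using 7 bits per byte; the high bit marks continuation. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Reads a vint starting at pos[0] and advances pos[0] past it. */
    static int readVInt(byte[] buf, int[] pos) {
        int result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    /** Encodes one value as: vint(count = 1), vint(length), payload bytes. */
    static byte[] encodeSingle(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, 1);              // assumed: number of values in the field
        writeVInt(out, payload.length); // assumed: length of the value in bytes
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    /** Decodes a single value and verifies that every byte was consumed. */
    static byte[] decodeSingle(byte[] encoded) {
        int[] pos = {0};
        int count = readVInt(encoded, pos);
        if (count != 1) {
            throw new IllegalStateException("expected exactly one value, got " + count);
        }
        int length = readVInt(encoded, pos);
        if (pos[0] + length > encoded.length) {
            throw new IllegalStateException("value is truncated");
        }
        byte[] payload = Arrays.copyOfRange(encoded, pos[0], pos[0] + length);
        pos[0] += length;
        if (pos[0] != encoded.length) {
            throw new IllegalStateException("trailing bytes after the value");
        }
        return payload;
    }
}
```

The final check plays the role of the suggested assert: if the reader and writer disagree about the layout, decoding fails loudly instead of silently ignoring leftover bytes.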
qa/query-builder-bwc/build.gradle
we'll only run this qa module with versions >= 6.0 I think?
why CURRENT? it should be the index created version?
good catch!
let's also make sure that we read all the bytes?
jpountz left a comment
LGTM.
Maybe we should discuss whether we should get this change in 6.0 in order to be able to remove the bw compat logic when we move to 8.0. cc @s1monw @clintongormley
this check should not be necessary?
now I think about it: no :)
…of its XContent representation. The Writeable representation is less heavy to parse, which will benefit percolate performance and throughput. The query builder's binary format now has the same bwc guarantees as the xcontent format. Added a qa test that verifies that percolator queries written in older versions are still readable by the current version.
@jpountz I labelled this issue backport pending in order to discuss whether this should be backported to 6.0.
We are going to backport this change after 6.0.0-beta1 has been released.
This has been backported to the 6.0 branch: 77c73c2
… just the current version. Relates to #25456
The Writeable representation is less heavy to parse, which will benefit percolate performance and throughput. The performance gain of this change depends on the complexity of the stored percolator queries. A percolator query with only a simple term query will not benefit as much as a bool query with many clauses. A simple test with percolator queries having a bool query with two should clauses showed a 35% improvement in query time. Percolator queries do need to be re-indexed in order to benefit from this change.
The change looks bigger than it is. The main change is in PercolatorFieldMapper and PercolateQueryBuilder. The changes in the other files are related to making sure that NamedWriteableRegistry is available in the PercolateQueryBuilder.