-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Query refactoring: TermQueryBuilder refactoring and test #10669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query refactoring: TermQueryBuilder refactoring and test #10669
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why package private? Other members are private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed that in new version of this PR
cfc2069 to
8f4218e
Compare
|
Rebased this PR on top of current tip of feature branch and changed test to make use of new BaseQueryTestCase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
about these checks... we check the same right after parsing (at some point will be moved to coord node), is that correct? and we check again here (which will stay on the data node). I wonder if we should really repeat these checks, or share them between the two methods and introduce some validation method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added them here because in theory one could create query objects without parsing them. Maybe a void validate() on each query builder that also the parsers can use to check before they return a new builder from the fromXContent() method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a validation method for TermQueryBuilder and reused that in the parser. Only downside of this is where previously we had a QueryParseException we now have a ElasticsearchNullPointerException, or we need to try/catch and rethrow which IMHO defeats the purpose. Have a look and let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First thing which I keep forgetting: you are right, queries can be created without going through fromXContent, java api will do just that. There are different places where we need the same validation, I think having a validate method is a good choice.
That said, one question is where do we call it from? fromXContent after parsing (on coordinating node in the future) makes perfect sense, but that is called only when the query is provided as json. Java api allows to create the intermediate query representation directly, meaning that the fromXContent step will be skipped there. What can happen in this case is 1) that the query gets serialized to other nodes without being validated, and validated on the data nodes multiple times leading to multiple errors, which is not quite what we want (we want to catch errors once and earlier), 2) that the coordinating node is the data node where the query gets executed, query might not even get serialized, we execute toQuery straight-away.
I believe the proper way to validate the query is hooking into the existing request validation mechanism (ActionRequest#validate). Every request gets already validated on the coordinating node. The SearchRequest should simply call query#validate then I think (only problem being that we cannot quite do it yet, we need to refactor the search request first as right now the whole search request is just a big json object)?
As for exceptions, if we adopt this approach I think we should be consistent with the existing validation infra and have ActionRequestValidationExceptions.
Does this make sense to you guys?
If so, we need to decide what to do for now, I think having the validate method on the base class makes sense as Lee mentioned. For now we could call it from a single place though (toQuery is safer as it covers all the cases), once we move to parsing to the coord node we need to move it to the validation api instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this, I think this is a great ide, for now we can do the validate method and figure out where to hook it in at a later time.
Also as an aside, I'd love to get rid of Preconditions.checkNotNull and just use Objects.requireNonNull since it's part of the JVM, but that's just personal preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I add this to the QueryBuilder interface, I'll have add an empty impl to the BaseQueryBuilder for now (can be removed later once all queries have validate() method). For the details I would like to open a separate issue in which we can then have discussion about exact signature, which exceptions to be thrown, Preconditions vs. Objects.requireNonNull etc... Let me know if you agree with this plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #10777 for separate discussion on how to handle validation.
|
left a few comments |
|
I pulled out the validation of the two important fields into a method shared by builder and parser part, but not sure if this is the best way to go. Happy about comments there. Hope I adressed the rest of your comments, I would prefer tracking the use of QueryParseContext in toQuery() in a separate issue before making huge changes there. Same goes for generifying named queries (lookup but also maybe simplification) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change name to fieldName in the constructor too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
Changed variable name in constructor and added validate() to the QueryBuilder interface. As long as this is not implemented by all queries, added empty impl to BaseQueryBuilder. Also opened two separate issues to keep track of further ideas for validation (#10777) and for generifying named queries (#10776). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add the @Override annotation here?
99690c7 to
80d3571
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question I had myself here: I had to change this for the fromXContentent() test to work. Otherwise, if the value is e.g. an int, it is written to the XContent query as a number value. With the original parser.text() it is then read in as String (which doesn't seem to matter for the later toQuery(), since there BytesRefs.toBytesRef(value) is used, which calls toString() anyway) but blows up the newly introduced equality test. I changes this to objectText() because this does some checking based in the JsonToken type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it, I tend to want to revert this change since I'm not sure about it's implications. The problem then is that Object value can be of different type in original query and after fromXContent(), but in the end it only matters that toQuery() produces the same result, and there every Object is converted to lucene BytesRef. Thats why I tend to use that conversion also in the TermQueryBuilder#equals(). Will push that change to show what I mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we discussed this and we went for always keeping BytesRef within TermQueryBuilder rather than mixing up String and BytesRef depending on whether we received the query via java api or we parsed it through json. Java api will still have a setter that accepts a string, but internally that setter will convert from String to BytesRef.
We do have to fix the above parser.text() which looks like a bug that never manifested as when parsing and executing on the same (data) node everything still works. But we are currenly parsing the value in the term query short format (term: { field: value}) always as a string.
bcfb8e2 to
f8625b4
Compare
|
@javanna went through the comments and current diff again, here are the things that are open from my point of view:
Anything else open here that I'm missing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to check if these are empty too and barf in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more thing: s/Fieldname/ field name
I think in general the first line of an error should be lowercase, that's pretty much the convention we use in the existing validation code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you rebase you'll get this change that was made upstream and merged back into our branch
Also extended BaseQueryTestCase so it has helper methods for parsing the query header and extended the toQuery() test method so it passes down parse context to sublass to make assertions on side effects calling toQuery() has on the parseContext.
8d18c4f to
21afc61
Compare
|
Rebased this PR on current feature branch and changed validation exception class to own version similar to the previously suggested ActionRequestValidationException. Also opened separate issue #10974 for keeping track of adding tests for invalid json and set the TermQueryBuilder test to 20 repetitions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to update exception in javadocs
|
LGTM besides the minor comments left, if you can address them that would be great, this is good to merge then! |
5e1d936 to
11e841c
Compare
|
Thanks for the quick response, pushed this to the feature branch. |
|
Thanks for the review and for closing the issue, somehow it got opened On Wed, May 6, 2015 at 7:34 AM, Luca Cavanna [email protected]
Christoph Büscher |
Split the parse(QueryParseContext ctx) method into a parsing and a query building part, adding Streamable for serialization and hashCode(), equals() for better testing.
Add basic unit test for Builder and Parser.
PR goes agains query-refacoring feature branch.