-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[Analysis] Support normalizer in request param #24767
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Analysis] Support normalizer in request param #24767
Conversation
cbuescher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johtani I like this PR and had fun reviewing it and learning more about this analysis feature. I left some comments but I have to appologize in advance that I'm not an expert in this area yet, however I hope the comments might be useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can start having a unit test for the AnalyzeRequest in which e.g. the validate method and the serialization can be checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test for this parsing part to RestAnalyzeActionTests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can maybe go inside the following else branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question out of curiosity: the analyzer we get here doesn't have to be closed (via closeAnalyzer) because its not a new instance? I don't know enough about the lifecycle of these objects yet I'm afraid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it already exists instance that created by IndexService or something. Only close if TransportAnalyzeAction create CustomAnalyzer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to throw an error here? As far as I see specifying a normalizer and analyzer or tokenizer doesn't make sense? This combination can already be detected earlier on the request I think (is validate()) always called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add check logic in request.validate() method.
Unfortunately, it is not always called. If you call shardOperation yourself directly, validate() method is not called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think its better than nothing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be split into the two cases request.normalizer() != null and (request.tokenFilters() != null && request.tokenFilters().size() > 0) || (request.charFilters() != null && request.charFilters().size() > 0) in two separate else if blocks instead of separating these cases later? I'm not entirely sure if this works, but I think it would make this part easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add a test for the second code path added in this PR (the case where normalizer == null but filter or char_filter is not null and tokenizer/analyzer is null)? I don't know if it is possible with this test setup but it might be useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I added that test case in rest api test. Now, we are moving to filter/char_filter to analysis-common module, so I think it would be better than in this test class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace twitter with the new index name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "is requiring a"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "can analyze",
nit: "... or if char_filter/filter is set and tokenizer/analyzer is not set"
|
@elasticmachine test this please |
a2dbf1d to
39c3eec
Compare
|
@cbuescher Passed CI, please review again after the conference :) |
cbuescher
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @johtani, LGTM.
I left a few minor comments, feel free to adapt or simply ignore them. The question I left is only for my own understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ thanks for adding these checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/Nromalizer/Normalizer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a question: I see how we use this in other bwc tests as well, I guess it represents the request. How did you get that String, do we have tools for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure... I made the string using Base64.getEncoder() and sysout...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment saying what request it represents and which version it has been generated with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe use VandomUtils#randomVersionBetween()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good to know. I don't know it :)
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please call getMultiTermComponent on factories, but otherwise it looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like you are missing the call to MultiTermAwareComponent.getMultiTermComponent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment saying what request it represents and which version it has been generated with?
Support normalizer param Support custom normalizer with char_filter/filter param Closes elastic#23347
Add AnalyzeRequestTest Fix some comments
Fix some comments Remove non-use imports elastic#23347
6b06274 to
8d72356
Compare
Fix some comments
8d72356 to
c6dd360
Compare
|
@jpountz Rebased master and moved check and call logic into parseTokenFilterFactories |
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* master: [Analysis] Support normalizer in request param (elastic#24767) Remove deprecated IdsQueryBuilder constructor (elastic#25529) Adds check for negative search request size (elastic#25397) test: also inspect the upgrade api response to check whether the upgrade really ran [DOCS] restructure java clients docs pages (elastic#25517)
Support normalizer param and custom normalizer with char_filter/filter param.
In this PR, I didn't change a response.
If user send a request with keyword field name or normalizer name, analyze api display a response with tokenizer that is KeywordTokenizer.
Should we change a response format for normalizer?
Closes #23347