Add pre-configured “lowercase” normalizer #53882
Pinging @elastic/es-search (:Search/Mapping)
jtibshirani
left a comment
This seems like a really helpful addition. Overall, I think it would be good to add a test to make sure everything is 'wired up' as expected -- one option is to add a test case to KeywordFieldMapperTests that exercises the built-in lowercase normalizer.
It's possible that users already have a normalizer named lowercase configured in the settings. (Note that in the future we plan to ban users from defining analysis components with the same names as built in ones, but we currently allow this behavior: #22263). Some suggestions to help start the discussion on how this should be handled:
- We should make sure that we at least don't error out in this case, since it could be a common set-up. Ideally, I think we'd prefer the user-defined normalizer so that there aren't any surprising changes in behavior during an upgrade.
- We can add an entry to the migration documentation encouraging users to remove their custom-defined 'lowercase' normalizer in favor of using the built-in one, or to rename it.
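The preference suggested above (a user-defined normalizer shadows a built-in one with the same name) can be sketched in plain Java. This is only an illustration under stated assumptions: `NormalizerLookupSketch`, `BUILT_IN`, and `resolve` are hypothetical names, not Elasticsearch classes, and normalizers are modelled as simple string functions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative only: these names are hypothetical, not Elasticsearch classes.
public class NormalizerLookupSketch {

    // Stand-in for normalizers that ship pre-configured.
    static final Map<String, UnaryOperator<String>> BUILT_IN =
        Map.of("lowercase", value -> value.toLowerCase(Locale.ROOT));

    // Index-level definitions shadow built-ins that share their name,
    // so an existing custom "lowercase" keeps its behaviour after an upgrade.
    static UnaryOperator<String> resolve(String name, Map<String, UnaryOperator<String>> userDefined) {
        UnaryOperator<String> custom = userDefined.get(name);
        if (custom != null) {
            return custom;
        }
        UnaryOperator<String> builtIn = BUILT_IN.get(name);
        if (builtIn == null) {
            throw new IllegalArgumentException("normalizer [" + name + "] not found");
        }
        return builtIn;
    }

    public static void main(String[] args) {
        // A custom "lowercase" that, despite its name, only trims whitespace.
        UnaryOperator<String> trimOnly = String::trim;
        System.out.println(resolve("lowercase", Map.of()).apply(" Quick FOX "));
        System.out.println(resolve("lowercase", Map.of("lowercase", trimOnly)).apply(" Quick FOX "));
    }
}
```

With no custom definition the built-in lower-cases the value. With a custom `lowercase` present, the custom function is applied and no error is raised.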
server/src/main/java/org/elasticsearch/index/analysis/LowercaseNormalizer.java
Thanks for the review, @jtibshirani!
I checked this with a pre-existing "lowercase" field that didn't actually lowercase (only ascii-folding). The behaviour was what I would have hoped for:
37a3d02 to 9f07520
jtibshirani
left a comment
> I checked this with a pre-existing "lowercase" field that didn't actually lowercase (only ascii-folding). The behaviour was what I would have hoped for
That behavior makes sense to me too. A couple last comments:
- We could add a test to verify the behavior that we always prefer a user's custom analyzer definition. This would guard against accidental changes to the upgrade behavior that we want. Perhaps `AnalysisRegistryTests` would be a good place to add a check.
- I think we should mention the change in the migration documentation. Otherwise users won't know that they can clean up the index settings and remove a custom `lowercase` analyzer.
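A check along these lines might look roughly like the sketch below, which mirrors the scenario from the thread: a pre-existing `lowercase` definition that only folds accents. It is a standalone illustration, not code for `AnalysisRegistryTests`. The class and method names are hypothetical, and `foldAccents` is only a rough stand-in for real ASCII folding.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustrative only: hypothetical names, not part of the Elasticsearch test suite.
public class CustomLowercasePrecedenceCheck {

    // Stand-in for the built-in "lowercase" normalizer.
    static final UnaryOperator<String> BUILT_IN_LOWERCASE =
        value -> value.toLowerCase(Locale.ROOT);

    // Rough approximation of ASCII folding: decompose, then drop combining marks.
    static String foldAccents(String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // A definition named "lowercase" in the index settings wins over the built-in.
    static UnaryOperator<String> lowercaseNormalizer(Map<String, UnaryOperator<String>> indexSettings) {
        return indexSettings.getOrDefault("lowercase", BUILT_IN_LOWERCASE);
    }

    public static void main(String[] args) {
        UnaryOperator<String> foldOnly = CustomLowercasePrecedenceCheck::foldAccents;
        String input = "Caf\u00e9";

        // Custom definition present: accents are folded, case is preserved.
        String withCustom = lowercaseNormalizer(Map.of("lowercase", foldOnly)).apply(input);
        if (!withCustom.equals("Cafe")) {
            throw new AssertionError("custom definition should be preferred, got " + withCustom);
        }

        // No custom definition: the built-in lower-cases and keeps the accent.
        String withBuiltIn = lowercaseNormalizer(Map.of()).apply(input);
        if (!withBuiltIn.equals("caf\u00e9")) {
            throw new AssertionError("built-in should lower-case, got " + withBuiltIn);
        }
        System.out.println("ok");
    }
}
```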
Finally, I wonder if it's worth checking with the team that we're happy with this approach. It would set a precedent for adding future built-in analyzers (or perhaps there's already a precedent I don't know about?)
Thanks for the comments, Julie.
jtibshirani
left a comment
I left one comment. Other than that it looks good to me, thanks for all the iterations.
server/src/test/java/org/elasticsearch/index/mapper/KeywordFieldMapperTests.java
de9f454 to 5387a51
A pre-configured normalizer for lower-casing. Closes #53872
Simplify the common scenario of wanting to lower-case values.
Closes #53872