Fix highlighting on a stored keyword field #21645
The highlighter converts stored keyword fields using toString(). Since keyword fields are stored as UTF-8 bytes, that conversion is broken. This change uses BytesRef.utf8ToString() to convert the field value into a valid string. Fixes #21636
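The failure mode described above can be reproduced without Lucene. The sketch below is only an illustration: `Utf8Bytes` is a hypothetical stand-in for `BytesRef`, written on the assumption that `BytesRef#toString` prints a bracketed hex dump of the bytes while `utf8ToString` decodes them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for Lucene's BytesRef, reduced to what the example needs.
final class Utf8Bytes {
    final byte[] bytes;

    Utf8Bytes(String s) {
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    // Mirrors the assumed BytesRef#toString behaviour: a hex dump, not the decoded text.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(bytes[i] & 0xff));
        }
        return sb.append(']').toString();
    }

    // Decodes the bytes as UTF-8, which is what the highlighter needs.
    String utf8ToString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

final class StoredKeywordDemo {
    public static void main(String[] args) {
        Object textToHighlight = new Utf8Bytes("foo");
        // The generic conversion yields the hex dump "[66 6f 6f]".
        System.out.println(textToHighlight.toString());
        // The UTF-8 aware conversion yields "foo".
        System.out.println(((Utf8Bytes) textToHighlight).utf8ToString());
    }
}
```

Highlighting the hex dump instead of the decoded text would explain the broken output reported in the linked issue.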
jpountz
left a comment
I left one comment. I'm also curious whether the plain highlighter is the only affected highlighter?
    text = ((BytesRef) textToHighlight).utf8ToString();
} else {
    text = textToHighlight.toString();
}
Would it work if we called MappedFieldType.valueForDisplay instead? I'm concerned that some fields are also stored as BytesRefs but do not represent UTF-8 strings, such as IP addresses.
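To illustrate the concern, here is a minimal sketch using only the JDK. It assumes a simplified 4-byte IPv4 encoding, which is not necessarily how an ip field is stored; `displayIp` and `displayAsUtf8` are hypothetical helpers standing in for a type-aware conversion and a blind UTF-8 decode.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

final class NonUtf8BytesDemo {
    // Type-aware conversion: interprets the raw bytes as an IP address.
    static String displayIp(byte[] raw) {
        try {
            return InetAddress.getByAddress(raw).getHostAddress();
        } catch (UnknownHostException e) {
            // Thrown only when the array length is not 4 or 16.
            throw new IllegalArgumentException("not an IP address", e);
        }
    }

    // Blind conversion: treats the same bytes as UTF-8 text.
    static String displayAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 192, (byte) 168, 0, 1};
        System.out.println(displayIp(raw));
        // The UTF-8 decode does not recover the address.
        System.out.println(displayAsUtf8(raw).equals("192.168.0.1"));
    }
}
```

Delegating to the field type keeps the decision about how to decode the bytes with the code that knows how they were encoded.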
I agree that it's better to call valueForDisplay (I'll change the PR), though we are protected here since highlighting can only be done on text or keyword fields.
jpountz
left a comment
I left a suggestion but it looks good to me. Feel free to merge regardless of whether you apply this suggestion.
    text = mapper.fieldType().valueForDisplay(textToHighlight).toString();
} else {
    text = textToHighlight.toString();
}
Should we just do String text = mapper.fieldType().valueForDisplay(textToHighlight).toString(); all the time? This looks correct to me and is more future-proof? If that does not work, feel free to merge the PR as-is.
The type of textToHighlight differs depending on whether it comes from the _source or from the stored fields. Calling valueForDisplay on it when it comes from _source would not work.
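A rough sketch of why the branch is kept, under stated assumptions: stored values are modelled here as raw `byte[]` and _source values as already-decoded objects, and the `valueForDisplay` parameter is a hypothetical function standing in for `MappedFieldType#valueForDisplay`. The `instanceof` check is an assumption about the condition elided from the diff above.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

final class HighlightTextSketch {
    // Converts a field value to the text handed to the highlighter.
    static String toText(Object textToHighlight, Function<byte[], Object> valueForDisplay) {
        if (textToHighlight instanceof byte[]) {
            // Stored field: raw bytes need the field type's display conversion.
            return valueForDisplay.apply((byte[]) textToHighlight).toString();
        }
        // _source value: already a decoded object, so toString() is enough.
        return textToHighlight.toString();
    }

    public static void main(String[] args) {
        // Hypothetical keyword-style conversion: decode the bytes as UTF-8.
        Function<byte[], Object> keywordDisplay = b -> new String(b, StandardCharsets.UTF_8);
        System.out.println(toText("foo".getBytes(StandardCharsets.UTF_8), keywordDisplay));
        System.out.println(toText("foo", keywordDisplay));
    }
}
```

Both calls print the same text, which is the point of the branch: the conversion is applied only to the representation that needs it.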
The plain highlighter is the only one affected by this bug: the other highlighters cannot be used on a keyword field, because the offset and term_vector options are not exposed on keyword fields.
* Fix highlighting on a stored keyword field
  The highlighter converts stored keyword fields using toString(). Since keyword fields are stored as UTF-8 bytes, that conversion is broken. This change uses BytesRef.utf8ToString() to convert the field value into a valid string. Fixes #21636
* Replace BytesRef#utf8ToString with MappedFieldType#valueForDisplay