Skip to content

Conversation

@jimczi
Copy link
Contributor

@jimczi jimczi commented Nov 18, 2016

The highlighter converts stored keyword fields using toString().
Since the keyword fields are stored as utf8 bytes the conversion is broken.
This change uses BytesRef.utf8toString() to convert the field value in a valid string.

Fixes #21636

The highlighter converts stored keyword fields using toString().
Since the keyword fields are stored as utf8 bytes the conversion is broken.
This change uses BytesRef.utf8toString() to convert the field value in a valid string.

Fixes #21636
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one comment. I'm also curious whether the plain highlighter is the only affected highlighter?

text = ((BytesRef) textToHighlight).utf8ToString();
} else {
text = textToHighlight.toString();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it work if we called MappedFieldType.valueForDisplay instead? I'm concerned some fields are stored as bytesrefs too but do not represent utf8 strings, like ip addresses

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's better to call valueForDisplay (I'll change the PR) though we are protected here since the highlighting can be done only on text or keyword fields.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a suggestion but it looks good to me. Feel free to merge regardless of whether you apply this suggestion.

text = mapper.fieldType().valueForDisplay(textToHighlight).toString();
} else {
text = textToHighlight.toString();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just do String text = mapper.fieldType().valueForDisplay(textToHighlight).toString(); all the time? This looks correct to me and is more future-proof? If that does not work, feel free to merge the PR as-is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type of textToHighlight can be different whether it comes from the _source or from the stored fields. Calling valueForDisplay on it when it comes from _source would not work.

@jimczi jimczi merged commit 9024744 into elastic:master Nov 21, 2016
@jimczi
Copy link
Contributor Author

jimczi commented Nov 21, 2016

The plain highlighter is the only one affected by this bug because the others are not eligible on a keyword field. The offset and term_vector options are not exposed on a keyword field.

@jimczi jimczi deleted the highlight_stored_keyword branch November 21, 2016 09:31
jimczi added a commit that referenced this pull request Nov 21, 2016
* Fix highlighting on a stored keyword field

The highlighter converts stored keyword fields using toString().
Since the keyword fields are stored as utf8 bytes the conversion is broken.
This change uses BytesRef.utf8toString() to convert the field value in a valid string.

Fixes #21636

* Replace BytesRef#utf8ToString with MappedFieldType#valueForDisplay
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Nov 22, 2016
* master: (42 commits)
  Add support for merging custom meta data in tribe node (elastic#21552)
  [DOCS] Show EC2's auto attribute (elastic#21474)
  Add information about the removal of store throttling to the migration guide.
  Add a recommendation against large documents to the docs. (elastic#21652)
  Add indices options tests to search api REST tests (elastic#21701)
  Fixing indentation in geospatial querying example. (elastic#21682)
  Fix typo in filters aggregation docs (elastic#21690)
  Add BWC layer for Exceptions (elastic#21694)
  Add checkstyle rule to forbid empty javadoc comments (elastic#20881)
  Docs: Added offline install link for discovery-file plugin
  remove pointless catch exception in TransportSearchAction (elastic#21689)
  Rename ClusterState#lookupPrototypeSafe to `lookupPrototype` and remove previous "unsafe" unused variant (elastic#21686)
  Use a buffer to do character to byte conversion in StreamOutput#writeString (elastic#21680)
  Fix integer overflows when dealing with templates. (elastic#21628)
  Fix highlighting on a stored keyword field (elastic#21645)
  Set execute permissions for native plugin programs (elastic#21657)
  adjust visibility of DiscoveryNodes.Delta constructor
  Remove unused DiscoveryNodes.Delta constructor
  Remove unused DiscoveryNode#removeDeadMembers public method
  Remove minNodeVersion and corresponding public `getSmallestVersion` getter method from DiscoveryNodes
  ...
@jimczi jimczi restored the highlight_stored_keyword branch November 22, 2016 12:42
@jimczi jimczi deleted the highlight_stored_keyword branch February 3, 2017 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants