-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add support for fragment_length in the unified highlighter #23431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
nik9000
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few minors. As I said, I'm not a fan of using BreakIterator for something so narrow but this is a big improvement so it is worth getting in regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like BreakIterator for this. It is a much wider interface than the highlighter needs. But it is what we have to work with. So we shall!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be hard checks? Or have a message associated with them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd make these a hard check and throw a new IllegalArgumentException("Usage doesn't look like UnifiedHighlighter").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably worth commenting why this is needed too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noMatchSize + pos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I reading this right, this is the only change to the file. Everything else is just word wrapping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a non-random test just so it is easier to see? One with a "normal" sentence and maybe one with a long one? Just for illustration purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you have enough in CustomUnifiedHighlighterTests, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd throw an exception if it is > 0 rather than ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value is 100 so it feels weird to set an explicit fragment_length:0 to make this mode works ?
This commit introduce a new break iterator (a BoundedBreakIterator) designed for the unified highlighter that is able to limit the size of fragments produced by generic break iterator like `sentence`. The `unified` highlighter now supports `boundary_scanner` which can `words` or `sentence`. The `sentence` mode will use the bounded break iterator in order to limit the size of the sentence to `fragment_length`. When sentences bigger than `fragment_length` are produced, this mode will break the sentence at the next word boundary **after** `fragment_length` is reached.
|
Thanks @nik9000 |
* Add support for fragment_length in the unified highlighter This commit introduce a new break iterator (a BoundedBreakIterator) designed for the unified highlighter that is able to limit the size of fragments produced by generic break iterator like `sentence`. The `unified` highlighter now supports `boundary_scanner` which can `words` or `sentence`. The `sentence` mode will use the bounded break iterator in order to limit the size of the sentence to `fragment_length`. When sentences bigger than `fragment_length` are produced, this mode will break the sentence at the next word boundary **after** `fragment_length` is reached.
* master: Eclipse: move print margin to 100 columns Add support for fragment_length in the unified highlighter (elastic#23431)
* master: Eclipse: move print margin to 100 columns Add support for fragment_length in the unified highlighter (elastic#23431)
* single-node-discovery: Eclipse: move print margin to 100 columns Add support for fragment_length in the unified highlighter (elastic#23431)
This commit introduce a new break iterator (a BoundedBreakIterator) designed for the
unifiedhighlighter that is able to limit the size of fragments produced by generic break iterator likesentence.The
unifiedhighlighter now supportsboundary_scannerwith the following mode:wordsandsentence. Thesentencemode uses the new bounded break iterator in order to limit the size of the sentence tofragment_length.When sentences bigger than
fragment_lengthare produced, this mode will break the sentence at the next word boundary afterfragment_lengthis reached.