-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add LimitedOffsetsEnum to Limited offset token #86110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add LimitedOffsetsEnum to Limited offset token #86110
Conversation
2. Add highlight tests for maxAnalyzedOffset in different OffsetSource
|
Pinging @elastic/es-search (Team:Search) |
|
is this issue available to work ?. or is there anything pending on this issue? |
|
|
||
|
|
||
| protected Analyzer wrapAnalyzer(Analyzer analyzer, Integer maxAnalyzedOffset) { | ||
| if (maxAnalyzedOffset != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to wrap the analyzer here? Shouldn't it be handled by passing the max offset to CustomUnifiedHighlighter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At beginning i want to describe how the analyzer runs with different analyzers(Default VS LimitTokenOffsetAnalyzer) works in different maxAnalyzedOffset.
i think in this test case is unnecessary, but we can add random tests for different maxAnalyzedOffset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++, random tests would be great
|
@elasticmachine test this please |
@romseygeek Thanks for the reviewing this. At 5d5e701 i removed the |
romseygeek
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @luyuncheng! I left a few small comments but I think this is close.
| private final OffsetsEnum delegate; | ||
| private final Integer maxOffset; | ||
|
|
||
| public LimitedOffsetsEnum(OffsetsEnum delegate, @Nullable Integer maxOffset) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have this as a raw int rather than an object, and just don't wrap with this Enum if the max offset is not defined. Then we can remove the null check on every nextPosition call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact you already have a null check at the call site so we can happily make this an int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In e18513f fixed it
| Locale.ROOT, | ||
| BreakIterator.getSentenceInstance(Locale.ROOT), | ||
| 0, | ||
| // OLD Strategy would Response: "Testing <b>Fun</b> Testing <b>Fun</b>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment will be confusing in a couple of years time (what was the old strategy? why was it different?). I think the current behaviour is a bug so it's fine to just test for it, no need to add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| @Override | ||
| protected Passage[] highlightOffsetsEnums(OffsetsEnum off) throws IOException { | ||
|
|
||
| OffsetsEnum wrapOff = off; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need to have a new variable here, you can just reassign off
|
@elasticmachine test this please |
|
@elasticmachine generate changelog |
docs/changelog/86110.yaml
Outdated
| @@ -0,0 +1,6 @@ | |||
| pr: 86110 | |||
| summary: Add LimitedOffsetsEnum to Limited offset token | |||
| area: Search/Highlighting | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change this to just Search? Apparently Search/Highlighting doesn't work here, 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
Fixed chagelog area
|
@elasticmachine ok to test |
|
Hi @luyuncheng can you run |
Merge Master into it
…TOKENS_PRE Merge Master and precommit into it
|
@elasticmachine update branch |
|
Thanks for your patience @luyuncheng! |
In #67325 add
max_analyzed_offsetallows users to limit the highlighting of text fields.but when using
"index_options": "offsets"in mappings, thismax_analyzed_offset offsetcan not be limited,because it would use
OffsetSource.ANALYSISandPostingsOffsetStrategyto highlight field.When a string="Testing Fun Testing Fun" need to be highlight
Using
UnifiedHighlighter.OffsetSource.ANALYSISwithqueryMaxAnalyzedOffset=10would response:because in #67325 add
LimitTokenOffsetAnalyzerBUT USING
UnifiedHighlighter.OffsetSource.POSTINGSwhich mapping isindex_options=offsetsandqueryMaxAnalyzedOffset=10would response:because offset can not be limited by
LimitTokenOffsetAnalyzermay be can add a new
LimitedOffsetsEnumto limit offset tokenLink Issue: #86109