@luyuncheng (Contributor) commented Apr 24, 2022

#67325 added max_analyzed_offset so that users can limit how much of a text field is analyzed for highlighting.
However, when the mapping uses "index_options": "offsets", max_analyzed_offset has no effect,
because the highlighter then reads offsets with OffsetSource.POSTINGS and PostingsOffsetStrategy instead of re-analyzing the text.

For example, consider highlighting Fun in the string "Testing Fun Testing Fun".

With UnifiedHighlighter.OffsetSource.ANALYSIS and queryMaxAnalyzedOffset=10, the response is:

"Testing <b>Fun</b> Testing Fun"

because #67325 added LimitTokenOffsetAnalyzer, which stops the token stream at the first token whose start offset exceeds the limit.

But with UnifiedHighlighter.OffsetSource.POSTINGS (used when the mapping sets index_options=offsets) and queryMaxAnalyzedOffset=10, the response is:

"Testing <b>Fun</b> Testing <b>Fun</b>"

because offsets read from the postings never pass through LimitTokenOffsetAnalyzer, so they cannot be limited by it.

This PR adds a new LimitedOffsetsEnum that applies the same limit to the offsets read from the postings.
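The difference between the two offset sources can be illustrated with a small, dependency-free Java sketch. OffsetCursor, ArrayOffsetCursor, LimitedOffsetCursor and highlight are hypothetical names invented for this illustration; OffsetCursor only stands in for Lucene's OffsetsEnum and is not its real API. The wrapper assumes the same cut-off rule as Lucene's LimitTokenOffsetFilter (keep matches whose start offset is at most the limit, stop at the first one beyond it); the actual LimitedOffsetsEnum may differ in detail.

```java
/**
 * Hypothetical, dependency-free sketch of the LimitedOffsetsEnum idea.
 * OffsetCursor is a stand-in for Lucene's OffsetsEnum, not the real API.
 */
public class LimitedOffsetsSketch {

    /** Minimal cursor over the [start, end) offsets of one matched term. */
    interface OffsetCursor {
        boolean nextPosition();
        int startOffset();
        int endOffset();
    }

    /** Replays pre-computed offsets, as a postings list with offsets would. */
    static final class ArrayOffsetCursor implements OffsetCursor {
        private final int[][] offsets; // each entry is {start, end}
        private int index = -1;

        ArrayOffsetCursor(int[][] offsets) {
            this.offsets = offsets;
        }

        public boolean nextPosition() {
            return ++index < offsets.length;
        }

        public int startOffset() { return offsets[index][0]; }
        public int endOffset() { return offsets[index][1]; }
    }

    /** Reports no further matches once a start offset exceeds maxOffset. */
    static final class LimitedOffsetCursor implements OffsetCursor {
        private final OffsetCursor delegate;
        private final int maxOffset;

        LimitedOffsetCursor(OffsetCursor delegate, int maxOffset) {
            this.delegate = delegate;
            this.maxOffset = maxOffset;
        }

        // Callers are expected to stop at the first false, as a highlighting loop does.
        public boolean nextPosition() {
            return delegate.nextPosition() && delegate.startOffset() <= maxOffset;
        }

        public int startOffset() { return delegate.startOffset(); }
        public int endOffset() { return delegate.endOffset(); }
    }

    /** Wraps every offset range produced by the cursor in <b>...</b>. */
    static String highlight(String text, OffsetCursor cursor) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (cursor.nextPosition()) {
            out.append(text, last, cursor.startOffset())
               .append("<b>")
               .append(text, cursor.startOffset(), cursor.endOffset())
               .append("</b>");
            last = cursor.endOffset();
        }
        return out.append(text.substring(last)).toString();
    }
}
```

With the offsets of Fun ({8, 11} and {20, 23}), highlighting through the plain cursor marks both occurrences, while wrapping it in LimitedOffsetCursor with a limit of 10 marks only the first, matching the ANALYSIS output quoted in the description.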

Linked issue: #86109

2. Add highlight tests for maxAnalyzedOffset with different OffsetSource values
@elasticsearchmachine added the external-contributor and v8.3.0 labels Apr 24, 2022
@jtibshirani added the :Search Relevance/Highlighting label May 18, 2022
@elasticmachine added the Team:Search label May 18, 2022
@elasticmachine (Collaborator)

Pinging @elastic/es-search (Team:Search)

@javanna removed the v8.4.0 label Jun 1, 2022
@elasticsearchmachine changed the base branch from master to main July 22, 2022 23:07
@rahuldimri

Is this issue available to work on, or is anything still pending on it?



protected Analyzer wrapAnalyzer(Analyzer analyzer, Integer maxAnalyzedOffset) {
if (maxAnalyzedOffset != null) {
Contributor

Why do you need to wrap the analyzer here? Shouldn't it be handled by passing the max offset to CustomUnifiedHighlighter?

Contributor Author

Initially I wanted to show how highlighting behaves with different analyzers (the default analyzer vs. LimitTokenOffsetAnalyzer) for different maxAnalyzedOffset values.
I agree it is unnecessary in this test case, but we can add random tests for different maxAnalyzedOffset values.

Contributor

++, random tests would be great

@romseygeek (Contributor)

@elasticmachine test this please

@luyuncheng (Contributor Author)

> ++, random tests would be great

@romseygeek Thanks for reviewing this.

In 5d5e701 I removed wrapAnalyzer and merged the assertHighlightOneDoc methods into one function,
and I added a test with randomized OffsetSource and maxAnalyzedOffset values.

@luyuncheng requested a review from romseygeek October 18, 2022 04:35
@romseygeek (Contributor) left a comment

Thanks @luyuncheng! I left a few small comments but I think this is close.

private final OffsetsEnum delegate;
private final Integer maxOffset;

public LimitedOffsetsEnum(OffsetsEnum delegate, @Nullable Integer maxOffset) {
Contributor

I would have this as a raw int rather than an object, and just don't wrap with this Enum if the max offset is not defined. Then we can remove the null check on every nextPosition call.

Contributor

In fact you already have a null check at the call site so we can happily make this an int
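The reviewer's suggestion can be sketched in the same dependency-free style: the wrapper stores a primitive int, and the nullable Integer is checked once at the call site, which also reassigns the existing variable rather than introducing a new one. CallSiteWrappingSketch, OffsetCursor, ArrayCursor, LimitedCursor and countMatches are hypothetical names; this is an illustration of the pattern, not the merged Elasticsearch code.

```java
/**
 * Hypothetical sketch of the review suggestion: keep the limit as a primitive int
 * and decide once, at the call site, whether to wrap at all.
 * OffsetCursor stands in for Lucene's OffsetsEnum; it is not the real API.
 */
public class CallSiteWrappingSketch {

    interface OffsetCursor {
        boolean nextPosition();
        int startOffset();
    }

    /** Cursor over a fixed array of start offsets (hypothetical test helper). */
    static final class ArrayCursor implements OffsetCursor {
        private final int[] starts;
        private int index = -1;

        ArrayCursor(int... starts) {
            this.starts = starts;
        }

        public boolean nextPosition() { return ++index < starts.length; }
        public int startOffset() { return starts[index]; }
    }

    /** The limit is a primitive int, so nextPosition() needs no null check. */
    static final class LimitedCursor implements OffsetCursor {
        private final OffsetCursor delegate;
        private final int maxOffset;

        LimitedCursor(OffsetCursor delegate, int maxOffset) {
            this.delegate = delegate;
            this.maxOffset = maxOffset;
        }

        public boolean nextPosition() {
            return delegate.nextPosition() && delegate.startOffset() <= maxOffset;
        }

        public int startOffset() { return delegate.startOffset(); }
    }

    /** Counts the matches a highlighter would see for a nullable limit. */
    static int countMatches(OffsetCursor off, Integer maxAnalyzedOffset) {
        if (maxAnalyzedOffset != null) {
            // Unbox once here and reassign 'off' instead of adding a second variable.
            off = new LimitedCursor(off, maxAnalyzedOffset);
        }
        int count = 0;
        while (off.nextPosition()) {
            count++;
        }
        return count;
    }
}
```

With start offsets 8 and 20, countMatches returns 2 for a null limit and 1 for a limit of 10. When no limit is set, nothing is wrapped, and when one is set, each nextPosition call costs a single int comparison.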

Contributor Author

LGTM

Contributor Author

Fixed in e18513f.

Locale.ROOT,
BreakIterator.getSentenceInstance(Locale.ROOT),
0,
// OLD Strategy would Response: "Testing <b>Fun</b> Testing <b>Fun</b>"
Contributor

This comment will be confusing in a couple of years' time (what was the old strategy? why was it different?). I think the current behaviour is a bug, so it's fine to just test for it; there's no need to add a comment.

Contributor Author

LGTM

@Override
protected Passage[] highlightOffsetsEnums(OffsetsEnum off) throws IOException {

OffsetsEnum wrapOff = off;
Contributor

nit: no need to have a new variable here, you can just reassign off

@luyuncheng requested a review from romseygeek October 18, 2022 10:35
@romseygeek (Contributor)

@elasticmachine test this please

@romseygeek (Contributor)

@elasticmachine generate changelog

@@ -0,0 +1,6 @@
pr: 86110
summary: Add LimitedOffsetsEnum to Limited offset token
area: Search/Highlighting
Contributor

Can you change this to just Search? Apparently Search/Highlighting doesn't work here, 🤷

Contributor Author

OK

Fixed changelog area
@luyuncheng requested a review from romseygeek October 18, 2022 10:50
@romseygeek (Contributor)

@elasticmachine ok to test

@romseygeek (Contributor)

Hi @luyuncheng, can you run ./gradlew precommit at the root and then push the resulting changes?

@romseygeek (Contributor)

@elasticmachine update branch

@romseygeek merged commit f641000 into elastic:main Oct 18, 2022
@romseygeek (Contributor)

Thanks for your patience @luyuncheng!


Labels

>enhancement, external-contributor, :Search Relevance/Highlighting, Team:Search, v8.6.0


8 participants