Use the Weight#matches mode for highlighting by default #96068
This PR adapts the unified highlighter to use the Weight#matches API by default when possible. This has been the default mode in Lucene for some time now. For cases where the matches API won't work (nested and parent-child queries), the matches mode is disabled automatically. I didn't expose an option to explicitly disable this mode because it should be seen as an internal implementation detail. With this change, matches that span multiple terms are highlighted together (something users have been requesting for years) and clauses that don't match the document are ignored.
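To make the last sentence concrete, here is a minimal toy sketch of the difference between wrapping each query term on its own and wrapping each whole match as one span. It is not Lucene's or Elasticsearch's implementation: the function names `highlight_per_term` and `highlight_per_match`, the whitespace tokenization, and the sample text are all assumptions made for illustration, and the "query" is simply a list of phrases combined with OR.

```python
def highlight_per_term(tokens, phrases):
    """Naive baseline (hypothetical): wrap every token that appears in any query phrase."""
    query_terms = {term for phrase in phrases for term in phrase}
    return " ".join(f"<em>{t}</em>" if t in query_terms else t for t in tokens)


def highlight_per_match(tokens, phrases):
    """Toy match-level mode (hypothetical): wrap each full phrase occurrence as one span.

    A phrase that never occurs contiguously in the document contributes nothing,
    which mirrors "clauses that don't match the document are ignored".
    """
    out, i = [], 0
    while i < len(tokens):
        for phrase in phrases:
            n = len(phrase)
            # Skip empty phrases so the scan always advances.
            if n and tokens[i:i + n] == list(phrase):
                out.append("<em>" + " ".join(phrase) + "</em>")
                i += n
                break
        else:
            # No phrase starts at this position: emit the token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# Sample document and an OR of two phrase clauses; only the first clause matches.
doc = "the quick brown fox jumps over the lazy dog".split()
phrases = [("quick", "brown"), ("lazy", "cat")]

per_term = highlight_per_term(doc, phrases)
per_match = highlight_per_match(doc, phrases)
print(per_term)   # the <em>quick</em> <em>brown</em> fox jumps over the <em>lazy</em> dog
print(per_match)  # the <em>quick brown</em> fox jumps over the lazy dog
```

In the toy, the per-match variant emits a single `<em>` span for the phrase and leaves "lazy" untouched because the clause it belongs to does not match the document.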
Pinging @elastic/es-search (Team:Search)
romseygeek
left a comment
This is great! Do you think we need to worry about backwards compatibility, or can we just count this as an improvement?
...arent-join/src/internalClusterTest/java/org/elasticsearch/join/query/ChildQuerySearchIT.java
server/src/main/java/org/elasticsearch/lucene/search/uhighlight/CustomUnifiedHighlighter.java
That would be my preference yep, happy to discuss (and adapt) if we think it should be considered differently.
@romseygeek I disabled the Matches mode if a runtime field is queried or if
@romseygeek, as discussed I added an undocumented index setting to disable the weight matches.
Fix bwc tests broken by elastic#96068
@jimczi I know the likelihood is low, but is there any chance we can get this ER backported to 8.4? I have a customer who runs a multi-tenant cluster and cannot upgrade to 8.10 in the near term, and they have an escalated customer situation on their end. I've set expectations that the chance of a backport is low, but since you mentioned it as a possibility in an earlier comment I just wanted to follow up.
Unfortunately not, since 8.4.x is no longer being released.
@jimczi thanks for the quick response!
…asticsearch version In elasticsearch 8.10.2 and higher versions highlight behaviour is different, more info here: elastic/elasticsearch#96068 https://liferay.atlassian.net/browse/LPD-2141
Hello @jimczi, I've identified an error that I believe is related to this improvement. The error started occurring in version 8.10.0; version 8.9.2 is the last one that works correctly. The issue happens when a search is performed using the I'll explain it with an example. (These tests were performed on version 8.17.0, but the same issue also occurs in version 8.10.0.)
Create index:
Create document 1:
Create document 2:
We perform a simple search:
Response OK:
We perform a search that matches the end and beginning of the phrases in doc 2.
Request:
Response ERROR:
The error in the log:
The error does not occur if we set
Response:
On the other hand, if we run the same query with
Response:
If we run the same test in version 8.9.2, the errors do not occur and the expected result is obtained, highlighting the end and beginning of the two sentences of the array:
Response OK
Could you please tell me if this is a bug?
On the other hand, the highlighted result differs depending on whether or not the query includes a nested type. This means that queries that should return the same highlighted field do not.
Thanks!

This PR adapts the unified highlighter to use the Weight#matches mode by default when possible. This has been the default mode in Lucene for some time now. For cases where the matches mode won't work (nested and parent-child queries), the matches mode is disabled automatically.
I didn't expose an option to explicitly disable this mode because it should be seen as an internal implementation detail. With this change, matches that span multiple terms are highlighted together (something users have been requesting for years) and clauses that don't match the document are ignored.
Note that this new mode is enabled only when require_field_match is true and the query doesn't contain:
Closes #29561