@romseygeek
Contributor

The Lucene Ukrainian analyzer has a bug: it loads a large in-memory
dictionary and stores it on a thread local for every token stream
created in a new thread (for more details see
https://issues.apache.org/jira/browse/LUCENE-9930). Because of the checks
added in #50908, we create a token stream for every registered
analyzer in every shard, which means that any node with the Ukrainian
plugin installed will leak one copy of this dictionary for every shard,
whether or not the Ukrainian analyzer is actually being used.

This commit makes the plugin use a fixed version of
UkrainianMorfologikAnalyzer until we merge a version of Lucene that
contains the upstream fix.
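
For readers unfamiliar with this kind of leak, here is a minimal sketch of the general pattern. It is not the actual Lucene or plugin code: `FakeDictionary`, `perThreadLoads`, and `sharedLoads` are hypothetical names made up for illustration, and the "dictionary" is just a small byte array. It only shows why a resource held in a thread local is loaded once per thread, while a resource loaded once and shared is not.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustrative sketch only; none of these names come from Lucene or Elasticsearch.
class DictionarySharingSketch {

    // Hypothetical stand-in for a large in-memory dictionary.
    static final class FakeDictionary {
        final byte[] data = new byte[1024];

        FakeDictionary(AtomicInteger loadCounter) {
            // Count every load so the two patterns can be compared.
            loadCounter.incrementAndGet();
        }
    }

    // Starts the given number of threads, each asking the source for a dictionary once.
    static void runOnThreads(int threads, Supplier<FakeDictionary> source) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> source.get());
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
    }

    // Leaky pattern: the thread local builds a fresh copy for each new thread.
    static int perThreadLoads(int threads) {
        AtomicInteger loads = new AtomicInteger();
        ThreadLocal<FakeDictionary> perThread =
            ThreadLocal.withInitial(() -> new FakeDictionary(loads));
        runOnThreads(threads, perThread::get);
        return loads.get();
    }

    // Shared pattern: the dictionary is loaded once and handed to every thread.
    static int sharedLoads(int threads) {
        AtomicInteger loads = new AtomicInteger();
        FakeDictionary shared = new FakeDictionary(loads);
        runOnThreads(threads, () -> shared);
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println("per-thread loads: " + perThreadLoads(4));
        System.out.println("shared loads: " + sharedLoads(4));
    }
}
```

In the sketch, the number of loads in the per-thread variant grows with the number of threads, which mirrors how the number of dictionary copies here grows with the number of shards.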

@romseygeek romseygeek requested a review from jpountz April 21, 2021 09:00
@romseygeek romseygeek self-assigned this Apr 21, 2021
@elasticmachine elasticmachine added the Team:Search label Apr 21, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@romseygeek romseygeek merged commit 993f0b0 into elastic:master Apr 21, 2021
@romseygeek romseygeek deleted the bug/ukrainian-analyzer branch April 21, 2021 11:13
romseygeek added a commit that referenced this pull request Apr 21, 2021
@ppf2
Contributor

ppf2 commented May 26, 2021

@romseygeek Is the version label correct in this PR? It's not listed in the release notes (https://www.elastic.co/guide/en/elasticsearch/reference/current/release-notes-7.13.0.html). If this didn't make it to 7.13.0, will it be in 7.13.1? Thx!

@romseygeek
Contributor Author

Not sure why it's not in the release notes, but it's in the 7.13 release: d6038a3

ppf2 added a commit that referenced this pull request May 26, 2021
#71998 was fixed in 7.13.0 but it is missing from the release notes.
@ppf2
Contributor

ppf2 commented May 26, 2021

Thx for confirming @romseygeek ! I have filed a doc PR to add it (#73440).

jrodewig pushed a commit that referenced this pull request May 26, 2021
#71998 was fixed in 7.13.0 but was missed in the release notes.
jrodewig added a commit that referenced this pull request May 26, 2021
#71998 was fixed in 7.13.0 but was missed in the release notes.

Co-authored-by: Pius <[email protected]>