
Review TokenFilterFactory.getSynonymFilter() implementations #34298

@romseygeek

Description


#33702 introduced a new method, getSynonymFilter(), on the TokenFilterFactory interface. It allows filters to return specialized versions of themselves for use when parsing synonym rules. Currently only the multiplexer implements it, returning a filter that emits only the original token. We should review all the filter factories shipped with Elasticsearch to see whether any others need to override it.
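As a rough illustration of the mechanism, here is a minimal, self-contained sketch. It uses a simplified model rather than the real Lucene/Elasticsearch classes: a token stream is just a List<String>, and TokenFilterFactory, MultiplexerFactory and IDENTITY below are hypothetical stand-ins. The default getSynonymFilter() returns the factory itself. The multiplexer-like factory overrides it so that, when synonym rules are parsed, only the original token passes through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Usage demo; the types it uses are declared below.
class SynonymFilterDemo {
    public static void main(String[] args) {
        UnaryOperator<String> upper = String::toUpperCase;
        TokenFilterFactory mux = new MultiplexerFactory(List.of(upper));
        List<String> input = List.of("foo", "bar");
        System.out.println(mux.create(input));                    // [foo, FOO, bar, BAR]
        System.out.println(mux.getSynonymFilter().create(input)); // [foo, bar]
    }
}

// Hypothetical, simplified stand-in for the Elasticsearch interface:
// a "token stream" is modelled as a List<String>.
interface TokenFilterFactory {
    String name();

    List<String> create(List<String> tokens);

    // Default: the filter is used unchanged when parsing synonym rules.
    default TokenFilterFactory getSynonymFilter() {
        return this;
    }

    // Hypothetical pass-through filter that leaves every token untouched.
    TokenFilterFactory IDENTITY = new TokenFilterFactory() {
        @Override
        public String name() {
            return "identity";
        }

        @Override
        public List<String> create(List<String> tokens) {
            return tokens;
        }
    };
}

// Hypothetical multiplexer: emits each original token followed by the output
// of every branch. The real filter places branch outputs at the same position
// as the original token; positions are ignored in this sketch.
class MultiplexerFactory implements TokenFilterFactory {
    private final List<UnaryOperator<String>> branches;

    MultiplexerFactory(List<UnaryOperator<String>> branches) {
        this.branches = branches;
    }

    @Override
    public String name() {
        return "multiplexer";
    }

    @Override
    public List<String> create(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token); // the original token
            for (UnaryOperator<String> branch : branches) {
                out.add(branch.apply(token)); // one extra token per branch
            }
        }
        return out;
    }

    // When parsing synonym rules, pass through only the original token.
    @Override
    public TokenFilterFactory getSynonymFilter() {
        return IDENTITY;
    }
}
```

Because the default returns the factory itself, every existing filter keeps its current behaviour during synonym parsing unless it explicitly opts in by overriding the method.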

[] AsciiFoldingFilter -> should return only the folded token
[] CJKBigramFilter -> ignore if emitUnigrams = true
[] CommonGramsTokenFilter -> ignore
[] CompoundWordTokenFilterBase -> ??
[] EdgeNGramTokenFilter -> ??
[] Fingerprint & MinHash -> shouldn't be used with synonyms anyway...
[] Keyword repeat -> ignore
[] NGramTokenFilter -> ??
[] Shingle -> shouldn't output unigrams
[] SynonymGraph & Synonym -> should we allow multiple synonym chains?
[] Phonetic -> ignore
[] WordDelimiterGraph & WordDelimiter -> ??
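For the AsciiFoldingFilter item, here is a hedged sketch of what "return only the folded token" could look like. It again uses a simplified List<String> model instead of Lucene's TokenStream. AsciiFoldingFactory is hypothetical, and its preserveOriginal flag is modelled on the asciifolding filter's preserve_original option. The folding step (Unicode NFD decomposition followed by dropping combining marks) only approximates the mapping table used by Lucene's ASCIIFoldingFilter.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Usage demo; the factory it uses is declared below.
class AsciiFoldingDemo {
    public static void main(String[] args) {
        AsciiFoldingFactory factory = new AsciiFoldingFactory(true);
        List<String> input = List.of("caf\u00e9", "tea");
        System.out.println(factory.create(input));                    // folded and original forms
        System.out.println(factory.getSynonymFilter().create(input)); // [cafe, tea]
    }
}

// Hypothetical, simplified factory modelled on asciifolding with
// preserve_original; a "token stream" is modelled as a List<String>.
class AsciiFoldingFactory {
    private final boolean preserveOriginal;

    AsciiFoldingFactory(boolean preserveOriginal) {
        this.preserveOriginal = preserveOriginal;
    }

    // Approximate folding: NFD decomposition, then drop combining marks.
    // Lucene's ASCIIFoldingFilter uses a much larger mapping table.
    static String fold(String token) {
        return Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    List<String> create(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String folded = fold(token);
            out.add(folded);
            if (preserveOriginal && !folded.equals(token)) {
                out.add(token); // keep the original alongside its folded form
            }
        }
        return out;
    }

    // Variant for parsing synonym rules: emit only the folded token,
    // so each synonym term maps to a single token.
    AsciiFoldingFactory getSynonymFilter() {
        return preserveOriginal ? new AsciiFoldingFactory(false) : this;
    }
}
```

Without preserve_original the filter already emits exactly one token per input token, so the sketch returns the factory unchanged in that case.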
