Description
#33702 introduced a new method on the TokenFilterFactory interface that lets a filter return a specialized version of itself for use when parsing synonyms. Currently only the multiplexer implements it, so that it returns just the original token. We should review all the filter factories shipped with Elasticsearch to see whether any others need changing:
- [ ] AsciiFoldingFilter -> should return only the folded token
- [ ] CJKBigramFilter -> ignore if emitUnigrams = true
- [ ] CommonGramsTokenFilter -> ignore
- [ ] CompoundWordTokenFilterBase -> ??
- [ ] EdgeNGramTokenFilter -> ??
- [ ] Fingerprint & MinHash -> shouldn't be used with synonyms anyway...
- [ ] Keyword repeat -> ignore
- [ ] NGramTokenFilter -> ??
- [ ] Shingle -> shouldn't output unigrams
- [ ] SynonymGraph & Synonym -> should we allow multiple synonym chains?
- [ ] Phonetic -> ignore
- [ ] WordDelimiterGraph & WordDelimiter -> ??
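
For illustration, here is a minimal self-contained sketch of the pattern, using the AsciiFolding item from the list above. It does not use the real Lucene or Elasticsearch classes: `SketchTokenFilterFactory`, `ToyFoldingFactory` and `SynonymParsingSketch` are hypothetical stand-ins, a "filter" is modelled as a function over a list of strings, and `getSynonymFilter` is an assumed name for the method added in #33702. The folding is a toy diacritic strip, not the real ASCII folding logic.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for TokenFilterFactory:
// a filter maps a list of tokens to a list of tokens.
interface SketchTokenFilterFactory {
    List<String> apply(List<String> tokens);

    // Assumed name: the version of this filter to use while parsing
    // synonym rules. By default a filter is used as-is.
    default SketchTokenFilterFactory getSynonymFilter() {
        return this;
    }
}

// Toy folding filter: strips combining marks, optionally keeping the original.
final class ToyFoldingFactory implements SketchTokenFilterFactory {
    private final boolean preserveOriginal;

    ToyFoldingFactory(boolean preserveOriginal) {
        this.preserveOriginal = preserveOriginal;
    }

    @Override
    public List<String> apply(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            // Decompose, then drop combining marks (toy approximation of folding).
            String folded = Normalizer.normalize(token, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
            out.add(folded);
            if (preserveOriginal && !folded.equals(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // For synonym parsing, emit only the folded token, never the original.
    @Override
    public SketchTokenFilterFactory getSynonymFilter() {
        return new ToyFoldingFactory(false);
    }
}

final class SynonymParsingSketch {
    // Runs the synonym-parsing variant of every filter in the chain.
    static List<String> analyzeForSynonyms(List<SketchTokenFilterFactory> chain,
                                           List<String> tokens) {
        List<String> current = tokens;
        for (SketchTokenFilterFactory factory : chain) {
            current = factory.getSynonymFilter().apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<SketchTokenFilterFactory> chain = List.of(new ToyFoldingFactory(true));
        List<String> input = List.of("caf\u00e9");
        System.out.println("index time:   " + chain.get(0).apply(input));
        System.out.println("synonym time: " + analyzeForSynonyms(chain, input));
    }
}
```

In this sketch the index-time filter emits both the folded and the original token, while the variant used for synonym parsing emits a single token per input, so synonym rules are not parsed against stacked tokens. Whether each factory in the list needs such an override is the open question of this issue.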