Closed
Labels: :Search Relevance/Analysis, >bug, Team:Search Relevance
Description
Elasticsearch version: Version: 6.1.1, Build: bd92e7f/2017-12-17T20:23:25.338Z, JVM: 1.8.0_144
Plugins installed: [analysis-icu, analysis-phonetic]
JVM version: java version "1.8.0_144"
OS version: Darwin Kernel Version 17.3.0
Description of the problem including expected versus actual behavior:
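The Daitch-Mokotoff analyzer returns only one token when it should return several. Names containing an ambiguous letter group such as "CH" have more than one valid D-M code, and all of them should be emitted.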
Steps to reproduce:
...
"analyzer_daitch_mokotoff": {
  "type": "custom",
  "tokenizer": "lowercase",
  "filter": [
    "daitch_mokotoff"
  ]
}
curl -XGET 'http://localhost:9200/indexname/_analyze?pretty' -H 'Content-Type: application/json' -d'{
  "analyzer": "analyzer_daitch_mokotoff",
  "text": "CHAUPTMAN"
}'
This should return two tokens, 573660 (CH pronounced as in KH, code 5) and 473660 (CH pronounced as in TCH, code 4), but it returns only 473660:
{
  "tokens" : [
    {
      "token" : "473660",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 0
    }
  ]
}
See the Daitch-Mokotoff Soundex specification: http://www.avotaynu.com/soundex.htm
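The branching behaviour the spec requires can be illustrated with a minimal Python sketch. It is not the commons-codec or Elasticsearch implementation: the rule table is a hypothetical, hand-picked subset of the D-M coding table that covers only the letters in CHAUPTMAN, and the function name `dm_soundex` is invented for illustration. Each ambiguous letter group forks every existing branch, so a correct encoder ends up with one code per branch.

```python
VOWELS = set("AEIOUY")

# Hypothetical subset of the D-M coding table, enough for "CHAUPTMAN" only.
# pattern -> list of alternative codings; each coding is a triple of
# (at start of name, before a vowel, any other position). "" = not coded.
RULES = {
    "CH": [("5", "5", "5"), ("4", "4", "4")],  # CH as in KH, or as in TCH
    "AU": [("0", "7", "")],
    "A": [("0", "", "")],
    "P": [("7", "7", "7")],
    "T": [("3", "3", "3")],
    "M": [("6", "6", "6")],
    "N": [("6", "6", "6")],
}


def dm_soundex(name: str) -> list[str]:
    """Return every six-digit D-M code for `name` (rule subset only)."""
    name = name.upper()
    # Each branch is (digits so far, code of the previous letter group).
    branches = [("", None)]
    i = 0
    while i < len(name):
        # Longest match first: try two-letter patterns before single letters.
        for length in (2, 1):
            pattern = name[i:i + length]
            if len(pattern) == length and pattern in RULES:
                break
        else:
            raise ValueError(f"no rule for {name[i]!r} in this sketch")
        nxt = name[i + length:i + length + 1]
        if i == 0:
            slot = 0
        elif nxt and nxt in VOWELS:
            slot = 1
        else:
            slot = 2
        new_branches = []
        for digits, prev in branches:
            # An ambiguous group (more than one coding) forks the branch.
            for coding in RULES[pattern]:
                code = coding[slot]
                # Adjacent groups with the same code are coded only once.
                if code and code != prev:
                    new_branches.append((digits + code, code))
                else:
                    new_branches.append((digits, code))
        branches = new_branches
        i += length
    # Pad with zeros to six digits and drop duplicate branches.
    codes = []
    for digits, _ in branches:
        code = (digits + "000000")[:6]
        if code not in codes:
            codes.append(code)
    return codes


print(dm_soundex("CHAUPTMAN"))  # → ['573660', '473660']
```

Both codes come out because the "CH" fork happens before any other letter is coded; the analyzer in this report keeps only the second branch.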
Until this is fixed, the Daitch-Mokotoff encoder in the analysis-phonetic plugin is not usable, because names with more than one valid code are indexed under only one of them.