Description
Elasticsearch version (bin/elasticsearch --version): 6.5.1 and earlier 6.x versions (tested in 6.4.2 as well)
Version: 6.5.1, Build: default/tar/8c58350/2018-11-16T02:22:42.182257Z, JVM: 1.8.0_181
Plugins installed: [] None
JVM version (java -version):
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
OS version (uname -a if on a Unix-like system):
Reproduced in Elastic SaaS and on Mac:
16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64 x86_64
Description of the problem including expected versus actual behavior:
Index settings that place a shingle filter before a synonym filter in an analyzer's filter chain, and that also define multi-word synonyms containing whitespace (such as "eagle claw, eagleclaw"), cause index creation and opening to fail.
For example, creating or opening an index with the following settings:
PUT testindex
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "panamanian, panama",
              "eagle claw, eagleclaw"
            ],
            "tokenizer": "keyword"
          },
          "my_shingle": {
            "max_shingle_size": "6",
            "min_shingle_size": "2",
            "output_unigrams": "true",
            "type": "shingle"
          }
        },
        "analyzer": {
          "with_synonyms": {
            "filter": [
              "lowercase",
              "my_shingle",
              "synonym"
            ],
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      }
    }
  }
}
...yields this error:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "parse_exception",
      "reason": "Invalid synonym rule at line 2",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "term: eagle claw analyzed to a token (eagle claw) with position increment != 1 (got: 0)"
      }
    }
  },
  "status": 400
}
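To illustrate where a position increment of 0 could come from, here is a minimal Python sketch. It is a simplified model, not Lucene or Elasticsearch code: `shingle` and `check_synonym_term` are hypothetical helpers, and the sketch assumes (as the error message suggests, though this report does not confirm it) that each synonym term is analyzed through the filters that precede the synonym filter. With unigrams enabled, a shingle is stacked on the same position as the unigram it starts with, so it carries an increment of 0.

```python
def shingle(tokens, min_size=2, max_size=6, output_unigrams=True):
    """Model a shingle filter: return (term, position_increment) pairs.

    The first token emitted at each position advances the position by 1;
    every further token stacked on that position has an increment of 0.
    """
    sizes = ([1] if output_unigrams else []) + list(range(min_size, max_size + 1))
    out = []
    for start in range(len(tokens)):
        first_at_position = True
        for size in sizes:
            if start + size > len(tokens):
                continue  # not enough tokens left for a shingle of this size
            term = " ".join(tokens[start:start + size])
            out.append((term, 1 if first_at_position else 0))
            first_at_position = False
    return out


def check_synonym_term(term):
    """Model the synonym-rule check: every analyzed token must advance by 1."""
    tokens = term.lower().split()  # stand-in for standard tokenizer + lowercase
    for text, pos_inc in shingle(tokens):
        if pos_inc != 1:
            raise ValueError(
                f"term: {term} analyzed to a token ({text}) "
                f"with position increment != 1 (got: {pos_inc})"
            )


check_synonym_term("panama")          # single word: no shingles, passes
try:
    check_synonym_term("eagle claw")  # the shingle "eagle claw" is stacked on "eagle"
except ValueError as err:
    print(err)
```

In this model, single-word entries such as "panamanian, panama" pass because no shingle is produced, while any entry containing whitespace yields a stacked shingle and is rejected.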
Workarounds:
Either remove all multi-word synonym entries (such as "eagle claw, eagleclaw") from the index settings, which is undesirable, or change the order of the filter chain so that the synonym filter comes before the shingle filter, like so:
(excerpt)
"analyzer": {
  "with_synonyms": {
    "filter": [
      "lowercase",
      "synonym",
      "my_shingle"
    ],
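For settings kept as a Python dict (for example, before sending them with a client), the reordering workaround could be applied programmatically. The sketch below is only an illustration: `synonyms_before_shingles` is a hypothetical helper, not part of any Elasticsearch client, and it assumes the settings have the same nesting as the example in this report.

```python
import copy


def synonyms_before_shingles(settings):
    """Return a copy of the settings in which, for every analyzer, synonym
    filters that follow a shingle filter are moved in front of it."""
    fixed = copy.deepcopy(settings)
    analysis = fixed["settings"]["index"]["analysis"]
    # Map each custom filter name to its declared type; built-in filters
    # such as "lowercase" are not declared here and map to None.
    types = {name: cfg.get("type") for name, cfg in analysis.get("filter", {}).items()}
    for analyzer in analysis.get("analyzer", {}).values():
        chain = analyzer.get("filter", [])
        shingle_positions = [i for i, n in enumerate(chain) if types.get(n) == "shingle"]
        if not shingle_positions:
            continue
        first = shingle_positions[0]
        late_synonyms = [n for n in chain[first:] if types.get(n) == "synonym"]
        others = [n for n in chain[first:] if types.get(n) != "synonym"]
        analyzer["filter"] = chain[:first] + late_synonyms + others
    return fixed


settings = {
    "settings": {"index": {"analysis": {
        "filter": {
            "synonym": {"type": "synonym",
                        "synonyms": ["panamanian, panama", "eagle claw, eagleclaw"]},
            "my_shingle": {"type": "shingle", "min_shingle_size": "2",
                           "max_shingle_size": "6", "output_unigrams": "true"},
        },
        "analyzer": {"with_synonyms": {
            "type": "custom", "tokenizer": "standard",
            "filter": ["lowercase", "my_shingle", "synonym"],
        }},
    }}}
}

fixed = synonyms_before_shingles(settings)
print(fixed["settings"]["index"]["analysis"]["analyzer"]["with_synonyms"]["filter"])
```

Note that reordering changes what the analyzer produces, since synonyms are then applied to single tokens before shingles are built, so this is a workaround rather than an equivalent configuration.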