
Synonym building fails if filter is preceded by compound word filter #40000

@BastianHofmann


I'm using the official Docker image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2

Elasticsearch version (bin/elasticsearch --version): 6.3.2

Plugins installed: []

JVM version (java -version): 10.0.2

OS version (uname -a if on a Unix-like system): Linux 596b10634157 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

When creating an index with a custom analyzer that uses a dictionary_decompounder filter followed by a synonym filter (in this order), index creation fails with an illegal_argument_exception.

Steps to reproduce:

1. Create the index:
PUT http://localhost:9200/test

{
    "analysis": {
        "analyzer": {
            "test": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "myCompounds",
                    "mySynonyms"
                ]
            }
        },
        "filter": {
            "myCompounds": {
                "type": "dictionary_decompounder",
                "word_list": [
                    "Kaufmann"
                ]
            },
            "mySynonyms": {
                "type": "synonym",
                "synonyms": [
                    "Verkäufer, Kaufmann im Einzelhandel"
                ]
            }
        }
    }
}
2. See the error response:
{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "failed to build synonyms"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms",
        "caused_by": {
            "type": "parse_exception",
            "reason": "Invalid synonym rule at line 1",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "term: Kaufmann im Einzelhandel analyzed to a token (Kaufmann) with position increment != 1 (got: 0)"
            }
        }
    },
    "status": 400
}

It works if you swap the order of the two filters. I know that the synonym rules are analyzed with all the filters that come before the synonym filter. However, why should that cause an error as soon as a synonym contains a word from the compound word list? Shouldn't we just get "Verkäufer, Kaufmann im Einzelhandel, Kaufmann" for the input "Verkäufer"?
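A simplified Python model of the failure may help show why the rule is rejected. It is a sketch, not Elasticsearch or Lucene code. Tokens are modeled as (term, position increment) pairs, whitespace splitting stands in for the standard tokenizer, and the decompounder is assumed to keep each original token and stack every word-list match found inside it on the same position (increment 0). The size limits 5, 2 and 15 are assumed to correspond to the filter's min_word_size, min_subword_size and max_subword_size defaults. All function names and the decompound_before_synonyms flag are hypothetical. The final check mirrors the message in the error response above.

```python
# Simplified model of the failure; NOT Elasticsearch/Lucene source code.
# Tokens are (term, position_increment) pairs.


def standard_tokenize(text):
    # Rough stand-in for the standard tokenizer: split on whitespace, keep case.
    return [(word, 1) for word in text.split()]


def dictionary_decompound(tokens, word_list, min_word=5, min_sub=2, max_sub=15):
    # Assumed behavior: keep each original token, then emit every word-list
    # match found inside it on the same position (increment 0).
    # Matching is case-sensitive here for simplicity.
    dictionary = set(word_list)
    out = []
    for term, pos_inc in tokens:
        out.append((term, pos_inc))
        if len(term) < min_word:
            continue
        for i in range(len(term) - min_sub + 1):
            for j in range(min_sub, max_sub + 1):
                if i + j > len(term):
                    break
                if term[i:i + j] in dictionary:
                    out.append((term[i:i + j], 0))
    return out


def analyze_synonym_rule_term(text, word_list, decompound_before_synonyms=True):
    # Each side of a synonym rule goes through the filters that precede the
    # synonym filter; every resulting token must then advance the position by 1.
    tokens = standard_tokenize(text)
    if decompound_before_synonyms:
        tokens = dictionary_decompound(tokens, word_list)
    for term, pos_inc in tokens:
        if pos_inc != 1:
            raise ValueError(
                f"term: {text} analyzed to a token ({term}) "
                f"with position increment != 1 (got: {pos_inc})"
            )
    return [term for term, _ in tokens]


if __name__ == "__main__":
    words = ["Kaufmann"]
    print(analyze_synonym_rule_term("Verkäufer", words))
    try:
        analyze_synonym_rule_term("Kaufmann im Einzelhandel", words)
    except ValueError as err:
        print(err)
    # Synonym filter first: the decompounder no longer sees the rule text.
    print(analyze_synonym_rule_term("Kaufmann im Einzelhandel", words,
                                    decompound_before_synonyms=False))
```

In this model, "Kaufmann" matches itself in the word list, so the decompounder emits a second "Kaufmann" token at increment 0 and the position check rejects the rule. With the decompounder placed after the synonym filter, the rule text only passes through the tokenizer and is accepted.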
