Broken highlighting when using char_filter + word_delimiter filter #3511

@lmenezes

To reproduce the error (using 0.90.0; I was also able to reproduce it on 1.0 Beta1):

curl -XPOST http://localhost:9200/foo -d '{ "mappings": { "bar": { "dynamic": "false", "properties": { "id": { "type": "integer" }, "content": { "type": "string", "analyzer": "foobar" } } } }, "settings": { "index": { "analysis": { "char_filter": { "iso_mapping" : { "type" : "mapping", "mappings" : ["ü=>ue"] } }, "filter": { "wordDelimiter": { "type": "word_delimiter", "split_on_numerics": "false", "generate_word_parts": "true", "generate_number_parts": "true", "catenate_words": "true", "catenate_numbers": "true", "catenate_all": "false" } }, "analyzer": { "foobar": { "tokenizer": "whitespace", "filter": [ "lowercase", "wordDelimiter" ], "char_filter": "iso_mapping" } } } } } }'

curl -XPUT http://localhost:9200/foo/bar/1 -d '{ "id": 1, "content": "eins, fünf, sechs" }'
curl -XPUT http://localhost:9200/foo/bar/2 -d '{ "id": 2, "content": "eins, fünf,sechs" }'
curl -XPUT http://localhost:9200/foo/bar/3 -d '{ "id": 3, "content": "eins, vier, sechs" }'
curl -XPUT http://localhost:9200/foo/bar/4 -d '{ "id": 4, "content": "eins, vier,sechs" }'

Then, for the broken case (where the char filter is used):

curl -XPOST http://localhost:9200/foo/bar/_search -d' { "from": 0, "size": 300, "query": { "match": { "content": "Fünf" } }, "highlight": { "fields": { "content": { "fragment_size": 50, "number_of_fragments": 5 } } } }'

where we get:

"highlight":{"content":["eins, <em>fünf,</em> sechs"]}}
"highlight":{"content":["eins, <em>fünf,sechs</em>"]}}

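One way to make the symptom easy to spot is to pull the emphasized spans out of each fragment and compare them with the query term. The following is only a minimal sketch: the helper names `highlighted_spans` and `overshooting_spans` are hypothetical, it assumes the default `<em>` tags, and it says nothing about the cause of the problem.

```python
import re

# Matches the text inside each default <em>...</em> highlight tag.
EM_SPAN = re.compile(r"<em>(.*?)</em>")


def highlighted_spans(fragment):
    """Return the text of every <em>...</em> span in a highlight fragment."""
    return EM_SPAN.findall(fragment)


def overshooting_spans(fragment, term):
    """Return highlighted spans that are not exactly the query term.

    The comparison is case-insensitive, mirroring the lowercase filter
    in the analyzer from the reproduction.
    """
    return [s for s in highlighted_spans(fragment) if s.lower() != term.lower()]


# Fragments copied from the broken-case output above.
broken = [
    "eins, <em>fünf,</em> sechs",
    "eins, <em>fünf,sechs</em>",
]

for fragment in broken:
    # Each fragment reports a span wider than the matched term "fünf".
    print(fragment, "->", overshooting_spans(fragment, "Fünf"))
```

In both fragments the emphasized text extends past the matched word, picking up the trailing comma in the first document and `,sechs` in the second.
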
And for a working case (no char filter used):

curl -XPOST http://localhost:9200/foo/bar/_search -d' { "from": 0, "size": 300, "query": { "match": { "content": "vier" } }, "highlight": { "fields": { "content": { "fragment_size": 50, "number_of_fragments": 5 } } } }'

where we get:

"highlight":{"content":["eins, vier,sechs"]}}
"highlight":{"content":["eins, vier, sechs"]}}
