Closed
Labels
:Search Relevance/Analysis (How text is split into tokens), >bug, Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
Description
To reproduce the error (using 0.90.0; I was also able to reproduce it on 1.0 Beta1):
curl -XPOST http://localhost:9200/foo -d '{
  "mappings": {
    "bar": {
      "dynamic": "false",
      "properties": {
        "id": { "type": "integer" },
        "content": { "type": "string", "analyzer": "foobar" }
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "char_filter": {
          "iso_mapping": { "type": "mapping", "mappings": ["ü=>ue"] }
        },
        "filter": {
          "wordDelimiter": {
            "type": "word_delimiter",
            "split_on_numerics": "false",
            "generate_word_parts": "true",
            "generate_number_parts": "true",
            "catenate_words": "true",
            "catenate_numbers": "true",
            "catenate_all": "false"
          }
        },
        "analyzer": {
          "foobar": {
            "tokenizer": "whitespace",
            "filter": [ "lowercase", "wordDelimiter" ],
            "char_filter": "iso_mapping"
          }
        }
      }
    }
  }
}'
curl -XPUT http://localhost:9200/foo/bar/1 -d '{ "id": 1, "content": "eins, fünf, sechs" }'
curl -XPUT http://localhost:9200/foo/bar/2 -d '{ "id": 2, "content": "eins, fünf,sechs" }'
curl -XPUT http://localhost:9200/foo/bar/3 -d '{ "id": 3, "content": "eins, vier, sechs" }'
curl -XPUT http://localhost:9200/foo/bar/4 -d '{ "id": 4, "content": "eins, vier,sechs" }'
Then, for the broken case (where the char filter is used):
curl -XPOST http://localhost:9200/foo/bar/_search -d' { "from": 0, "size": 300, "query": { "match": { "content": "Fünf" } }, "highlight": { "fields": { "content": { "fragment_size": 50, "number_of_fragments": 5 } } } }'
where we get:
"highlight":{"content":["eins, <em>fünf,</em> sechs"]}}
"highlight":{"content":["eins, <em>fünf,sechs</em>"]}}
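To make the problem in these two fragments easier to see, here is a small Python sketch that pulls out the text between the `<em>` tags. The helper name `highlighted_spans` is made up for illustration and is not part of Elasticsearch. The query term is `Fünf`, so presumably only `fünf` should be marked, but the highlighted spans include the trailing comma and, for document 2, `sechs` as well.

```python
import re


def highlighted_spans(fragment):
    """Return the text inside each <em>...</em> pair of a highlight fragment."""
    # Hypothetical helper for inspecting highlighter output; non-greedy match
    # so that several highlighted spans in one fragment stay separate.
    return re.findall(r"<em>(.*?)</em>", fragment)


# The two fragments returned for the "Fünf" query in this report.
broken_fragments = [
    "eins, <em>fünf,</em> sechs",
    "eins, <em>fünf,sechs</em>",
]

for fragment in broken_fragments:
    # Print each fragment next to the spans the highlighter marked.
    print(fragment, "->", highlighted_spans(fragment))
```

Comparing each extracted span against the lowercased query term is a quick way to spot a highlight that is wider than the match.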
And for a working case (where the char filter does not change the text):
curl -XPOST http://localhost:9200/foo/bar/_search -d' { "from": 0, "size": 300, "query": { "match": { "content": "vier" } }, "highlight": { "fields": { "content": { "fragment_size": 50, "number_of_fragments": 5 } } } }'
where we get:
"highlight":{"content":["eins, vier,sechs"]}}
"highlight":{"content":["eins, vier, sechs"]}}
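The only difference between the two cases is whether the `iso_mapping` char filter rewrites anything. A rough Python sketch of that difference follows. It is not Elasticsearch code: `apply_mapping` and `token_lengths` are hypothetical helpers, and `str.split` stands in for the `whitespace` tokenizer. It shows that the mapping makes a token longer than the stored text it came from. Whether this length mismatch is what throws off the highlight offsets is an assumption, not something verified here.

```python
# Same rule as the "iso_mapping" char filter in the index settings.
MAPPINGS = {"ü": "ue"}


def apply_mapping(text, mappings):
    """Apply each 'source => target' replacement, like a mapping char filter."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def token_lengths(content, mappings=MAPPINGS):
    """Pair each whitespace token of the stored text with its filtered form."""
    filtered = apply_mapping(content, mappings)
    # The mapping adds no whitespace, so both splits yield the same token count.
    return [
        (original, len(original), mapped, len(mapped))
        for original, mapped in zip(content.split(), filtered.split())
    ]


for content in ["eins, fünf,sechs", "eins, vier,sechs"]:
    for row in token_lengths(content):
        print(row)
```

For the `vier` documents the original and filtered tokens have equal lengths, while for the `fünf` documents the filtered token is one character longer than the span it covers in the stored text.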