Closed
Labels
:Search Relevance/Analysis, >docs, Team:Search Relevance, help wanted, adoptme
Description
Elasticsearch version: 5.5.1 (probably others)
JVM version: 1.8.0_111
OS version: MacOS
Description of the problem including expected versus actual behavior:
When a request to the _analyze API specifies only a char_filter (with no tokenizer), the API silently falls back to the default analyzer and ignores the char_filter entirely.
Steps to reproduce:
This works as expected, executing the char_filter before the tokenizer:
GET _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "mapping",
      "mappings": ["- => _"]
    }
  ],
  "text": "foo-bar"
}
response:
{
  "tokens": [
    {
      "token": "foo_bar",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
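To make the expected ordering concrete, here is a minimal Python sketch of the pipeline: the char filter rewrites the text first, and only then does tokenization run. It is an illustration under stated assumptions, not Elasticsearch's implementation. The function names are hypothetical, the regex is only a rough stand-in for the standard tokenizer, and the sequential string replacement simplifies how the real mapping char filter matches.

```python
import re


def apply_mapping_char_filter(text, mappings):
    """Apply rules of the form "a => b", loosely mimicking a mapping char_filter."""
    for rule in mappings:
        # Split "- => _" into its source and target, trimming surrounding spaces.
        source, target = (part.strip() for part in rule.split("=>"))
        # Simplification: plain sequential replacement, one rule at a time.
        text = text.replace(source, target)
    return text


def tokenize(text):
    """Rough stand-in for the standard tokenizer (an assumption for this sketch)."""
    # "\w" matches "_", so "foo_bar" stays whole while "foo-bar" is split.
    return re.findall(r"\w+", text)


def analyze(text, char_filter_mappings=None):
    """Run the char filter (if any) before tokenizing, as the issue expects."""
    if char_filter_mappings:
        text = apply_mapping_char_filter(text, char_filter_mappings)
    return tokenize(text)


print(analyze("foo-bar", ["- => _"]))  # char filter applied before tokenizing
print(analyze("foo-bar"))              # no char filter
```

With the char filter applied, the sketch yields the single token foo_bar. Without it, the text splits into foo and bar, which is what the API returns when the char_filter is silently skipped.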
However, when the tokenizer is omitted, the char_filter does not appear to execute:
GET _analyze
{
  "char_filter": [
    {
      "type": "mapping",
      "mappings": ["- => _"]
    }
  ],
  "text": "foo-bar"
}
response:
{
  "tokens": [
    {
      "token": "foo",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "bar",
      "start_offset": 4,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
Further evidence: a request with an unknown char_filter type and no tokenizer does not produce an error:
GET _analyze
{
  "char_filter": [
    {
      "type": "mmmmmmapping",
      "mappings": ["- => _"]
    }
  ],
  "text": "foo-bar"
}
response:
{
  "tokens": [
    {
      "token": "foo",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "bar",
      "start_offset": 4,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
By contrast, when a tokenizer is specified, the same unknown char_filter is rejected:
GET _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
      "type": "mmmmmmapping",
      "mappings": ["- => _"]
    }
  ],
  "text": "foo-bar"
}
response:
{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[s17n1jN][127.0.0.1:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to find global char filter under [mmmmmmapping]"
  },
  "status": 400
}
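Because the tokenizer-less request fails silently, one possible stopgap is a client-side check that refuses to send such a body. The sketch below assumes only the behavior reported here for 5.5.1. The helper name check_analyze_body is hypothetical and not part of any Elasticsearch client, and it inspects only the two keys that appear in the requests above.

```python
def check_analyze_body(body):
    """Reject an _analyze request body that has a char_filter but no tokenizer.

    Hypothetical guard based on the behavior reported in this issue: such a
    body is analyzed with the default analyzer and the char_filter is ignored.
    """
    has_char_filter = bool(body.get("char_filter"))
    has_tokenizer = "tokenizer" in body
    if has_char_filter and not has_tokenizer:
        raise ValueError(
            "char_filter given without a tokenizer; "
            "the char_filter would be silently ignored"
        )
    return body


# The working request from the report passes the check unchanged.
ok_body = {
    "tokenizer": "standard",
    "char_filter": [{"type": "mapping", "mappings": ["- => _"]}],
    "text": "foo-bar",
}
check_analyze_body(ok_body)

# The problematic request is caught before it is sent.
bad_body = {
    "char_filter": [{"type": "mapping", "mappings": ["- => _"]}],
    "text": "foo-bar",
}
try:
    check_analyze_body(bad_body)
except ValueError as err:
    print("rejected:", err)
```

Adding an explicit "tokenizer": "standard" to the body, as in the first request above, is what makes the char_filter take effect in the reported version.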