_analyze API skips char_filter when no tokenizer/filters specified #26495

@kurtado

Elasticsearch version: 5.5.1 (probably others)

JVM version: 1.8.0_111

OS version: MacOS

Description of the problem including expected versus actual behavior:

When an _analyze API request body specifies only a char_filter (no tokenizer and no token filters), the char_filter is silently ignored and the text is analyzed with the default analyzer alone.

Steps to reproduce:
This works as expected, executing the char_filter before the tokenizer:

GET _analyze
{
  "tokenizer": "standard",
  "char_filter": [
      {
        "type" : "mapping",
        "mappings" : [ "- => _"]
      }
    ],
  "text": "foo-bar"
}

response:

{
  "tokens": [
    {
      "token": "foo_bar",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
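
To make the expected ordering concrete, here is a rough Python sketch of what the request above does conceptually: the mapping char_filter rewrites the raw text first, and only then does the tokenizer run. This is not Elasticsearch code. `apply_mappings` and `analyze` are hypothetical names, the sequential `str.replace` is a simplification of the real mapping char filter, and the `\w+` regex is only a crude stand-in for the standard tokenizer that happens to behave the same way on this input.

```python
import re


def apply_mappings(text, mappings):
    """Rewrite text using "key => value" rules (simplified mapping char filter)."""
    for rule in mappings:
        key, value = (part.strip() for part in rule.split("=>", 1))
        text = text.replace(key, value)
    return text


def analyze(text, mappings=()):
    """Run the char filter stage first, then a crude tokenizer stand-in."""
    filtered = apply_mappings(text, mappings)
    # Assumption: \w+ approximates the standard tokenizer for this input only.
    return re.findall(r"\w+", filtered)


print(analyze("foo-bar", ["- => _"]))  # ['foo_bar']
print(analyze("foo-bar"))              # ['foo', 'bar']
```

With the mapping applied, the hyphen becomes an underscore before tokenization, so a single `foo_bar` token comes out. Without it, the text splits at the hyphen.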

However, when the tokenizer is omitted, the char_filter doesn't appear to execute:

GET _analyze
{
  "char_filter": [
      {
        "type" : "mapping",
        "mappings" : [ "- => _"]
      }
    ],
  "text": "foo-bar"
}

response:

{
  "tokens": [
    {
      "token": "foo",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "bar",
      "start_offset": 4,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

Further evidence: a request with an unknown char_filter type and no tokenizer doesn't generate an error:

GET _analyze
{
  "char_filter": [
      {
        "type" : "mmmmmmapping",
        "mappings" : [ "- => _"]
      }
    ],
  "text": "foo-bar"
}

response:

{
  "tokens": [
    {
      "token": "foo",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "bar",
      "start_offset": 4,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

With a tokenizer specified, the same unknown char_filter type is rejected, as expected:

GET _analyze
{
  "tokenizer": "standard",
  "char_filter": [
      {
        "type" : "mmmmmmapping",
        "mappings" : [ "- => _"]
      }
    ],
  "text": "foo-bar"
}

response:

{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[s17n1jN][127.0.0.1:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to find global char filter under [mmmmmmapping]"
  },
  "status": 400
}
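
For anyone who wants to check another version, the reproduction above could be scripted roughly as follows. This is a sketch under assumptions: the node address `http://localhost:9200` is a guess at a local default, and `run_analyze`, `token_strings`, and `char_filter_was_applied` are hypothetical helper names, not part of any Elasticsearch client.

```python
import json
import urllib.request

CHAR_FILTER = [{"type": "mapping", "mappings": ["- => _"]}]

# The two request bodies from the steps to reproduce.
BODY_WITH_TOKENIZER = {
    "tokenizer": "standard",
    "char_filter": CHAR_FILTER,
    "text": "foo-bar",
}
BODY_WITHOUT_TOKENIZER = {
    "char_filter": CHAR_FILTER,
    "text": "foo-bar",
}


def token_strings(response):
    """Extract just the token text from an _analyze response body."""
    return [t["token"] for t in response.get("tokens", [])]


def char_filter_was_applied(response):
    """For "foo-bar" with "- => _", an applied filter yields one "foo_bar" token."""
    return token_strings(response) == ["foo_bar"]


def run_analyze(body, base_url="http://localhost:9200"):
    """POST a body to /_analyze. base_url is an assumed local node address."""
    request = urllib.request.Request(
        base_url + "/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling `char_filter_was_applied(run_analyze(BODY_WITHOUT_TOKENIZER))` against a running node would be expected to return `False` on an affected version, going by the responses reported above, and `True` for `BODY_WITH_TOKENIZER`.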
