Support kuromoji user dictionary set directly in the settings file

It would be nice if [kuromoji_tokenizer](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji-tokenizer.html) supported loading a user dictionary from an array of dictionary entries given directly in the settings JSON, not only from a file.

The current settings look like this:

```json
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "kuromoji_user_dict": {
            "type": "kuromoji_tokenizer",
            "mode": "extended",
            "discard_punctuation": "false",
            "user_dictionary": "userdict_ja.txt"
          }
        },
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "kuromoji_user_dict"
          }
        }
      }
    }
  }
}
```

My suggestion is to add a new JSON property named `user_dictionary_entries` (or similar) at the same level as the current `user_dictionary`, accepting an array of dictionary entries. If both `user_dictionary` and `user_dictionary_entries` are given, the tokenizer has to either merge the two inputs or use only one of them; I think simply prioritizing one of them would be simpler. This is quite similar to what the [Synonym Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html) already supports with its inline `synonyms` and file-based `synonyms_path` options.
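To make the "prioritize one input" option concrete, here is a minimal Python sketch of the resolution logic. It is only an illustration, not the plugin's code: the function name `resolve_user_dictionary` and the `read_lines` callback are hypothetical, and the choice that inline entries win over the file is an arbitrary assumption (the opposite order would work just as well).

```python
from typing import Any, Callable, List, Mapping


def resolve_user_dictionary(
    tokenizer_settings: Mapping[str, Any],
    read_lines: Callable[[str], List[str]],
) -> List[str]:
    """Return the user dictionary lines for one tokenizer definition.

    `read_lines` is a hypothetical callback that loads the lines of a
    dictionary file, so the sketch stays independent of any file system.
    """
    inline = tokenizer_settings.get("user_dictionary_entries")
    path = tokenizer_settings.get("user_dictionary")

    if inline is not None:
        # Assumption: inline entries take priority and the file is ignored.
        return [line for line in inline if line.strip()]
    if path is not None:
        # Fall back to the existing file-based behaviour.
        return read_lines(str(path))
    # Neither property is set: no user dictionary.
    return []
```

With both properties present, only the inline array is used; with only `user_dictionary` present, the behaviour is unchanged from today.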

So the new JSON format would be:

```json
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "kuromoji_user_dict": {
            "type": "kuromoji_tokenizer",
            "mode": "extended",
            "discard_punctuation": "false",
            "user_dictionary_entries": [
              "東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞",
              "..."
            ]
          }
        },
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "kuromoji_user_dict"
          }
        }
      }
    }
  }
}
```
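Each array element would use the same line format as the existing dictionary file: surface text, space-separated segmentation, space-separated readings, and a part-of-speech tag, joined by commas. The Python sketch below shows how one such entry could be split and sanity-checked. The names `UserDictEntry` and `parse_entry` are hypothetical, the plain `split(",")` ignores CSV quoting, and the check that segments and readings have equal counts is my assumption about what a reasonable validation would be.

```python
from typing import List, NamedTuple


class UserDictEntry(NamedTuple):
    surface: str            # text to match in the input
    segments: List[str]     # how the surface is split into tokens
    readings: List[str]     # one reading per segment
    part_of_speech: str     # tag assigned to the tokens


def parse_entry(line: str) -> UserDictEntry:
    """Parse one 'surface,segments,readings,pos' dictionary entry."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields, got {len(fields)}: {line!r}")

    surface, segmentation, readings, pos = fields
    segments = segmentation.split()
    reading_list = readings.split()
    if len(segments) != len(reading_list):
        # Assumed validation: every segment needs exactly one reading.
        raise ValueError(
            f"{len(segments)} segments but {len(reading_list)} readings in {line!r}"
        )
    return UserDictEntry(surface, segments, reading_list, pos)


entry = parse_entry("東京スカイツリー,東京 スカイツリー,トウキョウ スカイツリー,カスタム名詞")
```

Validating entries this way at index creation time would let a malformed inline entry fail fast with a clear message instead of surfacing later during analysis.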

If this sounds good to you, I can create a pull request anytime. Thank you!

Support kuromoji user dictionary set directly in the settings file #25343
