Expose preserveOriginal in NGramTokenFilterFactory, which is marked as a TODO in the master code.

The `preserveOriginal` setting is currently not supported by the `ngram` token filter (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html). There is even a TODO comment in the Elasticsearch master code (as of 19 April 2020) to expose `preserveOriginal`: https://github.com/elastic/elasticsearch/blob/master/modules/analysis-common/src/main/java/org/elasticsearch/analysis/common/NGramTokenFilterFactory.java#L53


**Elasticsearch version** (`bin/elasticsearch --version`):
8.0.0-SNAPSHOT

**Plugins installed**: []
N/A

**JVM version** (`java -version`):
openjdk 14.0.1 2020-04-14
OpenJDK Runtime Environment (build 14.0.1+7)
OpenJDK 64-Bit Server VM (build 14.0.1+7, mixed mode, sharing)

**OS version** (`uname -a` if on a Unix-like system):
Darwin LT6577 19.3.0 Darwin Kernel Version 19.3.0: Thu Jan  9 20:58:23 PST 2020; root:xnu-6153.81.5~1/RELEASE_X86_64 x86_64

**Description of the problem including expected versus actual behavior**:
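This is a feature request, and it is already noted as a TODO in the Elasticsearch master code. Expected behavior: if `preserveOriginal` were exposed, the n-gram token filter could emit the original token alongside the generated n-grams. Actual behavior: the setting is not exposed, so the original token is missing from the output.

**Steps to reproduce**: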

 1. Delete any existing index named `preserveoriginal` so the test starts from a clean state.
`curl --user elastic:password -XDELETE localhost:9200/preserveoriginal`
 2. Create a new index with a custom analyzer that uses the `ngram` token filter:
```
curl --user elastic:password -X PUT "localhost:9200/preserveoriginal?pretty" -H 'Content-Type: application/json' -d'
{
    "settings": {
        "max_ngram_diff": 50,
        "analysis": {
            "filter": {
                "ngram_filter": {
                    "type": "ngram",
                    "min_gram": 1,
                    "max_gram": 2
                }
            },
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase",
                        "ngram_filter"
                    ]
                }
            }
        }
    }
}
'
```
 3. Check the tokens generated by `ngram_analyzer` created in the above step:
```
curl --user elastic:password -X GET "localhost:9200/preserveoriginal/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
  "analyzer" : "ngram_analyzer",
  "text" : "foo"
}
'
```
 4. Output of the `_analyze` API call above:
```
{
  "tokens" : [
    {
      "token" : "f",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "fo",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "o",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "oo",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "o",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    }
  ]
}
```

**Note that the original token `foo` isn't present in the result.**
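
For illustration, the requested configuration might look like the sketch below. The setting name `preserve_original` is an assumption, borrowed from the naming used by other token filters such as `word_delimiter`; it is not supported by the `ngram` filter at the time of writing. Delete the index as in step 1 before running it.

```
# Hypothetical: "preserve_original" is NOT an existing ngram filter setting yet;
# it only illustrates what this feature request asks for.
curl --user elastic:password -X PUT "localhost:9200/preserveoriginal?pretty" -H 'Content-Type: application/json' -d'
{
    "settings": {
        "analysis": {
            "filter": {
                "ngram_filter": {
                    "type": "ngram",
                    "min_gram": 1,
                    "max_gram": 2,
                    "preserve_original": true
                }
            },
            "analyzer": {
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase",
                        "ngram_filter"
                    ]
                }
            }
        }
    }
}
'
```

With such a setting in place, re-running step 3 would be expected to return `foo` in addition to the n-grams listed above, since `foo` is longer than `max_gram`.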

**Provide logs (if relevant)**:
N/A

Expose preserveOriginal in NGramTokenFilterFactory which is marked as TODO in master code. #55431
