`keyword_repeat` and `multiplexer` don't play well with subsequent synonym filters

I recently saw an issue where an analyzer chain was set up to stem the input and then apply a synonym filter afterwards.
In order to also keep the unstemmed tokens in the output (and apply synonyms to them as well where possible), a `keyword_repeat` filter was used, but
this already leads to an error on index creation, because the synonym rules in the filter are validated by running them through the analysis chain:

```
PUT /index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": [
            "optimised => optimized"
          ]
        },
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "blogs_synonyms_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "keyword_repeat",
            "light_english_stemmer",
            "my_synonyms"
          ]
        }
      }
    }
  }
}
```

This gives:

```
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "parse_exception",
      "reason": "Invalid synonym rule at line 1",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "term: optimised analyzed to a token (optimise) with position increment != 1 (got: 0)"
      }
    }
```
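To make the failure easier to reason about, here is a rough Python sketch of what I think is happening. This is not Lucene or Elasticsearch code: `Token`, `keyword_repeat`, `toy_stemmer` and `check_synonym_term` are hypothetical stand-ins, and the stemmer is a toy rule that only mimics the `optimised` to `optimise` behaviour seen in the error above. The assumption is that `keyword_repeat` emits a second copy of each token at the same position (position increment 0), and that synonym rule validation rejects any token whose position increment is not 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    term: str
    position_increment: int
    keyword: bool = False


def keyword_repeat(tokens: List[Token]) -> List[Token]:
    """Emit each token twice: a keyword-marked copy, then a plain copy
    stacked on the same position (position increment 0)."""
    out = []
    for t in tokens:
        out.append(Token(t.term, t.position_increment, keyword=True))
        out.append(Token(t.term, 0, keyword=False))
    return out


def toy_stemmer(tokens: List[Token]) -> List[Token]:
    """Toy stand-in for a stemmer: skips keyword-marked tokens and
    strips a trailing 'd' from terms ending in 'ed'."""
    out = []
    for t in tokens:
        term = t.term
        if not t.keyword and term.endswith("ed"):
            term = term[:-1]
        out.append(Token(term, t.position_increment, t.keyword))
    return out


def check_synonym_term(text: str, tokens: List[Token]) -> List[str]:
    """Mimic the validation step: every token produced for a synonym
    term must advance the position by exactly 1."""
    terms = []
    for t in tokens:
        if t.position_increment != 1:
            raise ValueError(
                f"term: {text} analyzed to a token ({t.term}) "
                f"with position increment != 1 (got: {t.position_increment})"
            )
        terms.append(t.term)
    return terms


if __name__ == "__main__":
    analyzed = toy_stemmer(keyword_repeat([Token("optimised", 1)]))
    try:
        check_synonym_term("optimised", analyzed)
    except ValueError as err:
        print(err)
```

Under these assumptions the left-hand side of the rule is analyzed to two stacked tokens, `optimised` and `optimise`, and the second one trips the check. The same would apply to any filter placed before the synonym filter that stacks tokens on one position.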

I also tried using a `multiplexer` like so, but that runs into similar issues:

```
PUT /index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": [
            "optimised => optimized"
          ]
        },
        "my_multiplexer": {
          "type": "multiplexer",
          "filters": ["light_english_stemmer"]
        },
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "blogs_synonyms_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_multiplexer",
            "my_synonyms"
          ]
        }
      }
    }
  }
}
```

I'm wondering whether I'm using these filters the wrong way, or whether there are other ways to achieve a similar effect.
I'm also trying to understand what the position checks in `SynonymMap#analyze` that cause this rejection are supposed to prevent,
and whether those checks could be omitted for tokens generated by `keyword_repeat` or `multiplexer`.
