
Normalizers: poor support for token filters and missing docs #28605

@Lackoftactics

Description

I am trying to avoid fielddata, because I really want fast queries without keeping everything in memory, and for fields with type: 'keyword' you have to use a `normalizer`. If that's helpful context, I would use this for sorting a huge amount of data.

Usage of normalizers is really limited, and it is only since 6.x that you can even examine the tokens a normalizer produces:

curl -XGET 'localhost:9200/events/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "normalizer" : "sortable",
  "text" : "Triathlon race "
}
'
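If the normalizer were accepted, I would expect a response along these lines (illustrative only; the token is what `lowercase` plus `trim` should produce for the input above, and the offsets are my assumption):

{
  "tokens" : [
    {
      "token" : "triathlon race",
      "start_offset" : 0,
      "end_offset" : 15,
      "type" : "word",
      "position" : 0
    }
  ]
}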

When adding the new normalizer in the index settings:

"normalizer":{  
            "sortable":{  
               "type":"custom",
               "char_filter":[  

               ],
               "filter":[  
                  "lowercase",
                  "trim"
               ]
            }
         }
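For completeness, this is roughly the full index-creation request I was attempting; the `event` type and `name` field are placeholders for my real mapping:

curl -XPUT 'localhost:9200/events?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "normalizer": {
        "sortable": {
          "type": "custom",
          "char_filter": [],
          "filter": [ "lowercase", "trim" ]
        }
      }
    }
  },
  "mappings": {
    "event": {
      "properties": {
        "name": {
          "type": "keyword",
          "normalizer": "sortable"
        }
      }
    }
  }
}
'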

I stumbled on the fact that the `trim` filter is not supported in normalizers. So I tried to hack my way around it by building a custom token filter that does the same thing (the pattern is meant to capture everything between the leading and trailing whitespace):

        "analysis":{  
         "filter":{  
            "custom_trim":{  
               "type":"pattern_capture",
               "preserve_original":false,
               "patterns":[  
                  "^ *([Ww]*)\b *$"
               ]
            }
         },
         "normalizer":{  
            "sortable":{  
               "type":"custom",
               "char_filter":[  

               ],
               "filter":[  
                  "lowercase",
                  "custom_trim"
               ]
            }
         }
      }

To be met with:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Custom normalizer [sortable] may not use filter [custom_trim]"}],"type":"illegal_argument_exception","reason":"Custom normalizer [sortable] may not use filter [custom_trim]"},"status":400}

I know that you currently don't support all the filters, which probably depends on Lucene, but it would be useful for us as developers to have at least some documentation of what currently works; playing a guessing game is not good for us and not good for you.

I see too many posts from people frustrated by this issue, and there are many other places where the documentation is just great, or where the errors are at least more explanatory.

Metadata

Labels

:Search Relevance/Analysis (How text is split into tokens), >docs (General docs changes), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)
