Lowercase normalizer is used for wildcard queries #28894

@dadoonet

Elasticsearch version (bin/elasticsearch --version): 6.2.2
Description of the problem including expected versus actual behavior:

Say you index the value Aa in a text field with a lowercase analyzer.
Searching for aa* matches. Searching for Aa* does not match, which is expected because wildcard queries are not analyzed.

Say you index the value Aa in a keyword field with a lowercase normalizer.
Searching for aa* matches. Searching for Aa* matches as well, even though wildcard queries are not supposed to be analyzed.
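To make the difference concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not Elasticsearch or Lucene code: the helper names are hypothetical, `fnmatchcase` stands in for wildcard matching, and the `normalize_pattern` flag is my assumption about what seems to happen to the query term on the keyword field.

```python
from fnmatch import fnmatchcase


def text_terms(value):
    # Toy stand-in for a text field with a lowercase analyzer:
    # split on whitespace and lowercase each token.
    return [token.lower() for token in value.split()]


def keyword_terms(value):
    # Toy stand-in for a keyword field with a lowercase normalizer:
    # the whole value becomes a single lowercased term.
    return [value.lower()]


def wildcard_matches(terms, pattern, normalize_pattern=False):
    # Case-sensitive wildcard match against the indexed terms.
    # normalize_pattern=True models the query term being lowercased
    # before matching (assumed, based on the observed behavior).
    if normalize_pattern:
        pattern = pattern.lower()
    return any(fnmatchcase(term, pattern) for term in terms)


if __name__ == "__main__":
    text = text_terms("Aa")
    keyword = keyword_terms("Aa")
    print(wildcard_matches(text, "aa*"))
    print(wildcard_matches(text, "Aa*"))
    print(wildcard_matches(keyword, "aa*", normalize_pattern=True))
    print(wildcard_matches(keyword, "Aa*", normalize_pattern=True))
```

With `normalize_pattern=False` the keyword field would behave like the text field, which is what I expected; with `normalize_pattern=True` both patterns match, which is what I actually see.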

Steps to reproduce:

DELETE test
PUT test
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "foo": {
          "type": "text",
          "analyzer": "simple", 
          "fields": {
            "keyword": {
              "type": "keyword",
              "normalizer": "lowercase_normalizer"
            }
          }
        }
      }
    }
  }
}
PUT test/doc/1?refresh
{
  "foo": "Bbb Aaa"
}

# Does not match -> OK
GET test/_search
{
  "query": {
    "wildcard": {
      "foo": "Bb*"
    }
  }
}
# Match -> OK
GET test/_search
{
  "query": {
    "wildcard": {
      "foo": "bb*"
    }
  }
}
# Match but should not -> KO
GET test/_search
{
  "query": {
    "wildcard": {
      "foo.keyword": "Bb*"
    }
  }
}
# Match -> OK
GET test/_search
{
  "query": {
    "wildcard": {
      "foo.keyword": "bb*"
    }
  }
}

I spoke with @jpountz, who thinks it might be related to https://issues.apache.org/jira/browse/LUCENE-8186

Opening the issue so we can track it.


Labels: :Search Relevance/Search, >bug, Team:Search Relevance, priority:normal
