Fuzzy match very slow on indices with shingle filter #23594

@jeantil

Elasticsearch version:
5.2.2 (I reproduced this issue on the 5.2.2 docker image provided by elastic.co)

Plugins installed: [x-pack:5.2.2]
(installed by default in the Docker image)
JVM version:

                "version": "1.8.0_92-internal",
                "vm_name": "OpenJDK 64-Bit Server VM",
                "vm_vendor": "Oracle Corporation",
                "vm_version": "25.92-b14"

OS version:
The Docker image runs Linux; the host is docker4mac 17.03.0-ce-mac2 (15657) on macOS Sierra.

Description of the problem including expected versus actual behavior:
Running the following request on a very small index (see reproduction steps) is consistently very slow:

GET season/season/_search
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "match" : {
            "name.classic" : {
              "query" : "RENAULT Talisman Talisman Estate 1.6 dCi 130 225/55 R17 101W winter",
              "operator" : "OR",
              "fuzziness" : "AUTO",
              "prefix_length" : 0,
              "max_expansions" : 50,
              "fuzzy_transpositions" : true,
              "lenient" : false,
              "zero_terms_query" : "NONE",
              "boost" : 1.0
            }
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  }
}

The query above takes over 3 seconds to return (4.36 s in this run):

 "took": 4360,

With fuzziness disabled, the same query runs roughly 50 times faster:

  "took": 87,
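
For context, `"fuzziness": "AUTO"` picks the maximum edit distance from the length of each term: exact match for 1 to 2 characters, one edit for 3 to 5, two edits for longer terms. The Python sketch below is only an illustration of that documented rule (it is not Elasticsearch code, and `auto_fuzziness` is a hypothetical helper name); it shows which edit distance each whitespace-separated word of the query would get before any shingling.

```python
QUERY = "RENAULT Talisman Talisman Estate 1.6 dCi 130 225/55 R17 101W winter"


def auto_fuzziness(term: str) -> int:
    """Maximum edit distance that "fuzziness": "AUTO" allows for a term of this length."""
    length = len(term)
    if length <= 2:
        return 0  # very short terms must match exactly
    if length <= 5:
        return 1  # 3 to 5 characters: one edit allowed
    return 2      # longer than 5 characters: two edits allowed


# Lowercase and split on whitespace, roughly what the search analyzer does
# before the shingle filter runs.
words = QUERY.lower().split()
edits_per_word = {word: auto_fuzziness(word) for word in words}

for word, edits in edits_per_word.items():
    print(f"{word!r}: up to {edits} edit(s)")
```

Every word of this query is at least 3 characters long, so each one becomes a fuzzy term with at least one allowed edit.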

Disabling shingling in the query analysis also makes the query orders of magnitude faster:

  "took": 11,
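
To give a sense of how many terms the shingle filter adds, here is a rough Python approximation of the `seasons-searchAnalyzer` defined in the reproduction steps (whitespace tokenizer, asciifolding, lowercase, then shingles of 2 to 3 words with unigrams kept). It is a sketch under simplifying assumptions, not the Lucene implementation: `fold`, `shingles` and `search_terms` are hypothetical names, the folding is only approximated with Unicode decomposition, and `filler_token` is ignored because this query has no position gaps.

```python
import unicodedata

QUERY = "RENAULT Talisman Talisman Estate 1.6 dCi 130 225/55 R17 101W winter"


def fold(token: str) -> str:
    """Rough stand-in for the asciifolding + lowercase filters."""
    decomposed = unicodedata.normalize("NFKD", token)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.lower()


def shingles(tokens, min_size=2, max_size=3, separator=" ", output_unigrams=True):
    """Emit unigrams plus word n-grams of min_size..max_size, as a shingle filter would."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(separator.join(tokens[start:start + size]))
    return out


def search_terms(text: str):
    """Whitespace-tokenize, fold, then shingle the query text."""
    return shingles([fold(t) for t in text.split()])


terms = search_terms(QUERY)
longest = max(terms, key=len)
print(len(terms), "terms; longest:", repr(longest), f"({len(longest)} characters)")
```

Under these assumptions the 11-word query becomes 30 terms (11 unigrams, 10 two-word shingles, 9 three-word shingles), and each shingle is long enough for `AUTO` to allow two edits. This only shows the size of the expansion; it does not by itself establish where the time is spent.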

Steps to reproduce:

DELETE season
PUT season
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "search_shingler": {
            "max_shingle_size": "3",
            "min_shingle_size": "2",
            "token_separator": " ",
            "output_unigrams": "true",
            "filler_token": "_",
            "output_unigrams_if_no_shingles": "false",
            "type": "shingle"
          }
        },
        "analyzer": {
          "seasons-searchAnalyzer": {
            "filter": [
              "asciifolding",
              "lowercase",
              "search_shingler"
            ],
            "type": "custom",
            "tokenizer": "whitespace"
          },
          "seasons-indexAnalyzer": {
            "filter": [
              "asciifolding",
              "lowercase"
            ],
            "type": "custom",
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}

PUT season/_mapping/season
{
 "properties": {
   "name": {
     "type": "text",
     "fields": {
       "classic": {
         "type": "text",
         "analyzer": "seasons-indexAnalyzer",
         "search_analyzer": "seasons-searchAnalyzer"
       },
       "raw": {
         "type": "keyword"
       }
     }
   },
   "value": {
     "type": "keyword"
   }
 }   
}
POST season/season
{ "name": "winter", "value": "Winter"}
POST season/season
{ "name": "winteur", "value": "Winter"}
POST season/season  
{ "name": "summer", "value": "Summer"}
POST season/season  
{ "name": "all seasons", "value": "AllSeasons"}
POST season/season  
{ "name": "nordic", "value": "Nordic" }
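
For reference, the only query word that should fuzzily match anything in this index is `winter`. The sketch below checks the five indexed names against it using plain Levenshtein distance, with the two-edit limit that `AUTO` allows for a 6-character term. It is an illustration only: the query also sets `fuzzy_transpositions`, which this simple distance does not model (it makes no difference for these five names), and `levenshtein` is a hand-written helper, not a library call.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]


# Names as indexed: the index analyzer uses the keyword tokenizer plus
# lowercase, so each name is stored as a single lowercased term.
INDEXED = ["winter", "winteur", "summer", "all seasons", "nordic"]
MAX_EDITS = 2  # what AUTO allows for a 6-character term such as "winter"

matches = [name for name in INDEXED if levenshtein("winter", name) <= MAX_EDITS]
print(matches)
```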

Provide logs (if relevant):
I couldn't find anything relevant in the logs (they only show index deletion, index creation, mapping updates, etc.).

Comparison with previous versions:

I ran the exact same scenario on our previous configuration (ES 1.7), where the query with both fuzziness and shingling took about 150 ms. We never tried intermediate versions, as our upgrade path was blocked by the removal of FLT until #9103 got merged.

Labels: :Search/Search (Search-related issues that do not fall into other categories), discuss
