Stopwords are not being removed from wildcard query_string queries.  #1272

@mwiercinski

Hi,

If you run a query_string request that includes a stopword followed by the "*" wildcard, Elasticsearch behaves as if the stopword had not been filtered out. This happens regardless of analyze_wildcard and of whether the searched field is analyzed or not (set through the mappings API).

An example:

Clean the index:

$ curl -XDELETE localhost:9200/test_index?pretty
{
  "ok" : true,
  "acknowledged" : true
}

Populate it with "The Times":

$ curl -XPUT localhost:9200/test_index/test_type/1?pretty -d ' {"name": "The Times" } ' 
{
  "ok" : true,
  "_index" : "test_index",
  "_type" : "test_type",
  "_id" : "1",
  "_version" : 1
}

A query for "the times" works correctly:

$ curl -XGET localhost:9200/test_index/test_type/_search?pretty -d '
{
  "query": {
    "query_string": {
      "query": "the times"
    }
  }
} 
'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2169777,
    "hits" : [ {
      "_index" : "test_index",
      "_type" : "test_type",
      "_id" : "1",
      "_score" : 0.2169777, "_source" :  {"name": "The Times" } 
    } ]
  }
}

The same query also works with default_operator set to AND:

$ curl -XGET localhost:9200/test_index/test_type/_search?pretty -d '
{
  "query": {
    "query_string": {
      "query": "the times",
      "default_operator": "AND"
    }
  }
} 
'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2169777,
    "hits" : [ {
      "_index" : "test_index",
      "_type" : "test_type",
      "_id" : "1",
      "_score" : 0.2169777, "_source" :  {"name": "The Times" } 
    } ]
  }
}

If I now add the wildcard to "the", I get no results back:

$ curl -XGET localhost:9200/test_index/test_type/_search?pretty -d '
{
  "query": {
    "query_string": {
      "query": "the* times",
      "default_operator": "AND"
    }
  }
} 
'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

However, the query does work with the wildcard on "times":

$ curl -XGET localhost:9200/test_index/test_type/_search?pretty -d '
{
  "query": {
    "query_string": {
      "query": "the times*",
      "default_operator": "AND"
    }
  }
} 
'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test_index",
      "_type" : "test_type",
      "_id" : "1",
      "_score" : 1.0, "_source" :  {"name": "The Times" } 
    } ]
  }
}
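
To make the pattern in the four queries above easier to reason about, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not Elasticsearch's or Lucene's actual implementation: it assumes that stopwords are dropped at index time and from plain query terms, while a term ending in "*" skips the stopword filter and becomes a prefix clause. The names STOPWORDS, analyze, index_doc, parse_query and matches are all hypothetical.

```python
# Toy model only: every name here is hypothetical and nothing below is
# Elasticsearch or Lucene code.
STOPWORDS = {"the", "a", "an", "and", "of"}  # assumed stopword list


def analyze(text):
    """Lowercase, split on whitespace and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def index_doc(text):
    """Return the set of terms that would end up in the index."""
    return set(analyze(text))


def parse_query(query):
    """Turn a query string into (kind, value) clauses.

    Assumption: a term ending in '*' bypasses the stopword filter and
    becomes a prefix clause; plain terms are filtered as usual.
    """
    clauses = []
    for raw in query.lower().split():
        if raw.endswith("*"):
            clauses.append(("prefix", raw[:-1]))
        elif raw not in STOPWORDS:
            clauses.append(("term", raw))
    return clauses


def matches(indexed_terms, query, operator="AND"):
    """Evaluate the parsed clauses against the indexed terms."""
    results = []
    for kind, value in parse_query(query):
        if kind == "prefix":
            results.append(any(t.startswith(value) for t in indexed_terms))
        else:
            results.append(value in indexed_terms)
    if not results:
        return False
    return all(results) if operator == "AND" else any(results)


doc = index_doc("The Times")       # only "times" is indexed
print(matches(doc, "the times"))   # True
print(matches(doc, "the* times"))  # False
print(matches(doc, "the times*"))  # True
```

In this model "the* times" fails under AND because the prefix clause "the" is required but no indexed term starts with it, which matches the empty result shown above. Whether the real query parser takes this path is an assumption that would need to be confirmed against the source.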
