token_count type : add an option to count tokens (and not positions) #23227

@fbaligand

Elasticsearch version: 2.4.2 and 5.2.1

Description of the problem including expected versus actual behavior:
Currently, when a "token_count" field uses an analyzer that contains a stop filter, the indexed value still includes the stop words: the field counts token positions, and removed stop words still occupy positions.
It would be great to have an option to count only the tokens that the analysis actually produces.

Steps to reproduce:

  1. I have this index configuration:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "tokenizer": "standard",
            "filter": [
              "standard",
              "lowercase",
              "stop_words"
            ]
          }
        },
        "filter": {
          "stop_words": {
            "type": "stop",
            "stopwords": [
              "this", "is", "a"
            ]
          }
        }
      }
    }
  },
  "mappings": {
      "properties": {
        "mytext": {
          "index": "analyzed",
          "type": "string",
          "analyzer": "default",
          "fields": {
            "length": {
              "type": "token_count",
              "analyzer": "default",
              "store": "yes"
            }
          }
        }
      }
    }
  }
}
  2. I index this document:
{
 "mytext": "this is a cat"
}
  3. I run this search query:
GET _search
{
  "query": {
    "query_string": {
      "query": "mytext.length:1"
    }
  }
}
  4. It should return 1 result, but it actually returns 0 results.
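
To illustrate why the search above finds nothing, here is a minimal sketch in Python. It is a toy model, not Lucene or Elasticsearch code: it assumes a whitespace tokenizer, and the names `analyze`, `count_positions` and `count_tokens` are hypothetical. It only mimics how a stop filter removes a token while the next surviving token still records the gap in its position increment. It also ignores any gap left by trailing stop words.

```python
# Toy model of the analysis chain from the reproduction steps.
# NOT Lucene code: whitespace tokenizer + lowercase + stop filter only.

STOP_WORDS = {"this", "is", "a"}  # same words as the "stop_words" filter


def analyze(text, stop_words=STOP_WORDS):
    """Return (term, position_increment) pairs for the surviving tokens."""
    tokens = []
    pending = 1  # increment that the next emitted token will carry
    for word in text.lower().split():
        if word in stop_words:
            pending += 1  # a removed token still advances the position
            continue
        tokens.append((word, pending))
        pending = 1
    return tokens


def count_positions(tokens):
    """Sum of position increments: models the behavior reported here."""
    return sum(increment for _, increment in tokens)


def count_tokens(tokens):
    """Number of emitted tokens: the behavior this issue asks for."""
    return len(tokens)


tokens = analyze("this is a cat")
print(tokens)                   # [('cat', 4)]
print(count_positions(tokens))  # 4
print(count_tokens(tokens))     # 1
```

Under this model the document is indexed with a length of 4, so a query for `mytext.length:1` cannot match. The requested option would make the field store the second number instead.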
