Whitespace tokenizer splits at 255 characters by default #26601

@quilin

Elasticsearch Version: 5.4.0, Build: 780f8c4/2017-04-28T17:43:27.229Z, JVM: 1.8.0_131

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

OS Name Microsoft Windows 10 Enterprise
Version 10.0.14393 Build 14393
The issue also seems to reproduce on some Linux machines (in AWS).

Description of the problem including expected versus actual behavior:
A match query against the analyzed field returns a false positive. Expected nothing to be found, but a match is returned instead.

Steps to reproduce:

  1. Create the index
    curl -XPUT localhost:9200/test_index_bug -H 'Content-Type: application/json' -d '{"settings":{"analysis":{"analyzer":{"caseInsensitiveKeyAnalyzer":{"type":"custom","filter":["lowercase"],"tokenizer":"whitespace"}}}},"mappings":{"ric":{"properties":{"id":{"type":"integer"},"ricName":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256},"ricName":{"type":"text","analyzer":"caseInsensitiveKeyAnalyzer"}}},"displayName":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256},"displayName":{"type":"text","analyzer":"caseInsensitiveKeyAnalyzer"}}},"exchangeCode":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"exchangeName":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"domain":{"type":"integer"},"peId":{"type":"integer"},"isAlias":{"type":"boolean"},"createdDate":{"type":"date"},"updatedDate":{"type":"date"}}}}}'
  2. Put the document
    curl -XPUT localhost:9200/test_index_bug/ric/123 -H 'Content-Type: application/json' -d '{"id":123,"publicId":"F","ricName":"F","createdDate":"2017-09-12T11:54:51.1421471Z","updatedDate":"2017-09-12T11:54:51.1421471Z","domain":123,"displayName":"test","exchangeNumber":123,"exchangeName":"test","exchangeCode":"test","peId":123,"isAlias":false}'
  3. Search for the document
    curl -XPOST localhost:9200/_search -H 'Content-Type: application/json' -d '{"size":1,"query":{"bool":{"must":[{"bool":{"should":[{"match":{"ricName.ricName":{"query":"dgsdlruityhaekljeha;izudyvklaejrbt09834576hgadagyhrtertkghsldkjrhtweiorugysdokljagvalkjdrthwieuartyad78z967agyhdajklskjrntjre.skdauygusldbnalwkeruty57840w67yrukshgdajklsghkdajghslkduayhtweuiry68593470498576wiuergпавыапргцыкше8рполдваяыловраолпдфлгвангшфщыгывнаолдыоамлсичрядлывороалфдцоукрегкшвщышгапниоладчитоваирыолфdhsgjsksrtsrtkdghkdhgkdhkdgkhaasadjlknknkjsgdnm,.sdagnn,mhjlkwerthjlksdgahjklyoiugadsgnm.,hjlkyoiudbnjgadkbkjgadbhkjadgshkjdaghjksdahgshdagjkshgkjahskjlhgakjshgakjsdgdshaglkjsdhgakjdshgakjdslgf"}}}]}}]}}}'

Expected no hits, but a single document is returned:

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test_index_bug",
        "_type": "ric",
        "_id": "123",
        "_score": 0.2876821,
        "_source": {
          "id": 123,
          "publicId": "F",
          "ricName": "F",
          "createdDate": "2017-09-12T11:54:51.1421471Z",
          "updatedDate": "2017-09-12T11:54:51.1421471Z",
          "domain": 123,
          "displayName": "test",
          "exchangeNumber": 123,
          "exchangeName": "test",
          "exchangeCode": "test",
          "peId": 123,
          "isAlias": false
        }
      }
    ]
  }
}

The query string is 511 characters long. If I change its length or replace its last letter ("f") with anything other than "F", the response is empty, as expected.
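A possible explanation, going by the 255-character limit named in the issue title: 511 = 255 + 255 + 1, so if the whitespace tokenizer cuts an over-long token every 255 characters, the query's last chunk is the single letter "f". After lowercasing, the indexed value "F" is also "f", and a match query (which by default needs only one term to match) returns the document. The Python sketch below is only a simplified model of that behavior, not the actual Lucene or Elasticsearch code. The function names and the filler string standing in for the real query are hypothetical.

```python
MAX_TOKEN_LENGTH = 255  # assumed default limit, taken from the issue title


def whitespace_tokenize(text, max_token_length=MAX_TOKEN_LENGTH):
    """Split on whitespace, then cut over-long tokens into fixed-size chunks."""
    tokens = []
    for word in text.split():
        for start in range(0, len(word), max_token_length):
            tokens.append(word[start:start + max_token_length])
    return tokens


def analyze(text):
    """Simplified model of caseInsensitiveKeyAnalyzer: tokenize, then lowercase."""
    return [token.lower() for token in whitespace_tokenize(text)]


# Hypothetical stand-in for the 511-character query: 510 filler characters plus "f".
query = "x" * 510 + "f"
query_terms = analyze(query)
indexed_terms = analyze("F")  # the ricName value of the indexed document

print([len(term) for term in query_terms])    # [255, 255, 1]
print(set(query_terms) & set(indexed_terms))  # {'f'}
```

Under this model, changing the query length moves the chunk boundaries, so the last chunk is no longer exactly "f". That would be consistent with the empty response seen when the length or the last letter changes.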
