
Conversation

@cbuescher (Member):

The whitespace tokenizer splits tokens longer than 255 characters into multiple tokens,
which can lead to confusing search matches like the one observed in #26601. This adds
a note to the documentation to make this clearer.

Closes #26641
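
For anyone running into this, a quick way to see the splitting described above is to push a long token through the `_analyze` API with the whitespace tokenizer. This is only an illustrative sketch (it assumes a local cluster on localhost:9200, which is not part of this PR):

```sh
# Build one 300-character "word" containing no whitespace.
TOKEN=$(printf 'a%.0s' {1..300})

# Analyze it with the whitespace tokenizer. Because the tokenizer's
# maximum token length defaults to 255, the response contains two
# tokens: the first 255 characters and the remaining 45 characters.
curl -s -H 'Content-Type: application/json' \
  -X POST 'localhost:9200/_analyze' \
  -d "{\"tokenizer\": \"whitespace\", \"text\": \"$TOKEN\"}"
```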

@cbuescher changed the title from "[Docs] Add not about maximum token length for whitespace tokenizer" to "[Docs] Add note about maximum token length for whitespace tokenizer" Sep 20, 2017
@colings86 added v5.6.3 and removed v5.6.2 labels Sep 21, 2017
@cbuescher changed the base branch from master to 6.0 September 25, 2017 21:45
@cbuescher force-pushed the docs-addNote-WhitespaceTokenizer branch from 4a2d047 to 5d1627c September 25, 2017 21:49
@cbuescher (Member, Author):

This clarifies the docs for 5.6.x and 6.0; starting with 6.1, overriding the "max_token_length" parameter will be supported via #26643.
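
Once that override is available, configuring it on a custom whitespace tokenizer could look roughly like the sketch below (the index, tokenizer, and analyzer names are placeholders, and the setting applies only from 6.1 onwards per #26643):

```sh
# Hypothetical index using a whitespace tokenizer with a custom
# max_token_length; all names here are examples only.
curl -s -H 'Content-Type: application/json' \
  -X PUT 'localhost:9200/my_index' \
  -d '{
    "settings": {
      "analysis": {
        "tokenizer": {
          "my_whitespace": { "type": "whitespace", "max_token_length": 512 }
        },
        "analyzer": {
          "my_analyzer": { "type": "custom", "tokenizer": "my_whitespace" }
        }
      }
    }
  }'
```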

@javanna added v5.6.4 and removed v5.6.3 labels Oct 6, 2017
@cbuescher merged commit f098553 into elastic:6.0 Oct 7, 2017
@javanna added v5.6.3 and removed v5.6.4 labels Oct 9, 2017
@lcawl added v6.0.0-rc2 and removed v6.0.0 labels Oct 30, 2017

Labels

>docs (General docs changes), :Search Relevance/Analysis (How text is split into tokens), v5.6.3, v6.0.0-rc2
