From da97ecc6524878d4d50a68189babdeacee05acfb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christoph=20B=C3=BCscher?=
Date: Thu, 5 Jul 2018 15:26:09 +0200
Subject: [PATCH] [Docs] Add clarification to analysis example

There have been at least two PRs trying to fix the spelling of "lazi"
because it isn't very clear from the example that the english analyzer
will stem each token in the example. This adds a short description of
the analysis process to make this clearer.

Relates to #31797
---
 docs/reference/analysis.asciidoc | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/reference/analysis.asciidoc b/docs/reference/analysis.asciidoc
index 85434720c3e8b..c5fcce3ad5fa9 100644
--- a/docs/reference/analysis.asciidoc
+++ b/docs/reference/analysis.asciidoc
@@ -13,15 +13,18 @@ defined per index.
 [float]
 == Index time analysis
 
-For instance at index time, the built-in <> _analyzer_ would
-convert this sentence:
+For instance, at index time the built-in <> _analyzer_
+will first convert the sentence:
 
 [source,text]
 ------
 "The QUICK brown foxes jumped over the lazy dog!"
 ------
 
-into these terms, which would be added to the inverted index.
+into distinct tokens. It will then lowercase each token, remove frequent
+stopwords ("the") and reduce the terms to their word stems (foxes -> fox,
+jumped -> jump, lazy -> lazi). In the end, the following terms will be added
+to the inverted index:
 
 [source,text]
 ------