Conversation

Member

@nik9000 nik9000 commented May 9, 2018

Adds documentation for how to rebuild all the built-in analyzers, along with
tests for that documentation that use the mechanism added in #29535.

Closes #29499

nik9000 added 3 commits May 9, 2018 14:58
@nik9000 nik9000 added the >docs (General docs changes), review, :Search Relevance/Analysis (How text is split into tokens), v7.0.0, and v6.4.0 labels May 9, 2018
@nik9000 nik9000 requested a review from mayya-sharipova May 9, 2018 22:19
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

Contributor

@mayya-sharipova mayya-sharipova left a comment

LGTM, except for some minor changes.

recreate it as a `custom` analyzer and modify it, usually by adding
token filters. Usually, you should prefer the
<<keyword, Keyword type>> when you want strings that are not split
into tokens, but just in case you need it, this his would recreate
Contributor

"this his" -> "this" ?

"tokenizer": {
  "split_on_non_word": {
    "type": "pattern",
    "stopwords": "\\W+" <1>
Contributor

Should it be `pattern` instead of `stopwords`?

Member Author

Yes! Now I have to figure out how the tests passed when it was wrong....

Member Author

I believe it is because `\\W+` is the default pattern, and because we don't complain if you pass extra settings here.
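
For reference, the fix discussed in this thread can be sketched as an index settings body along the following lines. This is only an illustration, assuming the usual `settings.analysis` layout: the tokenizer name `split_on_non_word` comes from the excerpt above, while the analyzer name `rebuilt_pattern` and the `lowercase` filter are assumptions added for completeness, not taken from this conversation.

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "split_on_non_word": {
          "type": "pattern",
          "pattern": "\\W+"
        }
      },
      "analyzer": {
        "rebuilt_pattern": {
          "tokenizer": "split_on_non_word",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

With `pattern` spelled correctly, the setting is actually read by the tokenizer, so the example no longer works only because `\\W+` happens to be the default.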

[float]
=== Definition

The `simple` anlzyer consists of:
Contributor

"anlzyer" -> "analyzer"


If you need to customize the `pattern` analyzer beyond the configuration
parameters then you need to recreate it as a `custom` analyzer and modify
it, usually by adding token filters. This would recreate the built in
Contributor

"built in" -> "built-in", here and in all other places.

@nik9000 nik9000 merged commit 9881bfa into elastic:master May 14, 2018
nik9000 added a commit that referenced this pull request May 14, 2018
Member Author

nik9000 commented May 14, 2018

Thanks @mayya-sharipova! I've fixed it up as you requested and merged and backported.

dnhatn added a commit that referenced this pull request May 15, 2018
* 6.x:
  Revert "Silence IndexUpgradeIT test failures. (#30430)"
  [DOCS] Remove references to changelog and to highlights
  Revert "Mute ML upgrade test (#30458)"
  [ML] Fix BWC version for backport of #30125
  [Docs] Improve section detailing translog usage (#30573)
  [Tests] Relax allowed delta in extended_stats aggregation (#30569)
  Fail if reading from closed KeyStoreWrapper (#30394)
  [ML] Reverse engineer Grok patterns from categorization results (#30125)
  Derive max composite buffers from max content len
  Update build file due to doc file rename
  SQL: Extract SQL request and response classes (#30457)
  Remove the changelog (#30593)
  Revert "Add deprecation warning for default shards (#30587)"
  Silence IndexUpgradeIT test failures. (#30430)
  Add deprecation warning for default shards (#30587)
  [DOCS] Adds 6.4.0 release highlight pages
  [DOCS] Adds release highlight pages (#30590)
  Docs: Document how to rebuild analyzers (#30498)
  [DOCS] Fixes title capitalization in security content
  LLRest: Add equals and hashcode tests for Request (#30584)
  [DOCS] Fix realm setting names (#30499)
  [DOCS] Fix path info for various security files (#30502)
  Docs: document precision limitations of geo_bounding_box (#30540)
  Fix non existing javadocs link in RestClientTests
  Auto-expand replicas only after failing nodes (#30553)

Labels

>docs (General docs changes), :Search Relevance/Analysis (How text is split into tokens), v6.4.0, v7.0.0-beta1


4 participants