Contributor

@jpountz jpountz commented Jun 15, 2017

It adds notes about:

  • how preference can help optimize cache usage
  • the fact that too many replicas can hurt search performance due to lower
    utilization of the filesystem cache
  • how index sorting can improve _source compression
  • how always putting fields in the same order in documents can improve _source
    compression
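The last point can be illustrated with a small experiment. The Python sketch below is a toy model, not Elasticsearch code. It uses `zlib` (DEFLATE) as a stand-in for the stored-fields compressor and hypothetical log-like documents. It compresses the same documents once with a fixed field order and once with the fields shuffled per document. A fixed order lets the compressor match whole runs of field names and values across documents, while shuffling breaks those runs into shorter matches.

```python
import json
import random
import zlib

REGIONS = ["us-east", "eu-west", "ap-south"]


def make_doc(i: int) -> dict:
    """Hypothetical log-like document with low-cardinality values."""
    return {
        "hour": f"2017-06-15T{i % 24:02d}:00",
        "host": f"web-{i % 5}",
        "region": REGIONS[i % 3],
        "status": "ok" if i % 2 == 0 else "error",
        "service": "search",
        "level": "INFO" if i % 2 == 0 else "WARN",
    }


def shuffle_fields(doc: dict, rng: random.Random) -> dict:
    """Return the same fields and values in a random order."""
    keys = list(doc)
    rng.shuffle(keys)
    return {k: doc[k] for k in keys}  # dicts keep insertion order


def compressed_size(docs) -> int:
    """Serialize docs as JSON lines and compress them as one block."""
    payload = b"\n".join(json.dumps(d).encode("utf-8") for d in docs)
    return len(zlib.compress(payload))


rng = random.Random(42)  # fixed seed keeps the run reproducible
docs = [make_doc(i) for i in range(500)]
same_order = compressed_size(docs)
random_order = compressed_size([shuffle_fields(d, rng) for d in docs])
print(f"fixed field order: {same_order} bytes, shuffled: {random_order} bytes")
```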

@jpountz jpountz added the >docs General docs changes label Jun 15, 2017
Member

@nik9000 nik9000 left a comment

Left a few minor things but LGTM.

[float]
=== Use index sorting to colocate similar documents

Elasticsearch compresses multiple documents at once in order to improve the

Maybe something like "When Elasticsearch stores _source, it compresses multiple documents at once to improve the overall compression ratio...." so we don't get people thinking that doc values and the inverted index bits are stored like this.

overall compression ratio. For instance it is very common that documents share
the same field names, and quite common that they share some field values,
especially on fields that have a low cardinality or a zipfian distribution.

zipfian might deserve a link to wikipedia.
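A quick way to see why compressing documents together helps is the Python sketch below. It uses `zlib` as a stand-in for the real compressor (it is not the codec Elasticsearch uses) and made-up documents. It compares compressing each serialized document on its own against compressing all of them as one block. Repeated field names and low-cardinality values only become back-references when they share a block.

```python
import json
import zlib

# Hypothetical documents: identical field names and low-cardinality values.
STATUSES = ["ok", "error", "timeout"]
docs = [
    {"host": f"web-{i % 5}", "status": STATUSES[i % 3], "bytes_sent": i * 37}
    for i in range(200)
]
serialized = [json.dumps(d).encode("utf-8") for d in docs]

# Compress every document on its own: nothing can be shared between them.
separate = sum(len(zlib.compress(s)) for s in serialized)

# Compress all documents as a single block: repeated field names and
# values seen earlier in the block become cheap back-references.
together = len(zlib.compress(b"\n".join(serialized)))

print(f"compressed separately: {separate} bytes, together: {together} bytes")
```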

Documents that are compressed together are documents that are colocated in the

I'd twist this around to something like "By default documents are compressed together in the order that they are added to the index. If you enable index sorting, they are instead compressed in sorted order. Sorting documents with similar structure, fields, and values together should improve the compression ratio." Or something like that. It feels more active that way. I dunno.
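The effect described in that suggestion can be approximated in a few lines of Python. This is a hypothetical model, not the stored-fields format. Documents are compressed in consecutive blocks of a made-up `BLOCK_SIZE`, `zlib` stands in for the real compressor, and sorting the Python list on a `kind` field stands in for index sorting. Insertion order interleaves four document shapes, so every block pays for all four sets of field names, whereas sorted order keeps documents of one shape in the same block.

```python
import json
import zlib

KINDS = ["access_log", "metric", "audit", "trace"]
BLOCK_SIZE = 8  # hypothetical number of documents compressed together


def make_doc(kind: str, i: int) -> dict:
    """Hypothetical documents: each kind has its own field names."""
    if kind == "access_log":
        return {"kind": kind, "client_ip": f"10.0.0.{i % 7}",
                "http_method": "GET", "url_path": f"/products/{i % 11}"}
    if kind == "metric":
        return {"kind": kind, "cpu_user_pct": (i % 13) / 10,
                "mem_used_bytes": 1_000_000 + i % 17}
    if kind == "audit":
        return {"kind": kind, "actor_username": f"user{i % 5}",
                "action_performed": "login", "outcome": "success"}
    return {"kind": kind, "trace_id": f"t{i % 23:04d}",
            "span_duration_ms": i % 29, "service_name": "checkout"}


def compressed_size(docs: list, block_size: int = BLOCK_SIZE) -> int:
    """Compress docs in consecutive fixed-size blocks; return total bytes."""
    total = 0
    for start in range(0, len(docs), block_size):
        block = docs[start:start + block_size]
        payload = b"\n".join(json.dumps(d).encode("utf-8") for d in block)
        total += len(zlib.compress(payload))
    return total


# Insertion order interleaves the kinds: access_log, metric, audit, trace, ...
insertion_order = [make_doc(KINDS[i % 4], i) for i in range(400)]
# Sorting on "kind" (a stand-in for index sorting) groups similar documents.
sorted_order = sorted(insertion_order, key=lambda d: d["kind"])

unsorted_bytes = compressed_size(insertion_order)
sorted_bytes = compressed_size(sorted_order)
print(f"insertion order: {unsorted_bytes} bytes, sorted: {sorted_bytes} bytes")
```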

=== Use `preference` to optimize cache utilization

There are multiple caches that can help with search performance, such as the
filesystem cache, the <<shard-request-cache,request cache>> or the

Maybe make filesystem cache a link to https://en.wikipedia.org/wiki/Page_cache ?

s/or/and/

<<query-cache,query cache>>. Yet all these caches are maintained at the node
level, meaning that if you run the same request twice in a row, have 1
<<glossary-replica-shard,replica>> or more and use the default routing

I'd s/use the default routing algorithm, which is round-robin,/use round-robin, the default routing algorithm/
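The round-robin versus `preference` point can be modeled with two nodes that each keep their own cache. In this toy Python sketch, the `Node` class and the routing functions are hypothetical and are not Elasticsearch APIs. One user repeats the same request four times. Round-robin alternates between the primary and the replica, so the cache misses once on each node. Hashing a stable session identifier, as a custom `preference` string would, sends every repeat to the same node.

```python
import zlib
from itertools import count


class Node:
    """A node holding one shard copy and its own result cache (toy model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache = set()

    def search(self, request: str) -> bool:
        """Return True on a cache hit; cache the request on a miss."""
        if request in self.cache:
            return True
        self.cache.add(request)
        return False


def make_round_robin():
    """Hypothetical router that cycles through the shard copies."""
    counter = count()
    return lambda session, n_copies: next(counter) % n_copies


def by_preference(session: str, n_copies: int) -> int:
    """Hypothetical router: a stable hash of the session picks the copy."""
    return zlib.crc32(session.encode("utf-8")) % n_copies


def count_misses(requests, pick_copy) -> int:
    """Route each (session, request) pair to a node and count cache misses."""
    nodes = [Node("primary"), Node("replica")]  # 1 replica => 2 shard copies
    misses = 0
    for session, request in requests:
        node = nodes[pick_copy(session, len(nodes))]
        if not node.search(request):
            misses += 1
    return misses


# The same user sends the same request four times in a row.
requests = [("user-42", "GET /logs/_search?q=status:error")] * 4
rr_misses = count_misses(requests, make_round_robin())
pref_misses = count_misses(requests, by_preference)
print(f"round-robin misses: {rr_misses}, preference misses: {pref_misses}")
```

`zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make the chosen copy change between runs.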

Member

@martijnvg martijnvg left a comment

LGTM

@jpountz jpountz merged commit 8c869e2 into elastic:master Jun 16, 2017
@jpountz jpountz deleted the docs/preference_search_speed branch June 16, 2017 09:23
@jpountz jpountz added the v6.0.0 label Jun 16, 2017
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jun 16, 2017
…y-context

* 'master' of github.com:elastic/elasticsearch: (21 commits)
  [DOCS] Clarify expected availability of HDFS for the HDFS Repository (elastic#25220)
  Remove some redundant 140 character checkstyle suppressions
  [Docs] more fix for the parent-join docs
  [Docs] Fix cross reference for parent-join field
  More advices around search speed and disk usage. (elastic#25252)
  Add documentation for the new parent-join field (elastic#25227)
  [analysis-icu] Allow setting unicodeSetFilter (elastic#20814)
  Introduce translog size and age based retention policies (elastic#25147)
  Add needs methods for specific variables to Painless script context factories. (elastic#25267)
  Improves snapshot logging and snapshoth deletion error handling (elastic#25264)
  Add unit test for PathHierarchyTokenizerFactory (elastic#24984)
  Deprecate tribe service
  Moved more token filters to analysis-common module.
  [Test] Make sure that SearchAfterSortedDocQueryTests uses a single threaded searcher
  [DOCS] Defined es-test-dir and plugins-examples-dir in index.asciidoc.  (elastic#25232)
  Test fix - removed superfluous assertion (elastic#25247)
  [Test] restore BWC for parent-join now that the new mapping format is in 5.x
  Add a section named "relations" in the ParentJoinFieldMapper (elastic#25248)
  test: Ported more OldIndexBackwardsCompatibilityIT tests to full cluster restart qa tests. (elastic#25173)
  fix: Sort Processor does not have proper behavior with targetField (elastic#25237)
  ...

Labels

>docs General docs changes v6.0.0-beta1

4 participants