More advices around search speed and disk usage. #25252
It adds notes about:
- how `preference` can help optimize cache usage
- the fact that too many replicas can hurt search performance due to lower utilization of the filesystem cache
- how index sorting can improve `_source` compression
- how always putting fields in the same order in documents can improve `_source` compression
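As a rough illustration of the last point, the sketch below compares the compressed size of JSON documents serialized with a consistent field order against the same documents with shuffled field order. It is only a toy model: `zlib` stands in for whatever codec is used for stored fields, and the field names, values, and document count are made up for the example.

```python
import json
import random
import zlib

# Hypothetical field names, chosen only for this illustration.
FIELDS = ["timestamp", "host", "status", "method",
          "path", "user_agent", "referrer", "bytes"]


def make_doc(i):
    # Low-cardinality values so that documents share many field values.
    return {name: f"{name}-value-{i % 4}" for name in FIELDS}


def serialized_size(docs, shuffle_fields, seed=0):
    """Compress all documents together and return the compressed size in bytes."""
    rng = random.Random(seed)
    lines = []
    for doc in docs:
        items = list(doc.items())
        if shuffle_fields:
            rng.shuffle(items)  # same content, different field order
        lines.append(json.dumps(dict(items)))
    return len(zlib.compress("\n".join(lines).encode()))


docs = [make_doc(i) for i in range(500)]
consistent_size = serialized_size(docs, shuffle_fields=False)
shuffled_size = serialized_size(docs, shuffle_fields=True)
print(consistent_size, shuffled_size)
```

With a consistent order the compressor can match long runs that span several fields at once, whereas shuffled order only allows matches on individual key/value pairs.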
nik9000
left a comment
Left a few minor things but LGTM.
| [float]
| === Use index sorting to colocate similar documents
|
| Elasticsearch compresses multiple documents at once in order to improve the
Maybe something like "When Elasticsearch stores _source, it compresses multiple documents at once to improve the overall compression ratio...." so we don't get people thinking that doc values and the inverted index bits are stored like this.
| Elasticsearch compresses multiple documents at once in order to improve the
| overall compression ratio. For instance it is very common that documents share
| the same field names, and quite common that they share some field values,
| especially on fields that have a low cardinality or a zipfian distribution.
zipfian might deserve a link to wikipedia.
| the same field names, and quite common that they share some field values,
| especially on fields that have a low cardinality or a zipfian distribution.
|
| Documents that are compressed together are documents that are colocated in the
I'd twist this around to something like "By default documents are compressed together in the order that they are added to the index. If you enabled index sorting then instead they are compressed in sorted order. Sorting documents with similar structure, fields, and values together should improve the compression ratio." Or something like that. It feels more active that way. I dunno.
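The effect described in this thread can be sketched with a toy model: documents are compressed in fixed-size blocks, once in arrival order and once sorted by a field. This is not Elasticsearch's implementation. `zlib`, the block size of 16 documents, and the `host` field are all assumptions made for the illustration.

```python
import hashlib
import json
import random
import zlib

BLOCK_DOCS = 16  # hypothetical number of documents compressed together


def make_docs(n_docs=1024, n_hosts=64, seed=0):
    # Each host value is a long, hard-to-compress string shared by several docs.
    hosts = [hashlib.sha256(str(i).encode()).hexdigest() for i in range(n_hosts)]
    docs = [{"id": i, "host": hosts[i % n_hosts]} for i in range(n_docs)]
    random.Random(seed).shuffle(docs)  # simulate arbitrary arrival order
    return docs


def compressed_size(docs, block_docs=BLOCK_DOCS):
    """Compress documents block by block and return the total size in bytes."""
    total = 0
    for start in range(0, len(docs), block_docs):
        block = docs[start:start + block_docs]
        payload = "\n".join(json.dumps(d) for d in block).encode()
        total += len(zlib.compress(payload))
    return total


docs = make_docs()
unsorted_size = compressed_size(docs)
sorted_size = compressed_size(sorted(docs, key=lambda d: d["host"]))
print(unsorted_size, sorted_size)
```

In the sorted case every block holds documents with the same `host` value, so that value is stored roughly once per block instead of once per document.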
| === Use `preference` to optimize cache utilization
|
| There are multiple caches that can help with search performance, such as the
| filesystem cache, the <<shard-request-cache,request cache>> or the
Maybe make filesystem cache a link to https://en.wikipedia.org/wiki/Page_cache ?
s/or/and/
| filesystem cache, the <<shard-request-cache,request cache>> or the
| <<query-cache,query cache>>. Yet all these caches are maintained at the node
| level, meaning that if you run the same request twice in a row, have 1
| <<glossary-replica-shard,replica>> or more and use the default routing
I'd s/use the default routing algorithm, which is round-robin,/use round-robin, the default routing algorithm/
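The cache behaviour the quoted paragraph describes can be sketched with a small simulation: each shard copy keeps its own cache, and a request is routed either round-robin or by hashing a preference string. This is a simplified model rather than Elasticsearch's routing code; the `crc32` hash, the session identifiers, and the per-copy `set` cache are assumptions for the example.

```python
import itertools
import zlib


def count_cache_hits(requests, n_copies, use_preference):
    """Route requests to shard copies and count per-copy cache hits."""
    caches = [set() for _ in range(n_copies)]  # one cache per shard copy
    round_robin = itertools.cycle(range(n_copies))
    hits = 0
    for session_id, query in requests:
        if use_preference:
            # The same preference string always maps to the same copy.
            copy = zlib.crc32(session_id.encode()) % n_copies
        else:
            copy = next(round_robin)
        if query in caches[copy]:
            hits += 1
        else:
            caches[copy].add(query)
    return hits


# Five hypothetical users, each running the same query twice in a row.
requests = []
for i in range(5):
    requests += [(f"user-{i}", f"query-{i}")] * 2

# Two copies: one primary plus one replica.
round_robin_hits = count_cache_hits(requests, n_copies=2, use_preference=False)
preference_hits = count_cache_hits(requests, n_copies=2, use_preference=True)
print(round_robin_hits, preference_hits)
```

In this model the repeated request never lands on the copy that cached it under round-robin, while a stable preference value sends both requests to the same copy.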
martijnvg
left a comment
LGTM
…y-context
* 'master' of github.com:elastic/elasticsearch: (21 commits)
  [DOCS] Clarify expected availability of HDFS for the HDFS Repository (elastic#25220)
  Remove some redundant 140 character checkstyle suppressions
  [Docs] more fix for the parent-join docs
  [Docs] Fix cross reference for parent-join field
  More advices around search speed and disk usage. (elastic#25252)
  Add documentation for the new parent-join field (elastic#25227)
  [analysis-icu] Allow setting unicodeSetFilter (elastic#20814)
  Introduce translog size and age based retention policies (elastic#25147)
  Add needs methods for specific variables to Painless script context factories. (elastic#25267)
  Improves snapshot logging and snapshoth deletion error handling (elastic#25264)
  Add unit test for PathHierarchyTokenizerFactory (elastic#24984)
  Deprecate tribe service
  Moved more token filters to analysis-common module.
  [Test] Make sure that SearchAfterSortedDocQueryTests uses a single threaded searcher
  [DOCS] Defined es-test-dir and plugins-examples-dir in index.asciidoc. (elastic#25232)
  Test fix - removed superfluous assertion (elastic#25247)
  [Test] restore BWC for parent-join now that the new mapping format is in 5.x
  Add a section named "relations" in the ParentJoinFieldMapper (elastic#25248)
  test: Ported more OldIndexBackwardsCompatibilityIT tests to full cluster restart qa tests. (elastic#25173)
  fix: Sort Processor does not have proper behavior with targetField (elastic#25237)
  ...