21 changes: 21 additions & 0 deletions docs/reference/how-to/disk-usage.asciidoc
@@ -158,3 +158,24 @@
on disk usage. In particular, integers should be stored using an integer type
stored in a `scaled_float` if appropriate or in the smallest type that fits the
use-case: using `float` over `double`, or `half_float` over `float` will help
save storage.
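
As a sketch, the following mapping (the `sensors` index and its fields are
hypothetical) stores a metric in a `half_float` and a two-decimal price in a
`scaled_float`:

[source,console]
--------------------------------------------------
PUT sensors
{
  "mappings": {
    "properties": {
      "temperature": { "type": "half_float" },
      "price": { "type": "scaled_float", "scaling_factor": 100 }
    }
  }
}
--------------------------------------------------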

[float]
=== Use index sorting to colocate similar documents

When Elasticsearch stores `_source`, it compresses multiple documents at once
in order to improve the overall compression ratio. For instance, it is very
common for documents to share the same field names, and quite common for them
to share some field values, especially on fields that have a low cardinality
or a https://en.wikipedia.org/wiki/Zipf%27s_law[Zipfian] distribution.

By default, documents are compressed together in the order that they are
added to the index. If you enable <<index-modules-index-sorting,index sorting>>,
they are instead compressed in sorted order. Sorting documents with similar
structure, fields, and values together should improve the compression ratio.
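
As a sketch, assuming a hypothetical `logs` index, sorting on a
low-cardinality field like `host` first groups similar documents together:

[source,console]
--------------------------------------------------
PUT logs
{
  "settings": {
    "index": {
      "sort.field": ["host", "@timestamp"],
      "sort.order": ["asc", "desc"]
    }
  },
  "mappings": {
    "properties": {
      "host": { "type": "keyword" },
      "@timestamp": { "type": "date" }
    }
  }
}
--------------------------------------------------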

[float]
=== Put fields in the same order in documents

Because multiple documents are compressed together into blocks, the
compressor is more likely to find longer duplicate strings in those `_source`
documents if fields always occur in the same order.
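
For instance, an application that serializes documents itself can emit keys
in a fixed order, as in this hypothetical sketch, giving the compressor more
repetition to work with:

[source,console]
--------------------------------------------------
PUT logs/_doc/1
{
  "@timestamp": "2099-01-01T12:00:00Z",
  "host": "web-1",
  "message": "connection reset"
}

PUT logs/_doc/2
{
  "@timestamp": "2099-01-01T12:00:01Z",
  "host": "web-1",
  "message": "connection timed out"
}
--------------------------------------------------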
42 changes: 42 additions & 0 deletions docs/reference/how-to/search-speed.asciidoc
@@ -326,3 +326,45 @@
queries, they should be mapped as a `keyword`.
<<index-modules-index-sorting,Index sorting>> can be useful in order to make
conjunctions faster at the cost of slightly slower indexing. Read more about it
in the <<index-modules-index-sorting-conjunctions,index sorting documentation>>.

[float]
=== Use `preference` to optimize cache utilization

There are multiple caches that can help with search performance, such as the
https://en.wikipedia.org/wiki/Page_cache[filesystem cache], the
<<shard-request-cache,request cache>> or the <<query-cache,query cache>>. Yet
all of these caches are maintained at the node level: if you run the same
request twice in a row, have one <<glossary-replica-shard,replica>> or more,
and use https://en.wikipedia.org/wiki/Round-robin_DNS[round-robin], the default
routing algorithm, then those two requests will go to different shard copies,
preventing node-level caches from helping.

Since it is common for users of a search application to run similar requests
one after another, for instance in order to analyze a narrower subset of the
index, using a preference value that identifies the current user or session
could help optimize usage of the caches.
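
As a sketch, assuming a hypothetical `logs` index, the user or session
identifier is passed as the `preference` query-string parameter:

[source,console]
--------------------------------------------------
GET logs/_search?preference=user_123
{
  "query": {
    "match": { "message": "error" }
  }
}
--------------------------------------------------

Requests that carry the same `preference` value are routed to the same shard
copies, so repeated searches from this user keep hitting the same node-level
caches.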

[float]
=== Replicas might help with throughput, but not always

In addition to improving resiliency, replicas can help improve throughput. For
instance, if you have a single-shard index and three nodes, you will need to
set the number of replicas to 2 in order to have 3 copies of your shard in
total, so that all nodes are utilized.
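
With the update index settings API, that looks like the following sketch,
assuming a hypothetical single-shard `logs` index:

[source,console]
--------------------------------------------------
PUT logs/_settings
{
  "index": { "number_of_replicas": 2 }
}
--------------------------------------------------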

Now imagine that you have a two-shard index and two nodes. In one case, the
number of replicas is 0, meaning that each node holds a single shard. In the
second case the number of replicas is 1, meaning that each node holds two
shards. Which setup performs better in terms of search? Usually, the setup
that has fewer shards per node in total will perform better, because it gives
a greater share of the available filesystem cache to each shard, and the
filesystem cache is probably Elasticsearch's number 1 performance factor. At
the same time, beware that a setup without replicas cannot survive a single
node failure, so there is a trade-off between throughput and availability.

So what is the right number of replicas? If you have a cluster with
`num_nodes` nodes and `num_primaries` primary shards _in total_, and you want
to be able to cope with at most `max_failures` node failures at once, then the
right number of replicas for you is
`max(max_failures, ceil(num_nodes / num_primaries) - 1)`.
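
Applied to the earlier example of a single-shard index on three nodes that
should survive one node failure, this gives
`max(1, ceil(3 / 1) - 1) = max(1, 2) = 2` replicas.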