@@ -222,31 +222,128 @@ GET _cat/shards?v=true
 
 [discrete]
 [[field-count-recommendation]]
-==== Data nodes should have at least 1kB of heap per field per index, plus overheads
-
-The exact resource usage of each mapped field depends on its type, but a rule
-of thumb is to allow for approximately 1kB of heap overhead per mapped field
-per index held by each data node. In a running cluster, you can also consult the
-<<cluster-nodes-stats,Nodes stats API>>'s `mappings` indices statistic, which
-reports the number of field mappings and an estimation of their heap overhead.
-
-Additionally, you must also allow enough heap for {es}'s
-baseline usage as well as your workload such as indexing, searches and
-aggregations. 0.5GB of extra heap will suffice for many reasonable workloads,
-and you may need even less if your workload is very light while heavy workloads
-may require more.
-
-For example, if a data node holds shards from 1000 indices, each containing
-4000 mapped fields, then you should allow approximately 1000 × 4000 × 1kB = 4GB
-of heap for the fields and another 0.5GB of heap for its workload and other
-overheads, and therefore this node will need a heap size of at least 4.5GB.
-
-Note that this rule defines the absolute maximum number of indices that a data
-node can manage, but does not guarantee the performance of searches or indexing
-involving this many indices. You must also ensure that your data nodes have
-adequate resources for your workload and that your overall sharding strategy
-meets all your performance requirements. See also <<single-thread-per-shard>>
-and <<each-shard-has-overhead>>.
+==== Allow enough heap for field mappers and overheads
+
+Mapped fields consume some heap memory on each node, and require extra
+heap on data nodes. Ensure that each node has enough heap for its mappings,
+and also allow extra space for overheads associated with its workload. The
+following sections show how to determine these heap requirements.
+
+[discrete]
+===== Mapping metadata in the cluster state
+
+Each node in the cluster has a copy of the <<cluster-state-api-desc,cluster state>>.
+The cluster state includes information about the field mappings for each
+index, and this information has heap overhead. You can use the
+<<cluster-stats,Cluster stats API>> to get the total size of all mappings,
+after deduplication and compression, which indicates the heap overhead of
+this mapping metadata.
+
+[source,console]
+----
+GET _cluster/stats?human&filter_path=indices.mappings.total_deduplicated_mapping_size*
+----
+// TEST[setup:node]
+
+This returns information similar to the following example output:
+
+[source,console-result]
+----
+{
+  "indices": {
+    "mappings": {
+      "total_deduplicated_mapping_size": "1gb",
+      "total_deduplicated_mapping_size_in_bytes": 1073741824
+    }
+  }
+}
+----
+// TESTRESPONSE[s/"total_deduplicated_mapping_size": "1gb"/"total_deduplicated_mapping_size": $body.$_path/]
+// TESTRESPONSE[s/"total_deduplicated_mapping_size_in_bytes": 1073741824/"total_deduplicated_mapping_size_in_bytes": $body.$_path/]
+
+[discrete]
+===== Retrieving heap size and field mapper overheads
+
+You can use the <<cluster-nodes-stats,Nodes stats API>> to get two relevant metrics
+for each node:
+
+* The size of the heap on each node.
+
+* The additional estimated heap overhead of the mapped fields on each node. This is
+specific to data nodes: besides the cluster state field information mentioned above,
+each mapped field of every index held by the data node has an additional heap
+overhead. For nodes which are not data nodes, this value may be zero.
+
+[source,console]
+----
+GET _nodes/stats?human&filter_path=nodes.*.name,nodes.*.indices.mappings.total_estimated_overhead*,nodes.*.jvm.mem.heap_max*
+----
+// TEST[setup:node]
+
+For each node, this returns information similar to the following example output:
+
+[source,console-result]
+----
+{
+  "nodes": {
+    "USpTGYaBSIKbgSUJR2Z9lg": {
+      "name": "node-0",
+      "indices": {
+        "mappings": {
+          "total_estimated_overhead": "1gb",
+          "total_estimated_overhead_in_bytes": 1073741824
+        }
+      },
+      "jvm": {
+        "mem": {
+          "heap_max": "4gb",
+          "heap_max_in_bytes": 4294967296
+        }
+      }
+    }
+  }
+}
+----
+// TESTRESPONSE[s/"USpTGYaBSIKbgSUJR2Z9lg"/\$node_name/]
+// TESTRESPONSE[s/"name": "node-0"/"name": $body.$_path/]
+// TESTRESPONSE[s/"total_estimated_overhead": "1gb"/"total_estimated_overhead": $body.$_path/]
+// TESTRESPONSE[s/"total_estimated_overhead_in_bytes": 1073741824/"total_estimated_overhead_in_bytes": $body.$_path/]
+// TESTRESPONSE[s/"heap_max": "4gb"/"heap_max": $body.$_path/]
+// TESTRESPONSE[s/"heap_max_in_bytes": 4294967296/"heap_max_in_bytes": $body.$_path/]
+
+[discrete]
+===== Consider additional heap overheads
+
+Apart from the two field overhead metrics above, you must also allow enough
+heap for {es}'s baseline usage and for your workload, such as indexing,
+searches and aggregations. 0.5GB of extra heap will suffice for many
+reasonable workloads; you may need even less if your workload is very light,
+while heavy workloads may require more.
+
+[discrete]
+===== Example
+
+As an example, consider the outputs above for a data node. The heap of the node
+will need at least:
+
+* 1GB for the cluster state field information.
+
+* 1GB for the additional estimated heap overhead for the fields held by the data node.
+
+* 0.5GB of extra heap for other overheads.
+
+Since the node in the example has a heap max size of 4GB, it is sufficient
+for the total required heap of 2.5GB.
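+
+You can automate this check across all the nodes in your cluster. The
+following script is a minimal sketch using the Python `requests` package; it
+assumes an unsecured cluster reachable at `http://localhost:9200` and the
+0.5GB baseline described above. Adjust the URL, authentication and headroom
+for your own deployment and workload.
+
+[source,python]
+----
+import requests
+
+ES = "http://localhost:9200"   # assumed cluster address
+BASELINE = 512 * 1024 * 1024   # ~0.5GB extra heap for baseline usage and workload
+
+# Heap overhead of the deduplicated field mappings in the cluster state,
+# which every node holds a copy of.
+cluster_stats = requests.get(
+    f"{ES}/_cluster/stats",
+    params={"filter_path": "indices.mappings.total_deduplicated_mapping_size_in_bytes"},
+).json()
+mappings = cluster_stats["indices"]["mappings"]["total_deduplicated_mapping_size_in_bytes"]
+
+# Per-node estimated field overhead and maximum heap size.
+nodes_stats = requests.get(
+    f"{ES}/_nodes/stats",
+    params={
+        "filter_path": "nodes.*.name,"
+        "nodes.*.indices.mappings.total_estimated_overhead_in_bytes,"
+        "nodes.*.jvm.mem.heap_max_in_bytes"
+    },
+).json()
+
+for node in nodes_stats["nodes"].values():
+    # Non-data nodes may report no per-field overhead, so default to zero.
+    overhead = node.get("indices", {}).get("mappings", {}).get(
+        "total_estimated_overhead_in_bytes", 0)
+    heap_max = node["jvm"]["mem"]["heap_max_in_bytes"]
+    required = mappings + overhead + BASELINE
+    verdict = "sufficient" if heap_max >= required else "may need more heap"
+    print(f"{node['name']}: needs ~{required / 2**30:.1f}GB "
+          f"of {heap_max / 2**30:.1f}GB heap ({verdict})")
+----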
+
+If the heap max size for a node is not sufficient, consider
+<<avoid-unnecessary-fields,avoiding unnecessary fields>>,
+scaling up the cluster, or redistributing index shards.
+
+Note that the above rules do not necessarily guarantee the performance of
+searches or indexing involving a very high number of indices. You must also
+ensure that your data nodes have adequate resources for your workload and
+that your overall sharding strategy meets all your performance requirements.
+See also <<single-thread-per-shard>> and <<each-shard-has-overhead>>.
 
 [discrete]
 [[avoid-node-hotspots]]