Remove shards per gb of heap guidance #86223
Conversation
Pinging @elastic/es-distributed (Team:Distributed)
Pinging @elastic/es-docs (Team:Docs)
This guidance does not apply any longer. The overhead per shard has been significantly reduced in recent versions, and the removed rule of thumb would be too pessimistic in many if not most cases and might be too optimistic in other specific ones. => Replace the guidance with a rule of thumb based on field count for data nodes and a rule of thumb based on index count (which is far more relevant nowadays than shards) for master nodes. relates elastic#77466
Force-pushed from 04106a5 to f9dd093
| Every mapped field also carries some overhead in terms of memory usage and disk
| space. By default {es} will automatically create a mapping for every field in
| space. While the exact amount resource usage of each additional field depends
| on the type of field, an additional 512 bytes in memory use for each additional
512B seems tight, is that a worst-case guarantee? Even in 8.2? I think we can afford to be a bit conservative with the recommendation here.
It's quite conservative for 8.3 I'd say. It's ~4x what a heap dump for Beats indices would suggest in 8.3. 8.2 is probably about twice as expensive (this is only sorta true and depends on the field type).
=> how about 1kb per field then to have a nice round number?
| As a result, the more fields the mappings in your indices contain, the fewer
| indices and shards will each individual data node be able to hold,
| all other things being equal. In addition to the per field overhead you should
| at least have an additional 512Mb of heap available per data node.
nit: bytes not bits
| at least have an additional 512Mb of heap available per data node.
| at least have an additional 512MB of heap available per data node.
Also is that 512MiB enough for everything else the node might need to do, or is that just enough for it to survive? It sounds pretty tight, maybe we need some more caveats here.
On the flip side of that is that running with a few indices with few fields on a 512MB heap with a small workload is perfectly possible too. So either the limit needs to be lower or we need a less tight wording than "at least".
The idea was that the 512B above was chosen conservatively enough that any additional overheads are covered. In the end 512M is plenty for just data node operations IMO if we have 0.5k/field, and especially so if we go with 1k/field.
Maybe add caveats for when the data node is used for ingest and/or coordination? (it's hard to put numbers on either but we could just warn about them in general?)
The circuit breakers that apply to data node operations don't really take the shards-and-fields overhead into account tho, e.g. the fielddata and request breakers just act as if it's ok to use 40% and 60% of the total heap respectively. In the example here, 80% of the heap is already consumed with shards-and-fields overhead, so really (with the real-memory CB) there's only 15% left for doing any useful work.
80% of the heap is already consumed with shards-and-fields overhead
Not really. Now that I made it 1024B in particular, we vastly overestimate the direct per-field overhead. So for larger nodes, we will have about a 50%+ margin here now. If we recommend 4G heap for the fields in the example, then really the steady-state memory usage from them is probably 20% of the heap, tops, leaving enough room for the other operations.
The 512M was mostly meant to add a little more margin for error and to get reasonable numbers for smaller nodes.
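For illustration, here is a minimal back-of-the-envelope sketch of the arithmetic behind the numbers in this thread, assuming the figures proposed above (roughly 1KB of heap per mapped field per shard plus ~0.5GB of base heap per data node); it is an estimate, not an exact accounting of per-field overhead.

```python
# Back-of-the-envelope data-node heap estimate, assuming the figures from this
# thread: ~1 KB of heap per mapped field per shard plus ~0.5 GB of base heap.

def estimate_data_node_heap_gb(num_shards: int, fields_per_shard: int,
                               bytes_per_field: int = 1024,
                               base_heap_gb: float = 0.5) -> float:
    """Rough heap needed for a data node holding the given idle shards."""
    field_overhead_gb = num_shards * fields_per_shard * bytes_per_field / 1e9
    return base_heap_gb + field_overhead_gb

# Example discussed later in the thread: 1000 shards with 4000 fields each.
print(estimate_data_node_heap_gb(1000, 4000))  # ~4.6 GB (~4.5 GB if 1 KB is taken as 1000 bytes)
```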
| depends on various factors. The include but are not limited to the size of its mapping,
| the number of shards per index or whether its mapping is shared with other indices.
| A good rule of thumb is to aim for 3000 indices or fewer per GB of heap on master nodes.
| For example, if your cluster contains 12,000 indices and each master node should have
| For example, if your cluster contains 12,000 indices and each master node should have
| For example, if your cluster contains 12,000 indices then each dedicated master node should have
for non-dedicated masters should we say something about allowing for this much overhead too?
My assumption was that once these numbers start to matter you'd be using dedicated master nodes anyway?
The overhead is definitely quite a bit less if you're on a non-dedicated master node because you share network buffers and such with other operations I'd say. Maybe better to add a line recommending dedicated masters here above say 5k indices or so instead?
Not sure, that seems a pretty low threshold to be going for dedicated masters and quite possibly the wrong variable to choose too. A 3x31GiB cluster with enough CPU to spare should be able to deal with more than 5k indices, and probably better to stick with 3 big nodes rather than add 3 tiny masters on top?
A 3x31GiB cluster with enough CPU to spare should be able to deal with more than 5k indices, and probably better to stick with 3 big nodes rather than add 3 tiny masters on top?
Yea probably in most cases (disks would be my biggest worry here but with much more CS batching nowadays it's less of an issue probably). I think I'd keep it simple then and really just state that the numbers must be added for non-dedicated master nodes even if it's a little pessimistic heap wise if that's ok?
sounds good to me
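To make the master-node rule of thumb concrete, here is a minimal sketch of the calculation (3000 indices or fewer per GB of master heap, added on top of the other requirements for non-dedicated masters, as agreed above); the 4.5GB data-node figure reused at the end is just the example from the earlier thread.

```python
# Sketch of the master-node rule of thumb: aim for 3000 indices or fewer per
# GB of heap on master nodes; for non-dedicated masters, add this on top of
# the heap the node needs for its other roles.

def master_heap_gb(total_indices: int, indices_per_gb: int = 3000) -> float:
    return total_indices / indices_per_gb

# The example from the diff: a cluster with 12,000 indices.
print(master_heap_gb(12_000))        # 4.0 GB per dedicated master node
print(4.5 + master_heap_gb(12_000))  # 8.5 GB if the master is also the data node sized above
```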
henningandersen
left a comment
Thanks Armin, left a few comments.
| As a result, the more fields the mappings in your indices contain, the fewer
| indices and shards will each individual data node be able to hold,
| As a result, the more fields the mappings in your indices contain, the fewer
| indices and shards will each individual data node be able to hold,
| As a result, the more mapped fields per index, the fewer
| indices and shards can be held on each individual data node,
++ added a verb to this though
| heap memory. The exact amount of heap memory each additional index requires
| depends on various factors. The include but are not limited to the size of its mapping,
| the number of shards per index or whether its mapping is shared with other indices.
| A good rule of thumb is to aim for 3000 indices or fewer per GB of heap on master nodes.
Can we put a few preconditions on this rule of thumb, like less than 1000 fields and with most indices created identically from the same template?
... and not using dynamic mapping updates.
I think these days Beats creates mappings with a lot more than 1000 fields tho. I've seen ~5k. IMO at least some of the advice should be applicable to such indices.
These numbers are taken from Beats indices which do indeed come with 4.x k fields on average. The field count really doesn't matter much here, what matters is mostly that mappings are shared. Added a comment to that effect that also mentions the dynamic mapping updates now.
Jenkins run elasticsearch-ci/part-2 (unrelated)
DaveCTurner
left a comment
a few more nits but this looks pretty much good to go IMO, I'll let Henning take a final look too
| Every mapped field also carries some overhead in terms of memory usage and disk
| space. While the exact amount resource usage of each additional field depends
| on the type of field, an additional 1024B in memory use for each additional
| on the type of field, an additional 1024B in memory use for each additional
| on the type of field, an additional 1024B in memory used for each additional
Should this be used instead of use? Also I removed an additional space.
Either is acceptable IMO ("memory use" works here as a nounal phrase)
Let's remove that extra space then.
In the example, we say that 1000 shards each containing 4000 fields means a node with a 4.5GB heap can hold those 1000 shards.
Later on, we say that for non-dedicated master nodes each node should additionally reserve total_indices_in_cluster/3000 GB of heap:
a good rule of thumb is to aim for 3000 indices or fewer per GB of heap on master nodes
For non-dedicated master nodes, the same rule holds and should be added to the heap requirements of other node roles.
Does that mean we should also mention that one should adjust cluster.max_shards_per_node setting based on the heap available on a node?
max_idling_shards_per_node = (per_node_memory_in_GB/2 - 0.5GB - total_indices_in_cluster/3000) x GB_to_KB / number_of_fields_per_shard ...(1), with GB_to_KB = 10^6, i.e. assuming ~1KB of heap per field
total_indices_in_cluster = number_of_data_nodes x max_idling_shards_per_node ...(2)
From (1) and (2), you can derive:
max_idling_shards_per_node = (per_node_memory_in_GB/2 - 0.5GB) x 3000 / (3 x number_of_fields_per_shard/1000 + number_of_data_nodes) ...(3)
The largest node we recommend for Elasticsearch (32GB heap, i.e. 64GB RAM) would be able to hold 7,269 idling shards in a 1-data-node cluster or 4,295 idling shards in a 10-data-node cluster, given 4,000 fields per shard, based on this formula. For a 60-data-node cluster, max_idling_shards_per_node drops further to 1,313. The more data nodes in the cluster, the fewer shards each node can hold.
The effect of field count appears to be roughly linear in a 1-node cluster but becomes almost negligible as the cluster size grows large (>30 nodes).
Later on in the doc, we refer to adjusting the default cluster.max_shards_per_node value from 1,000 to 1,200 temporarily. Do we want to expand this section to include a memory-based max idling shards per node value, perhaps in the "Each index and shard has overhead" section?
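For reference, a small sketch of formula (3), purely to sanity-check the quoted figures; it assumes heap = RAM/2, a 0.5GB base, ~1KB of heap per field per shard, and the 3000-indices-per-GB master overhead being paid on every data node, as in the comment above.

```python
# Sketch of formula (3) above; assumes heap = RAM/2, 0.5 GB base heap, ~1 KB
# per mapped field per shard, and the master-node index overhead (3000 indices
# per GB) counted on every data node.

def max_idling_shards_per_node(ram_gb: float, fields_per_shard: int,
                               data_nodes: int) -> float:
    heap_budget_gb = ram_gb / 2 - 0.5
    return heap_budget_gb * 3000 / (3 * fields_per_shard / 1000 + data_nodes)

# Reproducing the quoted figures for 64 GB RAM (32 GB heap) and 4000 fields/shard:
for n in (1, 10, 60):
    print(n, round(max_idling_shards_per_node(64, 4000, n), 1))
# -> 1: ~7269.2, 10: ~4295.5, 60: 1312.5 (the ~1313 quoted above)
```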
Hi, as this is just a doc change, are we able to merge this back into 8.3.0?
henningandersen
left a comment
Sorry for the delay here. I added a few more comments.
| indices which are created from the same index template and
| <<explicit-mapping,do not use dynamic mapping updates>>.
| If your indices are are mostly created from a small number of templates and do
Can we add a paragraph for the non-happy case, i.e., using dynamic fields with in theory completely different mappings all over the place?
What would we want to put in that paragraph? It's hard to put a number on the dynamic case and we already mention that the rule here doesn't apply when using dynamic mapping updates. Does it really make sense to go into more detail here? (particularly when we try to somewhat discourage using dynamic mapping updates to begin with?)
I think of something pretty simple and conservative. Perhaps just 1-2kb per field?
For dynamic mappings, it can make the impact of those very visible. Like you need a 1GB heap if you do not use them but a 30GB heap (or whatever the calculation says) if you use them. That can be a good motivation for a redesign?
Another case could be applications or search use cases using 100s of specific indices with specific mappings. They fall outside the above and we then leave them with no guidance. I think those cases will definitely also result in pretty small numbers anyway, but providing the guidance seems helpful.
I think it might make sense to defer this until #86639 is addressed - IMO it's too complicated to explain here how dynamic mappings might work with the deduplication mechanism.
I would set it up as a worst-case with no deduplication. I agree that anything in between is hard, but providing a no-sharing guidance should be relatively easy? It puts an upper boundary on the no-sharing required heap-size.
but providing a no-sharing guidance should be relatively easy? It puts an upper boundary on the no-sharing required heap-size.
But this totally depends on the size of the mappings used. Also, the effect of not sharing isn't as large as it is made out to be here I think. A compressed beats mapping for example is only ~6k. So for 3k indices without any sharing whatsoever we're only talking about an 18M outright increase in the direct heap requirements for the master node. There's additional effects here from making stats APIs heavier without dedup, sending larger cluster states etc. but I find it very hard to capture these in a heap size number.
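As a rough check on those numbers, a tiny sketch of the no-deduplication upper bound mentioned here; the ~6KB compressed mapping size and the 3k index count are the figures from this comment, not measurements.

```python
# Worst-case (no mapping deduplication) heap increase on the master, using the
# figures from this comment: a ~6 KB compressed Beats-style mapping and 3000
# indices that share nothing.

compressed_mapping_kb = 6
indices = 3000
print(compressed_mapping_kb * indices / 1000, "MB of extra mappings")  # 18.0 MB
```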
We talked offline about this and it seems we could remove the deduplication assumption here, since it is in effect very little extra data on the masters and is more of a cpu concern than memory.
@elasticmachine run elasticsearch-ci/docs
@henningandersen @DaveCTurner how about 588bc87?
henningandersen
left a comment
LGTM, sorry for the back and forth on the deduplication piece.
DaveCTurner
left a comment
LGTM2
Thanks all!
Are we still ok to backport this to 8.3 and 8.2 as per its labels?
This guidance does not apply any longer. The overhead per shard has been significantly reduced in recent versions, and the removed rule of thumb would be too pessimistic in many if not most cases and might be too optimistic in other specific ones. => Replace the guidance with a rule of thumb based on field count for data nodes and a rule of thumb based on index count (which is far more relevant nowadays than shards) for master nodes. relates elastic#77466 Co-authored-by: David Turner <[email protected]> Co-authored-by: Henning Andersen <[email protected]>
This guidance does not apply any longer.
The overhead per shard has been significantly reduced in recent versions
and this kind of rule of thumb will be too pessimistic in many if not
most cases and might be too optimistic in other specific ones.
-> remove it
relates #77466