@DaveCTurner (Contributor) commented Sep 20, 2021

Reuse local node in async shard fetch responses

We read various objects from the wire that already exist in the cluster
state. The most notable is `DiscoveryNode` which can consume ~2kB in
heap for each fresh object, but rarely changes, so it's pretty wasteful
to use fresh objects here. There could be thousands (millions?) of
`DiscoveryNode` objects in flight from various `TransportNodesAction`
responses.

This branch adds a `DiscoveryNode` parameter to the response
deserialisation method and makes sure that the worst offenders re-use
the local object rather than creating a fresh one:

- `TransportNodesListShardStoreMetadata`
- `TransportNodesListGatewayStartedShards`

Relates #77266
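The reuse idea can be sketched roughly as follows. This is a minimal standalone illustration, not the actual Elasticsearch code: the `Node`, `WireNode`, and `readNode` names are invented for the example. The key point is that the deserialiser receives the already-materialised local node as a parameter and returns that existing instance when the wire value refers to it, instead of allocating a fresh copy per response.

```java
import java.util.Objects;

// Minimal sketch of the idea behind this PR, with invented types:
// when a response is deserialised on a node that already holds an
// equivalent DiscoveryNode (the local node), return that existing
// instance instead of materialising a fresh ~2kB object per response.
public class NodeReuseSketch {

    // Stand-in for org.elasticsearch.cluster.node.DiscoveryNode.
    record Node(String id, String address) {}

    // Stand-in for the wire representation of a node.
    record WireNode(String id, String address) {
        Node materialise() {
            return new Node(id, address); // fresh allocation
        }
    }

    // Hypothetical deserialiser taking the local node as a parameter,
    // mirroring the DiscoveryNode parameter this PR threads through.
    static Node readNode(WireNode wire, Node localNode) {
        if (localNode != null && Objects.equals(localNode.id(), wire.id())) {
            return localNode; // reuse the existing object
        }
        return wire.materialise();
    }

    public static void main(String[] args) {
        Node local = new Node("node-1", "10.0.0.1:9300");
        // Response from the local node: the existing object is reused.
        Node reused = readNode(new WireNode("node-1", "10.0.0.1:9300"), local);
        // Response about some other node: a fresh object is created.
        Node fresh = readNode(new WireNode("node-2", "10.0.0.2:9300"), local);

        if (reused != local) throw new AssertionError("expected reuse of local node");
        if (fresh == local) throw new AssertionError("expected fresh node");
        System.out.println("ok");
    }
}
```

With thousands of node-level responses in flight, each reuse avoids one ~2kB allocation on the coordinating node's heap.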

@DaveCTurner added the `:Distributed Coordination/Allocation` label (all issues relating to the decision making around placing a shard, both master logic & on the nodes) Sep 20, 2021
@DaveCTurner force-pushed the 2021-09-20-reuse-local-disco-node-in-async-shard-fetch branch from 205073d to c35942b on September 20, 2021 09:36
@DaveCTurner marked this pull request as ready for review September 20, 2021 10:31
@elasticmachine added the `Team:Distributed (Obsolete)` label (meta label for the distributed team; replaced by Distributed Indexing/Coordination) Sep 20, 2021
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

@henningandersen (Contributor) left a comment


LGTM.

@DaveCTurner (Contributor, Author)

Thanks Henning :)

@DaveCTurner deleted the 2021-09-20-reuse-local-disco-node-in-async-shard-fetch branch September 20, 2021 11:09
@elasticsearchmachine (Collaborator)

💔 Backport failed

| Branch | Result |
|--------|--------|
| 7.x    | Commit could not be cherry-picked due to conflicts |

You can use sqren/backport to backport manually by running `backport --upstream elastic/elasticsearch --pr 77991`.

DaveCTurner added a commit that referenced this pull request Sep 20, 2021
@howardhuanghua (Contributor)

@DaveCTurner In our production environment we double-checked this optimization's effect on fetch-response memory consumption.
Before the optimization, we could see each `DiscoveryNode` costing ~1.7 kB of heap:
[screenshot: heap dump showing a DiscoveryNode retaining ~1.7 kB]

After this PR's optimization it takes only 128 bytes, containing the `TransportAddress`:
[screenshot: heap dump showing the slimmed-down response at 128 bytes]

However, the master's heap still blows up, due to the huge number of in-flight fetch-shard requests. In our case we have 75 data nodes and 3 dedicated master nodes; each master node has a 4 GB heap, and there are 1.5w shards. After fully restarting the cluster, the master node's memory is used up within several seconds. We dumped the heap and found that Netty's in-flight outbound requests used a lot of it:
[screenshot: heap dump showing Netty WriteOperation queue usage]

Each `WriteOperation` should be a single-shard request to a specific node (16 kB buffer each):
[screenshot: heap dump showing individual 16 kB WriteOperation buffers]

From the `Netty4MessageChannelHandler` class we can see a `queuedWrites` queue; messages are flushed asynchronously:

private final Queue<WriteOperation> queuedWrites = new ArrayDeque<>();

So besides shrinking the fetch-shard responses, we also need to handle the massive number of shard requests being sent.
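One common way to bound this kind of fan-out (a sketch only, not what Elasticsearch actually does; the "send" here is a stand-in for a real transport call) is to cap the number of concurrent in-flight requests with a semaphore, so at most N write buffers can be queued at once rather than one per shard:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: bound concurrent in-flight shard-fetch sends so an
// unbounded per-shard fan-out (e.g. thousands of shards x 16 kB buffers)
// cannot accumulate in the outbound queue faster than it is flushed.
public class BoundedFanOut {
    public static void main(String[] args) throws InterruptedException {
        final int maxInFlight = 8;                   // hypothetical tuning knob
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(32);

        for (int shard = 0; shard < 1000; shard++) {
            permits.acquire();                       // blocks once saturated
            pool.submit(() -> {
                try {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(1);                 // stand-in for one shard-level send
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    permits.release();               // free a slot for the next shard
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        if (peak.get() > maxInFlight) throw new AssertionError("cap violated");
        System.out.println("peak in-flight = " + peak.get());
    }
}
```

The trade-off is slower fan-out after a full cluster restart in exchange for a bounded amount of heap pinned by queued writes.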

@DaveCTurner (Contributor, Author)

That sounds like a separate problem @howardhuanghua, although it's related. Would you open a new issue about it?

1.5w shards

I think that's a typo, but this is important - how many shards are there in this cluster?

@howardhuanghua (Contributor)

@howardhuanghua commented Nov 14, 2021

@DaveCTurner Sorry about the typo. 15000 shards total in cluster. I will open another issue.

@howardhuanghua (Contributor)

Opened a new issue #80694.

@DaveCTurner (Contributor, Author)

I think that's a typo

TIL that you used "w" to abbreviate "wan", i.e. 万, meaning 10,000. I didn't know that was a thing, but I do now 😄

@howardhuanghua (Contributor)

@howardhuanghua commented Nov 17, 2021

😄 Yes, you are right. It's a Chinese style.


Labels

`:Distributed Coordination/Allocation` · `>enhancement` · `Team:Distributed (Obsolete)` · `v7.16.0` · `v8.0.0-beta1`


7 participants