Conversation

@howardhuanghua (Contributor) commented Nov 27, 2021

This commit is going to fix #80694.

  1. Queue shard-level async fetch requests and their listeners.
  2. Flush all the queued requests once all of the node-level async fetch requests have been collected.
  3. After receiving each node-level fetch response, split it into single per-shard responses and call the cached listeners.
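
To make that flow concrete, here is a minimal, self-contained sketch of the queue / flush / split idea using only JDK types. The class and method names (NodeBatchedShardFetcher, sendNodeLevelRequest, and so on) are hypothetical stand-ins, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical illustration only; these types do not exist in Elasticsearch.
class NodeBatchedShardFetcher {

    // One queued shard-level request: the shard and the listener waiting for its result.
    record QueuedRequest(String shardId, BiConsumer<String, String> listener) {}

    // Step 1: queue shard-level requests per target node instead of sending them immediately.
    private final Map<String, List<QueuedRequest>> queuedByNode = new HashMap<>();

    void fetch(String nodeId, String shardId, BiConsumer<String, String> listener) {
        queuedByNode.computeIfAbsent(nodeId, n -> new ArrayList<>()).add(new QueuedRequest(shardId, listener));
    }

    // Step 2: once every shard-level request has been collected, send one request per node.
    void flush() {
        queuedByNode.forEach((nodeId, queued) -> {
            List<String> shardIds = queued.stream().map(QueuedRequest::shardId).toList();
            Map<String, String> nodeResponse = sendNodeLevelRequest(nodeId, shardIds);
            // Step 3: split the node-level response into per-shard results and
            // call each cached listener with only the part it asked for.
            for (QueuedRequest request : queued) {
                request.listener().accept(request.shardId(), nodeResponse.get(request.shardId()));
            }
        });
        queuedByNode.clear();
    }

    // Stand-in for the real transport call; returns one result per requested shard.
    private Map<String, String> sendNodeLevelRequest(String nodeId, List<String> shardIds) {
        Map<String, String> result = new HashMap<>();
        shardIds.forEach(id -> result.put(id, "shard-state-from-" + nodeId));
        return result;
    }
}
```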

Async shard fetch requests before/after optimization (see screenshots in the original PR).

@elasticsearchmachine elasticsearchmachine added v8.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Nov 27, 2021
@howardhuanghua howardhuanghua marked this pull request as ready for review November 28, 2021 01:52
@howardhuanghua howardhuanghua changed the title Group primary shard async fetch requests by node to reduce memory consumption. [Draft]Group primary shard async fetch requests by node to reduce memory consumption. Nov 28, 2021
@DaveCTurner (Contributor) left a comment

This seems like the right sort of idea. I left some comments inline. I think we should do this for replica allocations too.

*/
public interface Lister<NodesResponse extends BaseNodesResponse<NodeResponse>, NodeResponse extends BaseNodeResponse> {
    void list(ShardId shardId, @Nullable String customDataPath, DiscoveryNode[] nodes, ActionListener<NodesResponse> listener);
    void flush();

Contributor

Rather than introducing this method to the lister (and the corresponding flag passed in to fetchData), could we have the allocator directly indicate the end of an allocation round, which triggers the flush?
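
For illustration, a rough sketch of that alternative shape, with entirely hypothetical interface and method names rather than the real Elasticsearch ones: the allocator signals the end of a round, and that signal performs the send.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape only; these interfaces and names are not the real Elasticsearch ones.
interface BatchedLister {
    void list(String shardId, String nodeId); // queues the request, does not send it
}

interface AllocationRoundObserver {
    void onAllocationRoundEnd(); // called by the allocator once per allocation round
}

class QueueingLister implements BatchedLister, AllocationRoundObserver {
    private final List<String> queued = new ArrayList<>();

    @Override
    public void list(String shardId, String nodeId) {
        queued.add(nodeId + "/" + shardId);
    }

    @Override
    public void onAllocationRoundEnd() {
        // The allocator, not a flag threaded through fetchData, decides when to send.
        System.out.println("sending batched requests: " + queued);
        queued.clear();
    }
}
```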

for (ShardId shardId : requestMap.keySet()) {
    ShardRequestInfo shardRequest = requestMap.get(shardId);
    shards.put(shardRequest.shardId(), shardRequest.getCustomDataPath());
    if (node.getVersion().before(Version.V_7_16_0)) {

Contributor

The version in master is now 8.1.0; it's unlikely we'll backport this to an earlier version.

};

client.executeLocally(
    TransportNodesListGatewayStartedShards.TYPE,

Contributor

I'm undecided about re-using the same action type for both kinds of request here. I think it'd be cleaner to introduce a new one (and to name it something better than internal:gateway/local/started_shards) given how big a difference in behaviour we are making.

Contributor Author

Hi @DaveCTurner, if we introduce a new action then we need to refactor some logic in GatewayAllocator, like the following structures; it seems that would be a big change for the high-level allocators. What do you think?

private final ConcurrentMap<ShardId, AsyncShardFetch<NodeGatewayStartedShards>> asyncFetchStarted = ConcurrentCollections
    .newConcurrentMap();
private final ConcurrentMap<ShardId, AsyncShardFetch<NodeStoreFilesMetadata>> asyncFetchStore = ConcurrentCollections
    .newConcurrentMap();

Contributor

I'm not sure this is true; I think we could keep pretty much the same interface from the point of view of GatewayAllocator. It should be possible to implement a batching Lister which reworks the batched responses into a BaseNodesResponse<NodeGatewayStartedShards>.
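
As an illustration of that reshaping, here is a small sketch with simplified stand-in types (not the Elasticsearch classes quoted above) showing how a grouped per-node response could be turned back into the per-shard view the existing GatewayAllocator code expects.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified stand-ins for the real types, for illustration only.
record StartedShardOnNode(String nodeId, String allocationId) {}
record GroupedNodeResponse(String nodeId, Map<String, String> allocationIdByShard) {}

class BatchingListerAdapter {
    // Rework one batched (per-node) response into the per-shard shape the caller expects:
    // for a given shard, a list with one entry per node that reported it.
    static List<StartedShardOnNode> perShardView(String shardId, List<GroupedNodeResponse> grouped) {
        return grouped.stream()
            .filter(node -> node.allocationIdByShard().containsKey(shardId))
            .map(node -> new StartedShardOnNode(node.nodeId(), node.allocationIdByShard().get(shardId)))
            .collect(Collectors.toList());
    }
}
```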

protected NodeGatewayStartedShards nodeOperation(NodeRequest request, Task task) {
protected NodeGroupedGatewayStartedShards nodeOperation(NodeRequest request, Task task) {
    NodeGroupedGatewayStartedShards groupedStartedShards = new NodeGroupedGatewayStartedShards(clusterService.localNode());
    for (Map.Entry<ShardId, String> entry : request.getShards().entrySet()) {

Contributor

When sending these requests per-shard we execute them in parallel across the FETCH_SHARD_STARTED threadpool. I think we should continue to parallelise them at the shard level like that.
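
A sketch of keeping that shard-level parallelism inside a single node-level request, assuming a plain ExecutorService in place of the real FETCH_SHARD_STARTED threadpool; the class, method, and helper names here are illustrative, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class ParallelNodeOperation {
    // Fan the per-shard work out across a threadpool even though it arrived in one node-level request.
    static Map<String, String> loadAllShards(List<String> shardIds, ExecutorService fetchShardStartedPool)
            throws InterruptedException, ExecutionException {
        List<Future<Map.Entry<String, String>>> futures = new ArrayList<>();
        for (String shardId : shardIds) {
            Callable<Map.Entry<String, String>> task = () -> Map.entry(shardId, loadShardStateFromDisk(shardId));
            futures.add(fetchShardStartedPool.submit(task));
        }
        Map<String, String> results = new HashMap<>();
        for (Future<Map.Entry<String, String>> future : futures) {
            Map.Entry<String, String> entry = future.get(); // each shard still completes independently
            results.put(entry.getKey(), entry.getValue());
        }
        return results;
    }

    // Stand-in for reading the on-disk shard state for one shard.
    private static String loadShardStateFromDisk(String shardId) {
        return "started-state-for-" + shardId;
    }
}
```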

@howardhuanghua
Contributor Author

Thanks for the suggestion. I am going to complete the optimization.

@pgomulka pgomulka added the :Search/Search Search-related issues that do not fall into other categories label Nov 30, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Nov 30, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@ywelsch ywelsch added :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) and removed :Search/Search Search-related issues that do not fall into other categories labels Nov 30, 2021
@elasticmachine elasticmachine added Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. and removed Team:Search Meta label for search team labels Nov 30, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@howardhuanghua howardhuanghua changed the title [Draft]Group primary shard async fetch requests by node to reduce memory consumption. Group primary shard async fetch requests by node to reduce memory consumption. Dec 7, 2021
@howardhuanghua howardhuanghua changed the title Group primary shard async fetch requests by node to reduce memory consumption. Batch async fetch shards data to reduce memory consumption. Dec 7, 2021
elasticsearchmachine pushed a commit that referenced this pull request Mar 27, 2023
- No need to use an `AsyncShardFetch` here, there is no caching
- Response may be very large, introduce chunking
- Fan-out may be very large, introduce throttling
- Processing time may be nontrivial, introduce cancellability
- Eliminate many unnecessary intermediate data structures
- Do shard-level response processing more eagerly
- Determine allocation from `RoutingTable` not `RoutingNodes`
- Add tests

Relates #81081
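
The throttling bullet in that commit message describes a general pattern. Purely as an illustration (not that commit's code), a semaphore can cap how many node-level requests are in flight during a large fan-out:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

class ThrottledFanOut {
    // Cap how many requests are outstanding at once, releasing a permit as each completes.
    static void sendThrottled(List<String> nodeIds, int maxInFlight, ExecutorService executor)
            throws InterruptedException {
        Semaphore permits = new Semaphore(maxInFlight);
        for (String nodeId : nodeIds) {
            permits.acquire(); // blocks once maxInFlight requests are outstanding
            executor.execute(() -> {
                try {
                    sendRequestTo(nodeId); // stand-in for the real transport call
                } finally {
                    permits.release(); // completion (or failure) frees a slot
                }
            });
        }
    }

    private static void sendRequestTo(String nodeId) {
        System.out.println("fetching shard metadata from " + nodeId);
    }
}
```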
@gmarouli gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023
@quux00 quux00 added v8.11.0 and removed v8.10.0 labels Aug 16, 2023
@mattc58 mattc58 added v8.12.0 and removed v8.11.0 labels Oct 4, 2023
@elasticsearchmachine elasticsearchmachine added v9.1.0 Team:Distributed Coordination Meta label for Distributed Coordination team and removed v9.0.0 labels Jan 30, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-obsolete (Team:Distributed (Obsolete))

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Labels

>bug, :Distributed Coordination/Allocation, external-contributor, Team:Distributed Coordination, Team:Distributed (Obsolete), v9.3.0

Development

Successfully merging this pull request may close these issues.

Massive async shard fetch requests consume lots of heap memories on master node.