Batch async fetch shards data to reduce memory consumption. #81081
Conversation
DaveCTurner left a comment:
This seems like the right sort of idea. I left some comments inline. I think we should do this for replica allocations too.
```java
 */
public interface Lister<NodesResponse extends BaseNodesResponse<NodeResponse>, NodeResponse extends BaseNodeResponse> {
    void list(ShardId shardId, @Nullable String customDataPath, DiscoveryNode[] nodes, ActionListener<NodesResponse> listener);
    void flush();
```
Rather than introducing this method on the lister (and the corresponding flag passed in to `fetchData`), could we have the allocator directly indicate the end of an allocation round, which then triggers the flush?
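For illustration only, a minimal sketch of that shape using plain Java types and hypothetical names (`BatchingLister`, `onAllocationRoundComplete`): the lister only queues per-shard requests, and the allocator drains the queue once per round instead of the lister exposing a `flush()` of its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the lister queues per-shard requests; the allocator
// decides when a round is over and asks for the queued work to be sent.
class BatchingLister {
    private final Map<String, List<String>> pendingByNode = new HashMap<>();

    // called once per shard during the allocation round
    void list(String shardId, List<String> nodeIds) {
        for (String node : nodeIds) {
            pendingByNode.computeIfAbsent(node, n -> new ArrayList<>()).add(shardId);
        }
    }

    // called by the allocator at the end of the round, not by individual fetches
    void onAllocationRoundComplete() {
        pendingByNode.forEach((node, shards) ->
            System.out.println("send one batched request to " + node + " for shards " + shards));
        pendingByNode.clear();
    }
}
```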
```java
for (ShardId shardId : requestMap.keySet()) {
    ShardRequestInfo shardRequest = requestMap.get(shardId);
    shards.put(shardRequest.shardId(), shardRequest.getCustomDataPath());
    if (node.getVersion().before(Version.V_7_16_0)) {
```
The version in master is now 8.1.0; it's unlikely we'll backport this to an earlier version.
```java
};

client.executeLocally(
    TransportNodesListGatewayStartedShards.TYPE,
```
I'm undecided about re-using the same action type for both kinds of request here. I think it'd be cleaner to introduce a new one (and to name it something better than `internal:gateway/local/started_shards`) given how big a difference in behaviour we are making.
Hi @DaveCTurner, if we introduce a new action then we need to refactor some logic in `GatewayAllocator`, like the following structures; that seems like a big change for the high-level allocators. What do you think?
elasticsearch/server/src/main/java/org/elasticsearch/gateway/GatewayAllocator.java
Lines 57 to 60 in 2629c32
```java
private final ConcurrentMap<ShardId, AsyncShardFetch<NodeGatewayStartedShards>> asyncFetchStarted = ConcurrentCollections
    .newConcurrentMap();
private final ConcurrentMap<ShardId, AsyncShardFetch<NodeStoreFilesMetadata>> asyncFetchStore = ConcurrentCollections
    .newConcurrentMap();
```
I'm not sure this is true; I think we could keep pretty much the same interface from the point of view of `GatewayAllocator`. It should be possible to implement a batching `Lister` which reworks the batched responses into a `BaseNodesResponse<NodeGatewayStartedShards>`.
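As a rough illustration of that suggestion, here is a stripped-down sketch with plain Java types standing in for `BaseNodesResponse<NodeGatewayStartedShards>` and friends (all names hypothetical): a batched node response is split back into the per-shard shape the existing per-shard callbacks already consume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical: one grouped response per node, keyed by shard id.
record GroupedNodeResponse(String nodeId, Map<String, String> startedShardsByShardId) {}

// Hypothetical stand-in for the per-shard response the allocator expects.
record PerShardNodeResponse(String nodeId, String startedShardInfo) {}

class ResponseSplitter {
    // Rework the batched node responses into per-shard lists, then hand each
    // list to the per-shard callback, so the allocator-facing interface is unchanged.
    static void split(List<GroupedNodeResponse> nodeResponses,
                      BiConsumer<String, List<PerShardNodeResponse>> perShardCallback,
                      List<String> shardIds) {
        for (String shardId : shardIds) {
            List<PerShardNodeResponse> perShard = new ArrayList<>();
            for (GroupedNodeResponse nodeResponse : nodeResponses) {
                String info = nodeResponse.startedShardsByShardId().get(shardId);
                if (info != null) {
                    perShard.add(new PerShardNodeResponse(nodeResponse.nodeId(), info));
                }
            }
            perShardCallback.accept(shardId, perShard);
        }
    }
}
```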
```diff
-    protected NodeGatewayStartedShards nodeOperation(NodeRequest request, Task task) {
+    protected NodeGroupedGatewayStartedShards nodeOperation(NodeRequest request, Task task) {
+        NodeGroupedGatewayStartedShards groupedStartedShards = new NodeGroupedGatewayStartedShards(clusterService.localNode());
+        for (Map.Entry<ShardId, String> entry : request.getShards().entrySet()) {
```
When sending these requests per-shard we execute them in parallel across the `FETCH_SHARD_STARTED` threadpool. I think we should continue to parallelise them at the shard level like that.
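To illustrate keeping the shard-level parallelism inside a batched node operation, here is a sketch using a plain `ExecutorService` as a stand-in for the `FETCH_SHARD_STARTED` threadpool (names and types hypothetical, not the actual Elasticsearch implementation).

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BatchedNodeOperation {
    // Process each shard of the batched request as its own task, so a single
    // batched request does not serialize all shard lookups onto one thread.
    static Map<String, String> loadStartedShards(List<String> shardIds) throws InterruptedException {
        ExecutorService fetchShardStarted = Executors.newFixedThreadPool(4); // stand-in threadpool
        Map<String, String> results = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(shardIds.size());
        for (String shardId : shardIds) {
            fetchShardStarted.execute(() -> {
                try {
                    results.put(shardId, "started-shard-info-for-" + shardId); // per-shard work
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
        fetchShardStarted.shutdown();
        return results;
    }
}
```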
Thanks for the suggestion. I am going to complete the optimization.
Pinging @elastic/es-search (Team:Search)

Pinging @elastic/es-distributed (Team:Distributed)
- No need to use an `AsyncShardFetch` here, there is no caching
- Response may be very large, introduce chunking
- Fan-out may be very large, introduce throttling
- Processing time may be nontrivial, introduce cancellability
- Eliminate many unnecessary intermediate data structures
- Do shard-level response processing more eagerly
- Determine allocation from `RoutingTable` not `RoutingNodes`
- Add tests

Relates #81081
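One of the follow-up items in the list above, throttling a large fan-out, could look roughly like the sketch below. This is a hypothetical illustration with a plain `Semaphore`, not the actual change; `ThrottledFanOut` and `sendRequest` are invented names.

```java
import java.util.List;
import java.util.concurrent.Semaphore;

class ThrottledFanOut {
    private final Semaphore permits;

    ThrottledFanOut(int maxConcurrentNodeRequests) {
        this.permits = new Semaphore(maxConcurrentNodeRequests);
    }

    // Acquire a permit before each node-level request and release it when the
    // request completes, capping how many requests are in flight at once.
    void fetchFromNodes(List<String> nodeIds) throws InterruptedException {
        for (String nodeId : nodeIds) {
            permits.acquire();
            sendRequest(nodeId, permits::release);
        }
    }

    private void sendRequest(String nodeId, Runnable onComplete) {
        // stand-in for the real transport call; completes synchronously here
        System.out.println("fetching shard metadata from " + nodeId);
        onComplete.run();
    }
}
```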
Pinging @elastic/es-distributed-obsolete (Team:Distributed (Obsolete))

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
This commit is going to fix #80694. It groups the shard-level fetch requests node-to-shard and calls the cached listeners after receiving the node-level fetch responses. Async shard fetch requests before/after optimization:
