Closed
Labels
:Distributed Coordination/Network, >enhancement, Team:Distributed (Obsolete)
Description
Elasticsearch version (bin/elasticsearch --version): 7.1
Description of the problem including expected versus actual behavior:
This cluster had roughly 50K shards, and multiple nodes had been disconnected from the master. A hot_threads dump showed the masterService#updateTask thread spending a lot of time sending TransportNodesListGatewayStartedShards requests. These calls are asynchronous and their responses are handled on a separate thread pool, but the transport sendRequest itself still seems to execute on the masterService#updateTask thread. The sending logic could also be moved to a separate thread pool.
With #42855 this will no longer happen in the context of the JoinTaskExecutor, but it will still happen whenever a reroute task is scheduled.
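To illustrate the suggestion, here is a minimal, self-contained sketch of handing the per-node send loop to a separate executor so that the calling thread returns immediately. It does not use any Elasticsearch classes: `OffloadedFanOut`, `sendToAll`, `sendingThreads`, and the `fetch-sender` thread name are all hypothetical, and the `Consumer` stands in for whatever actually performs the transport send.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
class OffloadedFanOut {

    // Submits the whole send loop to the executor, so the caller
    // (the masterService#updateTask thread in the report) is not
    // the thread that performs the per-node sends.
    static <N> void sendToAll(List<N> nodes, Consumer<N> sender, ExecutorService executor) {
        executor.execute(() -> {
            for (N node : nodes) {
                sender.accept(node);
            }
        });
    }

    // Demo helper: returns the names of the threads that executed the sends.
    static Set<String> sendingThreads(List<String> nodes) {
        ExecutorService executor =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "fetch-sender"));
        Set<String> threads = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(nodes.size());
        try {
            sendToAll(nodes, node -> {
                // Stand-in for the real send: just record where it ran.
                threads.add(Thread.currentThread().getName());
                done.countDown();
            }, executor);
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("sends did not complete in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        return threads;
    }
}
```

In this sketch the caller only pays the cost of one `execute` call, regardless of how many nodes are targeted. Whether a dedicated pool or an existing one is the right home for this work in Elasticsearch is a design question the sketch does not answer.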
Provide logs (if relevant):
80.1% (8s out of 10s) cpu usage by thread 'elasticsearch[7b5db5][masterService#updateTask][T#1]'
7/10 snapshots sharing following 54 elements
sun.nio.ch.EPollArrayWrapper.interrupt(Native Method)
sun.nio.ch.EPollArrayWrapper.interrupt(EPollArrayWrapper.java:317)
sun.nio.ch.EPollSelectorImpl.wakeup(EPollSelectorImpl.java:207)
io.netty.channel.nio.NioEventLoop.wakeup(NioEventLoop.java:719)
io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:799)
io.netty.channel.AbstractChannelHandlerContext.safeExecute(AbstractChannelHandlerContext.java:1013)
io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:825)
io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1066)
io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:309)
org.elasticsearch.transport.netty4.Netty4TcpChannel.sendMessage(Netty4TcpChannel.java:139)
org.elasticsearch.transport.OutboundHandler.internalSendMessage(OutboundHandler.java:80)
org.elasticsearch.transport.OutboundHandler.sendMessage(OutboundHandler.java:70)
org.elasticsearch.transport.TcpTransport.sendRequestToChannel(TcpTransport.java:680)
org.elasticsearch.transport.TcpTransport.sendRequestToChannel(TcpTransport.java:669)
org.elasticsearch.transport.TcpTransport.access$300(TcpTransport.java:100)
org.elasticsearch.transport.TcpTransport$NodeChannels.sendRequest(TcpTransport.java:272)
org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:633)
org.elasticsearch.transport.TransportService$$Lambda$1519/1694636980.sendRequest(Unknown Source)
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:543)
org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:531)
org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.start(TransportNodesAction.java:182)
org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:82)
org.elasticsearch.action.support.nodes.TransportNodesAction.doExecute(TransportNodesAction.java:51)
org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:145)
com.amazon.opendistro.elasticsearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:77)
org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:143)
org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:121)
org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:64)
org.elasticsearch.gateway.TransportNodesListGatewayStartedShards.list(TransportNodesListGatewayStartedShards.java:91)
org.elasticsearch.gateway.AsyncShardFetch.asyncFetch(AsyncShardFetch.java:283)
org.elasticsearch.gateway.AsyncShardFetch.fetchData(AsyncShardFetch.java:126)
org.elasticsearch.gateway.GatewayAllocator$InternalPrimaryShardAllocator.fetchData(GatewayAllocator.java:159)
org.elasticsearch.gateway.PrimaryShardAllocator.makeAllocationDecision(PrimaryShardAllocator.java:86)
org.elasticsearch.gateway.BaseGatewayShardAllocator.allocateUnassigned(BaseGatewayShardAllocator.java:59)
org.elasticsearch.gateway.GatewayAllocator.innerAllocatedUnassigned(GatewayAllocator.java:114)
org.elasticsearch.gateway.GatewayAllocator.allocateUnassigned(GatewayAllocator.java:104)
org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:410)
org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:378)
org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:361)
org.elasticsearch.cluster.coordination.JoinTaskExecutor.execute(JoinTaskExecutor.java:155)
org.elasticsearch.cluster.coordination.JoinHelper$1.execute(JoinHelper.java:118)
org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:687)
org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:310)
org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:210)
org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142)
org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)
org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:690)
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)