Elasticsearch is fsyncing on transport threads #51904

@Tim-Brooks

Description

It is currently possible for a cluster state listener to trigger a translog fsync on a transport thread.

  1. In TransportShardBulkAction, a shard operation may trigger a mapping update.
  2. When this happens, Elasticsearch registers a ClusterStateObserver.Listener to continue once the mapping update is complete.
  3. This listener eventually attempts to reschedule the write operation.
  4. If the write thread pool cannot accept the operation, the onRejection callback fails the outstanding operations and completes the request (presumably to report the operations that did complete).
  5. Completing a TransportShardBulkAction attempts to fsync or refresh as necessary after initiating replication.

Here is a transport_worker stack trace. I also think these listeners might be executed on cluster state threads?

ensureSynced:808, Translog (org.elasticsearch.index.translog)
ensureSynced:824, Translog (org.elasticsearch.index.translog)
ensureTranslogSynced:513, InternalEngine (org.elasticsearch.index.engine)
write:2980, IndexShard$5 (org.elasticsearch.index.shard)
processList:108, AsyncIOProcessor (org.elasticsearch.common.util.concurrent)
drainAndProcessAndRelease:96, AsyncIOProcessor (org.elasticsearch.common.util.concurrent)
put:84, AsyncIOProcessor (org.elasticsearch.common.util.concurrent)
sync:3003, IndexShard (org.elasticsearch.index.shard)
run:320, TransportWriteAction$AsyncAfterWriteAction (org.elasticsearch.action.support.replication)
runPostReplicationActions:163, TransportWriteAction$WritePrimaryResult (org.elasticsearch.action.support.replication)
handlePrimaryResult:136, ReplicationOperation (org.elasticsearch.action.support.replication)
accept:-1, 359201671 (org.elasticsearch.action.support.replication.ReplicationOperation$$Lambda$3596)
onResponse:63, ActionListener$1 (org.elasticsearch.action)
onResponse:163, ActionListener$4 (org.elasticsearch.action)
completeWith:336, ActionListener (org.elasticsearch.action)
finishRequest:186, TransportShardBulkAction$2 (org.elasticsearch.action.bulk)
onRejection:182, TransportShardBulkAction$2 (org.elasticsearch.action.bulk)
onRejection:681, ThreadContext$ContextPreservingAbstractRunnable (org.elasticsearch.common.util.concurrent)
execute:90, EsThreadPoolExecutor (org.elasticsearch.common.util.concurrent)
lambda$doRun$0:160, TransportShardBulkAction$2 (org.elasticsearch.action.bulk)
accept:-1, 95719050 (org.elasticsearch.action.bulk.TransportShardBulkAction$2$$Lambda$3833)
onResponse:63, ActionListener$1 (org.elasticsearch.action)
lambda$onResponse$0:289, TransportShardBulkAction$3 (org.elasticsearch.action.bulk)
run:-1, 1539599321 (org.elasticsearch.action.bulk.TransportShardBulkAction$3$$Lambda$3857)
onResponse:251, ActionListener$5 (org.elasticsearch.action)
onNewClusterState:125, TransportShardBulkAction$1 (org.elasticsearch.action.bulk)
onNewClusterState:311, ClusterStateObserver$ContextPreservingListener (org.elasticsearch.cluster)
waitForNextChange:169, ClusterStateObserver (org.elasticsearch.cluster)
waitForNextChange:120, ClusterStateObserver (org.elasticsearch.cluster)
waitForNextChange:112, ClusterStateObserver (org.elasticsearch.cluster)
lambda$shardOperationOnPrimary$1:122, TransportShardBulkAction (org.elasticsearch.action.bulk)
accept:-1, 1672258490 (org.elasticsearch.action.bulk.TransportShardBulkAction$$Lambda$3831)
onResponse:277, TransportShardBulkAction$3 (org.elasticsearch.action.bulk)
onResponse:273, TransportShardBulkAction$3 (org.elasticsearch.action.bulk)
onResponse:282, ActionListener$6 (org.elasticsearch.action)
onResponse:116, MappingUpdatedAction$1 (org.elasticsearch.cluster.action.index)
onResponse:113, MappingUpdatedAction$1 (org.elasticsearch.cluster.action.index)
lambda$executeLocally$0:97, NodeClient (org.elasticsearch.client.node)
accept:-1, 2099146048 (org.elasticsearch.client.node.NodeClient$$Lambda$2772)
onResponse:144, TaskManager$1 (org.elasticsearch.tasks)
onResponse:138, TaskManager$1 (org.elasticsearch.tasks)
handleResponse:54, ActionListenerResponseHandler (org.elasticsearch.action)
handleResponse:1053, TransportService$ContextRestoreResponseHandler (org.elasticsearch.transport)
doRun:220, InboundHandler$1 (org.elasticsearch.transport)
run:37, AbstractRunnable (org.elasticsearch.common.util.concurrent)
execute:196, EsExecutors$DirectExecutorService (org.elasticsearch.common.util.concurrent)
handleResponse:212, InboundHandler (org.elasticsearch.transport)
messageReceived:138, InboundHandler (org.elasticsearch.transport)
inboundMessage:102, InboundHandler (org.elasticsearch.transport)
inboundMessage:664, TcpTransport (org.elasticsearch.transport)
consumeNetworkReads:688, TcpTransport (org.elasticsearch.transport)
consumeReads:276, MockNioTransport$MockTcpReadWriteHandler (org.elasticsearch.transport.nio)
handleReadBytes:228, SocketChannelContext (org.elasticsearch.nio)
read:40, BytesChannelContext (org.elasticsearch.nio)
handleRead:139, EventHandler (org.elasticsearch.nio)
handleRead:151, TestEventHandler (org.elasticsearch.transport.nio)
handleRead:420, NioSelector (org.elasticsearch.nio)
processKey:246, NioSelector (org.elasticsearch.nio)
singleLoop:174, NioSelector (org.elasticsearch.nio)
runLoop:131, NioSelector (org.elasticsearch.nio)
run:-1, 461835914 (org.elasticsearch.nio.NioSelectorGroup$$Lambda$1709)
run:835, Thread (java.lang)
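
The AsyncIOProcessor frames in the trace show why the sync lands on the caller: put() promotes the calling thread to drain and process the queued writes itself. A minimal sketch of that caller-runs pattern (illustrative names, not the real class):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Sketch of the AsyncIOProcessor pattern visible in the trace:
// put() lets the calling thread drain and process the queue, so
// whoever calls put() may end up doing the I/O itself.
class CallerRunsProcessor {
    private final List<String> queue = new ArrayList<>();
    private final Semaphore promoted = new Semaphore(1);

    void put(String item) {
        synchronized (queue) { queue.add(item); }
        // Whichever caller wins the semaphore drains the queue and
        // does the work on its own thread -- transport threads included.
        if (promoted.tryAcquire()) {
            try {
                List<String> batch;
                synchronized (queue) {
                    batch = new ArrayList<>(queue);
                    queue.clear();
                }
                process(batch);
            } finally {
                promoted.release();
            }
        }
    }

    void process(List<String> batch) {
        // Stand-in for the translog fsync in IndexShard#sync.
        System.out.println("processed " + batch.size()
            + " item(s) on: " + Thread.currentThread().getName());
    }

    public static void main(String[] args) {
        new CallerRunsProcessor().put("translog-sync");
    }
}
```

The design amortizes fsyncs across concurrent writers, but it means the processing thread is whichever caller wins the semaphore, with no regard for which pool that caller belongs to.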

Metadata

Labels

:Distributed Indexing/CRUD, >bug
