Elasticsearch is fsyncing on transport threads #51904

@Tim-Brooks

Description

It is currently possible for a cluster state listener to trigger a translog fsync on a transport thread.

  1. In TransportShardBulkAction, a shard operation may trigger a mapping update.
  2. When this happens, Elasticsearch registers a ClusterStateObserver.Listener to continue once the mapping update is complete.
  3. This listener eventually attempts to reschedule the write operation.
  4. If the write thread pool cannot accept the operation, the onRejection callback fails the outstanding operations and completes the request (presumably to report the operations that did complete).
  5. Completing a TransportShardBulkAction attempts to fsync or refresh as necessary after initiating replication.

Here is a transport_worker stack trace. I also think these listeners might be executed on cluster state threads?

ensureSynced:808, Translog (org.elasticsearch.index.translog)
ensureSynced:824, Translog (org.elasticsearch.index.translog)
ensureTranslogSynced:513, InternalEngine (org.elasticsearch.index.engine)
write:2980, IndexShard$5 (org.elasticsearch.index.shard)
processList:108, AsyncIOProcessor (org.elasticsearch.common.util.concurrent)
drainAndProcessAndRelease:96, AsyncIOProcessor (org.elasticsearch.common.util.concurrent)
put:84, AsyncIOProcessor (org.elasticsearch.common.util.concurrent)
sync:3003, IndexShard (org.elasticsearch.index.shard)
run:320, TransportWriteAction$AsyncAfterWriteAction (org.elasticsearch.action.support.replication)
runPostReplicationActions:163, TransportWriteAction$WritePrimaryResult (org.elasticsearch.action.support.replication)
handlePrimaryResult:136, ReplicationOperation (org.elasticsearch.action.support.replication)
accept:-1, 359201671 (org.elasticsearch.action.support.replication.ReplicationOperation$$Lambda$3596)
onResponse:63, ActionListener$1 (org.elasticsearch.action)
onResponse:163, ActionListener$4 (org.elasticsearch.action)
completeWith:336, ActionListener (org.elasticsearch.action)
finishRequest:186, TransportShardBulkAction$2 (org.elasticsearch.action.bulk)
onRejection:182, TransportShardBulkAction$2 (org.elasticsearch.action.bulk)
onRejection:681, ThreadContext$ContextPreservingAbstractRunnable (org.elasticsearch.common.util.concurrent)
execute:90, EsThreadPoolExecutor (org.elasticsearch.common.util.concurrent)
lambda$doRun$0:160, TransportShardBulkAction$2 (org.elasticsearch.action.bulk)
accept:-1, 95719050 (org.elasticsearch.action.bulk.TransportShardBulkAction$2$$Lambda$3833)
onResponse:63, ActionListener$1 (org.elasticsearch.action)
lambda$onResponse$0:289, TransportShardBulkAction$3 (org.elasticsearch.action.bulk)
run:-1, 1539599321 (org.elasticsearch.action.bulk.TransportShardBulkAction$3$$Lambda$3857)
onResponse:251, ActionListener$5 (org.elasticsearch.action)
onNewClusterState:125, TransportShardBulkAction$1 (org.elasticsearch.action.bulk)
onNewClusterState:311, ClusterStateObserver$ContextPreservingListener (org.elasticsearch.cluster)
waitForNextChange:169, ClusterStateObserver (org.elasticsearch.cluster)
waitForNextChange:120, ClusterStateObserver (org.elasticsearch.cluster)
waitForNextChange:112, ClusterStateObserver (org.elasticsearch.cluster)
lambda$shardOperationOnPrimary$1:122, TransportShardBulkAction (org.elasticsearch.action.bulk)
accept:-1, 1672258490 (org.elasticsearch.action.bulk.TransportShardBulkAction$$Lambda$3831)
onResponse:277, TransportShardBulkAction$3 (org.elasticsearch.action.bulk)
onResponse:273, TransportShardBulkAction$3 (org.elasticsearch.action.bulk)
onResponse:282, ActionListener$6 (org.elasticsearch.action)
onResponse:116, MappingUpdatedAction$1 (org.elasticsearch.cluster.action.index)
onResponse:113, MappingUpdatedAction$1 (org.elasticsearch.cluster.action.index)
lambda$executeLocally$0:97, NodeClient (org.elasticsearch.client.node)
accept:-1, 2099146048 (org.elasticsearch.client.node.NodeClient$$Lambda$2772)
onResponse:144, TaskManager$1 (org.elasticsearch.tasks)
onResponse:138, TaskManager$1 (org.elasticsearch.tasks)
handleResponse:54, ActionListenerResponseHandler (org.elasticsearch.action)
handleResponse:1053, TransportService$ContextRestoreResponseHandler (org.elasticsearch.transport)
doRun:220, InboundHandler$1 (org.elasticsearch.transport)
run:37, AbstractRunnable (org.elasticsearch.common.util.concurrent)
execute:196, EsExecutors$DirectExecutorService (org.elasticsearch.common.util.concurrent)
handleResponse:212, InboundHandler (org.elasticsearch.transport)
messageReceived:138, InboundHandler (org.elasticsearch.transport)
inboundMessage:102, InboundHandler (org.elasticsearch.transport)
inboundMessage:664, TcpTransport (org.elasticsearch.transport)
consumeNetworkReads:688, TcpTransport (org.elasticsearch.transport)
consumeReads:276, MockNioTransport$MockTcpReadWriteHandler (org.elasticsearch.transport.nio)
handleReadBytes:228, SocketChannelContext (org.elasticsearch.nio)
read:40, BytesChannelContext (org.elasticsearch.nio)
handleRead:139, EventHandler (org.elasticsearch.nio)
handleRead:151, TestEventHandler (org.elasticsearch.transport.nio)
handleRead:420, NioSelector (org.elasticsearch.nio)
processKey:246, NioSelector (org.elasticsearch.nio)
singleLoop:174, NioSelector (org.elasticsearch.nio)
runLoop:131, NioSelector (org.elasticsearch.nio)
run:-1, 461835914 (org.elasticsearch.nio.NioSelectorGroup$$Lambda$1709)
run:835, Thread (java.lang)
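
The AsyncIOProcessor frames in the trace show why the sync lands on the caller: put() promotes the calling thread to drain and process the queued writes itself. A minimal sketch of that caller-runs pattern (illustrative names, not the real class):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Sketch of the AsyncIOProcessor pattern visible in the trace:
// put() lets the calling thread drain and process the queue, so
// whoever calls put() may end up doing the I/O itself.
class CallerRunsProcessor {
    private final List<String> queue = new ArrayList<>();
    private final Semaphore promoted = new Semaphore(1);

    void put(String item) {
        synchronized (queue) { queue.add(item); }
        // Whichever caller wins the semaphore drains the queue and
        // does the work on its own thread -- transport threads included.
        if (promoted.tryAcquire()) {
            try {
                List<String> batch;
                synchronized (queue) {
                    batch = new ArrayList<>(queue);
                    queue.clear();
                }
                process(batch);
            } finally {
                promoted.release();
            }
        }
    }

    void process(List<String> batch) {
        // Stand-in for the translog fsync in IndexShard#sync.
        System.out.println("processed " + batch.size()
            + " item(s) on: " + Thread.currentThread().getName());
    }

    public static void main(String[] args) {
        new CallerRunsProcessor().put("translog-sync");
    }
}
```

The design amortizes fsyncs across concurrent writers, but it means the processing thread is whichever caller wins the semaphore, with no regard for which pool that caller belongs to.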

Metadata

Labels

:Distributed Indexing/CRUD, >bug
