When a primary is relocating from node_1 to node_2, there can be a short time where the old primary has already been removed from node_1 (closed, not deleted) but the new primary on node_2 is still in POST_RECOVERY. In this state, indexing requests might be sent back and forth between node_1 and node_2 endlessly.
Course of events:
- primary ([index][0]) relocates from node_1 to node_2
- node_2 is done recovering, moves its shard to IndexShardState.POST_RECOVERY and sends a message to master that the shard is ShardRoutingState.STARTED
    Cluster state 1:
    node_1: [index][0] RELOCATING (ShardRoutingState), (STARTED from IndexShardState perspective on node_1)
    node_2: [index][0] INITIALIZING (ShardRoutingState), (at this point already POST_RECOVERY from IndexShardState perspective on node_2)
- master receives shard started and updates cluster state to:
    Cluster state 2:
    node_1: [index][0] no shard
    node_2: [index][0] STARTED (ShardRoutingState), (at this point still in POST_RECOVERY from IndexShardState perspective on node_2)
  master sends this to node_1 and node_2
- node_1 receives the new cluster state and removes its shard because it is not allocated on node_1 anymore
- index a document
At this point, node_1 is already on cluster state 2 and no longer has the shard, so it forwards the request to node_2. But node_2 is behind with cluster state processing: it is still on cluster state 1, has the shard in IndexShardState.POST_RECOVERY, and thinks node_1 has the primary, so it sends the request back to node_1. This goes on until either node_2 finally catches up and processes cluster state 2, or both nodes run out of memory (OOM).
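The forwarding loop can be illustrated with a toy model. The Python sketch below is not Elasticsearch code: `Node`, `PRIMARY_BY_STATE`, `route_index_request` and the `max_hops` guard are hypothetical names made up for illustration. It assumes only that each node routes a request to whichever node its own, possibly stale, cluster state names as the primary.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cluster_state_version: int  # last cluster state this node has applied


# Hypothetical routing view: which node holds the primary in each cluster state.
# State 1: node_1 is RELOCATING (still the primary). State 2: node_2 is STARTED.
PRIMARY_BY_STATE = {1: "node_1", 2: "node_2"}


def route_index_request(nodes, start, max_hops=10):
    """Follow one indexing request through the nodes.

    Returns (handling_node, hops). handling_node is None when the request
    is still bouncing after max_hops forwards (the guard exists only in
    this sketch, so that the loop terminates).
    """
    current = start
    hops = 0
    while hops < max_hops:
        node = nodes[current]
        # Each node decides based on its own view of the cluster state.
        primary = PRIMARY_BY_STATE[node.cluster_state_version]
        if primary == current:
            return current, hops  # this node believes it is the primary
        current = primary  # forward to the node believed to be primary
        hops += 1
    return None, hops


# node_1 has applied cluster state 2, node_2 is still on cluster state 1.
stale = {"node_1": Node("node_1", 2), "node_2": Node("node_2", 1)}
# node_2 has caught up and applied cluster state 2.
caught_up = {"node_1": Node("node_1", 2), "node_2": Node("node_2", 2)}

print(route_index_request(stale, "node_1"))      # (None, 10): never handled
print(route_index_request(caught_up, "node_1"))  # ('node_2', 1): one forward
```

In the stale case the request alternates between the two nodes until the hop limit is reached. Once node_2 has applied cluster state 2, a single forward is enough.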
I will make a pull request with a test shortly.