Allow master to assign primary shard to node that has shard store locked during shard state fetching (#21656)
PR #19416 added a safety mechanism to shard state fetching so that the store is only accessed when the shard lock can be acquired. However, this can lead to a situation where a shard has not fully shut down yet while shard fetching is in progress, resulting in a ShardLockObtainFailedException. PrimaryShardAllocator, which decides where to allocate primary shards, sees this exception and treats the shard as unusable. If this is the only shard copy in the cluster, the cluster stays red and a new shard fetching cycle is not triggered, because shard state fetching treats exceptions while opening the store as permanent failures.
This commit makes PrimaryShardAllocator treat the locked shard as a possible allocation target (although with the lowest priority).
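The prioritization described above can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: `LockedShardPrioritySketch`, `StoreStatus`, `ShardCopy`, and `rankCandidates` are hypothetical names invented for illustration, and the real allocator works on per-node shard state responses rather than this simplified model. The sketch assumes that copies whose store failed for another reason are dropped, while locked copies are kept but sorted behind copies whose store opened cleanly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LockedShardPrioritySketch {

    /** Hypothetical outcome of trying to open a shard store during state fetching. */
    enum StoreStatus { OPENED, LOCKED, FAILED }

    /** Hypothetical stand-in for one node's reported shard copy. */
    static final class ShardCopy {
        final String node;
        final StoreStatus status;

        ShardCopy(String node, StoreStatus status) {
            this.node = node;
            this.status = status;
        }
    }

    /** Returns candidate node names, best first; locked copies are kept but ranked last. */
    static List<String> rankCandidates(List<ShardCopy> copies) {
        List<ShardCopy> candidates = new ArrayList<>();
        for (ShardCopy copy : copies) {
            // A store that failed for a reason other than the lock is not a candidate.
            if (copy.status != StoreStatus.FAILED) {
                candidates.add(copy);
            }
        }
        // Boolean.FALSE sorts before Boolean.TRUE, so cleanly opened stores come first.
        // List.sort is stable, so the reported order is preserved within each group.
        candidates.sort(Comparator.comparing((ShardCopy c) -> c.status == StoreStatus.LOCKED));

        List<String> nodes = new ArrayList<>();
        for (ShardCopy c : candidates) {
            nodes.add(c.node);
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<ShardCopy> copies = Arrays.asList(
            new ShardCopy("node-a", StoreStatus.LOCKED),
            new ShardCopy("node-b", StoreStatus.OPENED),
            new ShardCopy("node-c", StoreStatus.FAILED));
        System.out.println(rankCandidates(copies)); // prints [node-b, node-a]
    }
}
```

With a single locked copy as the only input, `rankCandidates` still returns that node, which corresponds to the case where the cluster would otherwise stay red.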
logger.trace((Supplier<?>) () -> newParameterizedMessage("[{}] on node [{}] has allocation id [{}] but the store can not be opened, treating as no allocation id", shard, nodeShardState.getNode(), finalAllocationId), nodeShardState.storeException());
-        allocationId = null;
+        if (nodeShardState.storeException() instanceof ShardLockObtainFailedException) {
+            logger.trace((Supplier<?>) () -> newParameterizedMessage("[{}] on node [{}] has allocation id [{}] but the store can not be opened as it's locked, treating as valid shard", shard, nodeShardState.getNode(), finalAllocationId), nodeShardState.storeException());
+        } else {
+            logger.trace((Supplier<?>) () -> newParameterizedMessage("[{}] on node [{}] has allocation id [{}] but the store can not be opened, treating as no allocation id", shard, nodeShardState.getNode(), finalAllocationId), nodeShardState.storeException());
"only allow store that can be opened or that throws a ShardLockObtainFailedException while being opened but got a store throwing " + nodeShardState.storeException();
logger.trace("{} candidates for allocation: {}", shard, nodeShardStates.stream().map(s -> s.getNode().getName()).collect(Collectors.joining(", ")));
@@ -412,10 +422,19 @@ static NodeShardsResult buildVersionBasedNodeShardsResult(ShardRouting shard, bo
             logger.trace("[{}] on node [{}] has allocation id [{}]", shard, nodeShardState.getNode(), nodeShardState.allocationId());
         }
     } else {
-        final long finalVerison = version;
-        // when there is an store exception, we disregard the reported version and assign it as no version (same as shard does not exist)
-        logger.trace((Supplier<?>) () -> newParameterizedMessage("[{}] on node [{}] has version [{}] but the store can not be opened, treating no version", shard, nodeShardState.getNode(), finalVerison), nodeShardState.storeException());
-        version = ShardStateMetaData.NO_VERSION;
+        final long finalVersion = version;
+        if (nodeShardState.storeException() instanceof ShardLockObtainFailedException) {
+            logger.trace((Supplier<?>) () -> newParameterizedMessage("[{}] on node [{}] has version [{}] but the store can not be opened as it's locked, treating as valid shard", shard, nodeShardState.getNode(), finalVersion), nodeShardState.storeException());
+            if (nodeShardState.allocationId() != null) {
+                version = Long.MAX_VALUE; // shard was already selected in a 5.x cluster as primary, prefer this shard copy again.
+            } else {
+                version = 0L; // treat as lowest version so that this shard is the least likely to be selected as primary
+            }
+        } else {
+            // disregard the reported version and assign it as no version (same as shard does not exist)
+            logger.trace((Supplier<?>) () -> newParameterizedMessage("[{}] on node [{}] has version [{}] but the store can not be opened, treating no version", shard, nodeShardState.getNode(), finalVersion), nodeShardState.storeException());
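The version handling in the hunk above can be condensed into a small standalone sketch for readers who want to follow the decision without the logging. It is a simplified illustration, not the method from the commit: `LegacyVersionSketch`, `StoreStatus`, and `effectiveVersion` are hypothetical names, and the `NO_VERSION` constant here is a local placeholder whose value of -1 is an assumption standing in for `ShardStateMetaData.NO_VERSION`.

```java
public class LegacyVersionSketch {

    /** Local placeholder; the value -1 is an assumption, not taken from the diff. */
    static final long NO_VERSION = -1L;

    /** Hypothetical outcome of trying to open a shard store during state fetching. */
    enum StoreStatus { OPENED, LOCKED, FAILED }

    /**
     * Returns the version used to rank a shard copy.
     * A locked copy stays a candidate: highest version if it already has an
     * allocation id, lowest valid version otherwise. Any other store failure
     * is treated as if the shard copy did not exist.
     */
    static long effectiveVersion(long reportedVersion, StoreStatus status, boolean hasAllocationId) {
        switch (status) {
            case OPENED:
                return reportedVersion;
            case LOCKED:
                return hasAllocationId ? Long.MAX_VALUE : 0L;
            default:
                return NO_VERSION;
        }
    }

    public static void main(String[] args) {
        System.out.println(effectiveVersion(7L, StoreStatus.OPENED, false));
        System.out.println(effectiveVersion(7L, StoreStatus.LOCKED, true));
        System.out.println(effectiveVersion(7L, StoreStatus.LOCKED, false));
        System.out.println(effectiveVersion(7L, StoreStatus.FAILED, true));
    }
}
```

Mapping the locked case to `0L` rather than to the no-version placeholder is what keeps the copy eligible while leaving it the least likely to win against any copy that reports a real version.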