-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Allow master to assign primary shard to node that has shard store locked during shard state fetching #21656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow master to assign primary shard to node that has shard store locked during shard state fetching #21656
Conversation
dakrone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
I wanted to make sure I understood correctly - if the shard that has the lock exception is the only valid copy, so the allocator decides to allocate the primary to this (currently) locked shard. When the node receives the cluster state update that it must allocate the primary on itself, it will try to obtain the shard lock for 5 secs. If it fails to obtain the lock within 5 secs, the failure is sent to master, which will try to reallocate to the same node again. It will do this for up to 5 tries (by default) due to the MaxRetryAllocationDecider. So the node must release the lock on the shard within 5 tries, each try attempting for 5 secs. Is this understanding correct? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question - why not let Store.tryOpenIndex throw ShardLockObtainFailedException and catch it here directly, logging it and not storing it as a "store" exception? The first part will make things simpler imo. I'm fine with not going with the second part and doing it like you propose but wanted to better understand your reasoning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can directly throw ShardLockObtainFailedException. I wasn't sure why you wrapped that exception in the first place so I left as is.
To your second point, I left the decision on how to treat the ShardLockObtainFailedException to PrimaryShardAllocator as it has more context available to make the final decision where to allocate the shard. For example, it prioritizes another valid shard copy that has not thrown the exception. Also it allows the shard store action /_shard_stores to properly expose the exception as it reuses the same endpoint as the primary shard allocator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can directly throw ShardLockObtainFailedException. I wasn't sure why you wrapped that exception in the first place so I left as is.
I think that I just didn't want to extend the scope of the change and I didn't have a reason to not just throw an IOException. I think the unwrapping here merits this?
For example, it prioritizes another valid shard copy that has not thrown the exception. Also it allows the shard store action /_shard_stores to properly expose the exception as it reuses the same endpoint as the primary shard allocator.
Fair enough. Thanks.
|
@abeyad correct (I tried it to confirm). We will have 5 iterations where 5 seconds are taken to obtain the shard lock while shard fetching and then 5 seconds to obtain the shard lock while trying to allocate the shard on the node, so 5 * 5 seconds for shard fetching + 4 * 5 seconds for shard allocation attempts = 45 seconds :-) Test code Output: |
|
@ywelsch thanks for confirming |
|
Playing around with the above test case I noticed that |
Wouldn't that make this non-backport-able, since 5.0 won't have the serialization logic for this exception? |
correct 😞 We need to rethink this. Any suggestions? |
I think we have 3 options
Given the time frame of this and the aim to have it in 5.1, I tend towards option 2. |
Today it's not possible to add exceptions to the serialization layer without breaking BWC. This commit adds the ability to specify the Version an exception was added that allows to fall back not NotSerializableExceptionWrapper if the expection is not present in the streams version. Relates to elastic#21656
Today it's not possible to add exceptions to the serialization layer without breaking BWC. This commit adds the ability to specify the Version an exception was added that allows to fall back not NotSerializableExceptionWrapper if the exception is not present in the streams version. Relates to #21656
Today it's not possible to add exceptions to the serialization layer without breaking BWC. This commit adds the ability to specify the Version an exception was added that allows to fall back not NotSerializableExceptionWrapper if the exception is not present in the streams version. Relates to #21656
…ked during shard state fetching
d8a6b91 to
c4a123b
Compare
|
Can I get another review on this? |
Today it's not possible to add exceptions to the serialization layer without breaking BWC. This commit adds the ability to specify the Version an exception was added that allows to fall back not NotSerializableExceptionWrapper if the exception is not present in the streams version. Relates to #21656
bleskes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM (left some nits). I do think we need to beef up the test a bit.
| if (inSyncAllocationIds.contains(allocationId)) { | ||
| if (nodeShardState.primary()) { | ||
| // put shards that were primary before and that didn't throw a ShardLockObtainFailedException first | ||
| if (nodeShardState.primary() && nodeShardState.storeException() == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we assert that the store exception is what we expect it to be?
| } else if (matchAnyShard) { | ||
| if (nodeShardState.primary()) { | ||
| // put shards that were primary before and that didn't throw a ShardLockObtainFailedException first | ||
| if (nodeShardState.primary() && nodeShardState.storeException() == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same request for assertion.
| if (nodeShardState.storeException() instanceof ShardLockObtainFailedException) { | ||
| logger.trace((Supplier<?>) () -> new ParameterizedMessage("[{}] on node [{}] has version [{}] but the store can not be opened as it's locked, treating as valid shard", shard, nodeShardState.getNode(), finalVersion), nodeShardState.storeException()); | ||
| if (nodeShardState.allocationId() != null) { | ||
| version = Long.MAX_VALUE; // shard was already selected in a 5.x cluster as primary, prefer this shard copy again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you run into this being a problem? how can we open an index that was created before 5.0.0 and never had insync replicas but does have allocationId? the only thing I can think of is a node network issue during shard initialization. I'm wondering if we need to optimize for this and no keep this code simple (i.e., demote shards with a lock exception)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We special-case this a few lines up as well (but it's not easy to do code reuse across those lines). For symmetry reasons I have kept it as is. The code is documented as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough
|
|
||
| /** | ||
| * Tests that when the node returns a ShardLockObtainFailedException, it will be considered as a valid shard copy | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add tests for cases with more shards? for example the case where we "prefer" other shard copies with this exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I've added more tests
| } | ||
| } | ||
|
|
||
| List<NodeGatewayStartedShards> nodeShardStates = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yay!
| if (matchAnyShard) { | ||
| // prefer shards with matching allocation ids | ||
| Comparator<NodeGatewayStartedShards> matchingAllocationsFirst = Comparator.comparing( | ||
| (NodeGatewayStartedShards state) -> inSyncAllocationIds.contains(state.allocationId())).reversed(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fancy pants.
| if (nodeShardState.storeException() instanceof ShardLockObtainFailedException) { | ||
| logger.trace((Supplier<?>) () -> new ParameterizedMessage("[{}] on node [{}] has version [{}] but the store can not be opened as it's locked, treating as valid shard", shard, nodeShardState.getNode(), finalVersion), nodeShardState.storeException()); | ||
| if (nodeShardState.allocationId() != null) { | ||
| version = Long.MAX_VALUE; // shard was already selected in a 5.x cluster as primary, prefer this shard copy again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough
bleskes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @ywelsch
…ked during shard state fetching (#21656) PR #19416 added a safety mechanism to shard state fetching to only access the store when the shard lock can be acquired. This can lead to the following situation however where a shard has not fully shut down yet while the shard fetching is going on, resulting in a ShardLockObtainFailedException. PrimaryShardAllocator that decides where to allocate primary shards sees this exception and treats the shard as unusable. If this is the only shard copy in the cluster, the cluster stays red and a new shard fetching cycle will not be triggered as shard state fetching treats exceptions while opening the store as permanent failures. This commit makes it so that PrimaryShardAllocator treats the locked shard as a possible allocation target (although with the least priority).
…ked during shard state fetching (#21656) PR #19416 added a safety mechanism to shard state fetching to only access the store when the shard lock can be acquired. This can lead to the following situation however where a shard has not fully shut down yet while the shard fetching is going on, resulting in a ShardLockObtainFailedException. PrimaryShardAllocator that decides where to allocate primary shards sees this exception and treats the shard as unusable. If this is the only shard copy in the cluster, the cluster stays red and a new shard fetching cycle will not be triggered as shard state fetching treats exceptions while opening the store as permanent failures. This commit makes it so that PrimaryShardAllocator treats the locked shard as a possible allocation target (although with the least priority).
PR #19416 added a safety mechanism to shard state fetching to only access the store when the shard lock can be acquired. This can lead to the following situation however where a shard has not fully shut down yet while the shard fetching is going on, resulting in a
ShardLockObtainFailedException.PrimaryShardAllocatorthat decides where to allocate primary shards sees this exception and treats the shard as unusable. If this is the only shard copy in the cluster, the cluster stays red and a new shard fetching cycle will not be triggered as shard state fetching treats exceptions while opening the store as permanent failures.This PR makes it so that
PrimaryShardAllocatortreats the locked shard as a possible allocation target (although with the least priority).