Conversation

@gwbrown
Contributor

@gwbrown gwbrown commented Jun 8, 2021

This PR modifies the Node Shutdown Status API to include information about the number of shards left to migrate off of the node in question, as well as checking if shard migration is stalled.

"Stalled" is defined as no shards currently relocating off the node, and at least one shard on the node which cannot move per current allocation deciders.

Relates #70338

@gwbrown gwbrown added >non-issue v8.0.0 :Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown v7.14.0 labels Jun 8, 2021
@gwbrown
Contributor Author

gwbrown commented Jun 10, 2021

@dakrone I'm going to request your review on this because the production code and unit tests are reviewable. I'm still intending to add integration tests, but I expect those to be pretty straightforward, and all the behavior should already be covered by the unit tests.

@gwbrown gwbrown requested a review from dakrone June 10, 2021 00:29
Member

@dakrone dakrone left a comment


I left a comment on this; I think we need to handle INITIALIZING shards as part of the count as well. What do you think?


public ShutdownShardMigrationStatus() {
this.status = SingleNodeShutdownMetadata.Status.COMPLETE;
public ShutdownShardMigrationStatus(SingleNodeShutdownMetadata.Status status, long shardsRemaining) {
Member

I think at some point we should just pull SingleNodeShutdownMetadata.Status into its own top-level class named something like ShutdownComponentStatus, since it's being used in quite a few places (not in this PR, though)

) {
// Only REMOVE-type shutdowns will try to move shards, so RESTART-type shutdowns should immediately complete
if (SingleNodeShutdownMetadata.Type.RESTART.equals(shutdownType)) {
return new ShutdownShardMigrationStatus(SingleNodeShutdownMetadata.Status.COMPLETE, 0);
Member

Might as well include a "reason" string here; it'll be helpful for users who might not be aware that restarts don't migrate shards.

Contributor Author

Good idea, thanks!

Comment on lines 169 to 171
int currentShardsOnNode = currentState.getRoutingNodes().node(nodeId).numberOfShardsWithState(ShardRoutingState.STARTED);
int currentlyRelocatingShards = currentState.getRoutingNodes().node(nodeId).numberOfShardsWithState(ShardRoutingState.RELOCATING);
int totalRemainingShards = currentlyRelocatingShards + currentShardsOnNode;
Member

I think this is missing dealing with INITIALIZING shards, which may be assigned but are part of the total? Should those be considered as shards currently on the node that need to be migrated?

Contributor Author

Very good question. Since the allocation decider should prevent any new shards from being allocated to the node, the only INITIALIZING shards should be ones which were already assigned to the node before the shutdown was registered, but there still might be some.

In what circumstances would shards stay INITIALIZING for a long time? I'm not sure if snapshot restoration shows up as that, or another status - do you know off the top of your head?

INITIALIZING shards can't be relocated per my understanding, so we'd have to wait until they move to STARTED... and then would they relocate? Or would we have to trigger a reroute again? I think I'm going to have to ask the distributed team about some of this.

Member

In what circumstances would shards stay INITIALIZING for a long time? I'm not sure if snapshot restoration shows up as that, or another status - do you know off the top of your head?

Initial recovery during startup, as well as opening a closed index, moves shards to INITIALIZING I think, and if a user has set the check_on_startup setting it can take quite a while. I don't know about snapshot restoration, unfortunately.

INITIALIZING shards can't be relocated per my understanding, so we'd have to wait until they move to STARTED... and then would they relocate? Or would we have to trigger a reroute again?

I believe they cannot be relocated during INITIALIZING state (it would make sense you can't relocate a shard recovering from local disk), and I think they will be subject to a relocation/reroute once they've reached the STARTED state, but don't hold me to that :)

Contributor Author

Henning told me elsewhere that (among other things):

If we have cases of shards getting "stuck" in INITIALIZING, I would call it a bug, but we have seen cases of slowness (which is possibly also a bug).

So we shouldn't get cases of shards being stuck in INITIALIZING indefinitely, but there may be some where they are very slow to move out of that state. So this now counts initializing shards towards the shards remaining count, and if there are only initializing shards left, reports IN_PROGRESS with a reason explaining that there are only INITIALIZING shards left. That should help surface unexpected conditions more quickly. I'm not sure we can get much more detail there easily, though.
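
The revised counting described above can be sketched as follows. This is a minimal illustration only; the class, field names, and reason strings are stand-ins, not the actual Elasticsearch implementation:

```java
// Illustrative sketch: INITIALIZING shards now count toward the remaining
// total, and a dedicated reason is reported when they are all that's left.
// All names here are hypothetical stand-ins for the real routing-table types.
class RemainingShards {
    final int started;
    final int relocating;
    final int initializing;

    RemainingShards(int started, int relocating, int initializing) {
        this.started = started;
        this.relocating = relocating;
        this.initializing = initializing;
    }

    /** Total shards still to migrate off the node, including INITIALIZING ones. */
    int total() {
        return started + relocating + initializing;
    }

    /** True when the only shards left on the node are still INITIALIZING. */
    boolean onlyInitializingLeft() {
        return initializing > 0 && started == 0 && relocating == 0;
    }

    String statusReason() {
        if (total() == 0) {
            return "COMPLETE";
        }
        if (onlyInitializingLeft()) {
            return "IN_PROGRESS: all remaining shards are INITIALIZING";
        }
        return "IN_PROGRESS";
    }
}
```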

@gwbrown gwbrown marked this pull request as ready for review June 15, 2021 23:15
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Jun 15, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@gwbrown gwbrown requested a review from dakrone June 15, 2021 23:15
Member

@dakrone dakrone left a comment


LGTM, I left two really minor comments, but otherwise looks good!

snapshotsInfoService.snapshotShardSizes(),
System.nanoTime()
);
allocation.debugDecision(true);
Member

As far as I can see, we don't use the actual message for the decision anywhere. Since we don't, I think we can remove this line and avoid the debug decision (which allows allocation deciders to short-circuit and be a bit more efficient).

Either that, or maybe we can log the actual decision (which will include a plethora of details) at TRACE level if an unmovable shard is found? That would be nice for debugging, I think.

Contributor Author

I'm not exactly sure why, but calling allocationService.explainShardAllocation() with a RoutingAllocation that doesn't have debugDecision set to true trips this assertion. I think that might just be because the only other place that calls this method is the Allocation Explain API, so we might be able to change it but I didn't want to do so without consulting the distributed team. Which could be a follow-up PR?

In the meantime, I'll change this to at least log the decision at TRACE.

Member

@dakrone dakrone Jun 16, 2021

also, you can use:

allocation.setDebugMode(DebugMode.EXCLUDE_YES_DECISIONS);

in place of allocation.debugDecision(true) to return only the "NO" decisions, which will also cut down on the amount of logging (while still being in debug mode)
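
The idea behind excluding YES decisions can be sketched like this. The types below are hypothetical stand-ins, not the actual RoutingAllocation/DebugMode machinery; the sketch only shows why filtering out YES results keeps the debug output small:

```java
// Illustrative sketch of the EXCLUDE_YES_DECISIONS idea: when collecting
// per-decider explanations, keep only the non-YES ones so debug output
// stays focused on what actually blocked the shard.
// Decision, DeciderResult, and DebugFilter are stand-in names.
import java.util.List;
import java.util.stream.Collectors;

enum Decision { YES, NO, THROTTLE }

class DeciderResult {
    final String decider;
    final Decision decision;

    DeciderResult(String decider, Decision decision) {
        this.decider = decider;
        this.decision = decision;
    }
}

class DebugFilter {
    // Drop YES results; NO and THROTTLE explanations are the useful ones.
    static List<DeciderResult> excludeYes(List<DeciderResult> results) {
        return results.stream()
                .filter(r -> r.decision != Decision.YES)
                .collect(Collectors.toList());
    }
}
```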

.filter(pair -> pair.v2().getMoveDecision().getAllocationDecision().equals(AllocationDecision.THROTTLED) == false)
// These shards will move as soon as possible
.filter(pair -> pair.v2().getMoveDecision().getAllocationDecision().equals(AllocationDecision.YES) == false)
.findFirst();
Member

If you don't want to use the allocation decision (from my comment above, because you don't use the v2() anywhere but for filtering), this could just be:

.map(Tuple::v1)
.findFirst();

And then it only needs to be an Optional<ShardRouting> above
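
The suggested simplification can be sketched as below. Pair here is a hand-rolled stand-in for the Tuple class, and the string decisions stand in for AllocationDecision values; only the stream shape matters:

```java
// Hedged sketch of the suggestion above: map to v1 before findFirst, so the
// result is an Optional of the shard itself rather than the whole pair.
// Pair and the string decision values are illustrative stand-ins.
import java.util.List;
import java.util.Optional;

class Pair<A, B> {
    final A v1;
    final B v2;

    Pair(A v1, B v2) { this.v1 = v1; this.v2 = v2; }

    A v1() { return v1; }
    B v2() { return v2; }
}

class FirstUnmovable {
    static Optional<String> firstBlockedShard(List<Pair<String, String>> shardDecisions) {
        return shardDecisions.stream()
                // Throttled shards will move once throttling allows it
                .filter(p -> p.v2().equals("THROTTLED") == false)
                // These shards will move as soon as possible
                .filter(p -> p.v2().equals("YES") == false)
                // Only the shard itself is needed past this point
                .map(Pair::v1)
                .findFirst();
    }
}
```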

Contributor Author

Good call, thanks!

@gwbrown gwbrown merged commit 335266c into elastic:master Jun 17, 2021
gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Jun 17, 2021
gwbrown added a commit that referenced this pull request Jun 17, 2021