Skip to content

Conversation

@gwbrown
Copy link
Contributor

@gwbrown gwbrown commented Jul 27, 2021

This PR fixes two situations where NOT_STARTED can appear as the shard migration status inappropriately:

  1. When the node is actually shut down after having all the shards migrate away.
  2. When a non-data-node is registered for shutdown.

It also adds tests to ensure these cases are handled correctly.

Follow-up to #75606

@gwbrown gwbrown added >non-issue v8.0.0 :Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown v7.15.0 labels Jul 27, 2021
@gwbrown gwbrown changed the title Fix NOT_STARTED statuses appearing when they shouldn't during node shutdown Fix NOT_STARTED statuses appearing inappropirately during node shutdown Jul 27, 2021
@gwbrown gwbrown marked this pull request as ready for review August 12, 2021 22:49
@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Aug 12, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@gwbrown gwbrown requested a review from dakrone August 12, 2021 22:49
@gwbrown gwbrown removed the request for review from dakrone August 12, 2021 22:52
@gwbrown gwbrown requested a review from dakrone August 12, 2021 23:01
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Gordon, I left a few comments but nothing major

import java.util.function.Function;
import java.util.stream.Collectors;

public class ShutdownService implements ClusterStateListener {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add javadocs to this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I look at it, this class name is a little strange since this service doesn't actually do anything related to shutting down the nodes.

I wonder if it would be better named something like NodeSeenService?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed! Good call, I didn't really think about the naming at the time.

.values()
.stream()
.map(singleNodeShutdownMetadata -> {
if (nodesNotPreviouslySeen.contains(singleNodeShutdownMetadata.getNodeId())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should recalculate the nodesNotPreviouslySeen since there might be other changes since the time this cluster state update task actually runs

return singleNodeShutdownMetadata;
})
.collect(Collectors.toUnmodifiableMap(SingleNodeShutdownMetadata::getNodeId, Function.identity()));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth it to add if (newShutdownMetadataMap.equals(shutdownMetadata.getAllNodeMetadataMap()) { return currentState; } to this so we avoid work if not necessary?

@gwbrown
Copy link
Contributor Author

gwbrown commented Aug 13, 2021

Build failures are both timeouts in tests that appear unrelated and don't reproduce locally.

@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/part-2

@gwbrown gwbrown requested a review from dakrone August 17, 2021 21:34
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left one pretty minor comment

import java.util.function.Function;
import java.util.stream.Collectors;

public class ShutdownService implements ClusterStateListener {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I look at it, this class name is a little strange since this service doesn't actually do anything related to shutting down the nodes.

I wonder if it would be better named something like NodeSeenService?

@gwbrown gwbrown added the auto-backport Automatically create backport pull requests when merged label Aug 17, 2021
@gwbrown
Copy link
Contributor Author

gwbrown commented Aug 18, 2021

Build failures are due to a compilation error on 7.x earlier today, which has now been fixed.

@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/part-2

@gwbrown gwbrown merged commit bc5295b into elastic:master Aug 18, 2021
gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Aug 18, 2021
…down (elastic#75750)

This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately:
1. When the node is actually shut down after having all the shards migrate away.
2. When a non-data-node is registered for shutdown.

It also adds tests to ensure these cases are handled correctly.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
7.x

elasticsearchmachine pushed a commit that referenced this pull request Aug 18, 2021
…down (#75750) (#76634)

* Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (#75750)

This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately:
1. When the node is actually shut down after having all the shards migrate away.
2. When a non-data-node is registered for shutdown.

It also adds tests to ensure these cases are handled correctly.

* Fix compilation for backport

* Fix compilation for backport
gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Aug 18, 2021
…down (elastic#75750) (elastic#76634)

* Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (elastic#75750)

This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately:
1. When the node is actually shut down after having all the shards migrate away.
2. When a non-data-node is registered for shutdown.

It also adds tests to ensure these cases are handled correctly.

* Fix compilation for backport

* Fix compilation for backport
elasticsearchmachine pushed a commit that referenced this pull request Aug 18, 2021
…down (#75750) (#76634) (#76669)

* Fix `NOT_STARTED` statuses appearing inappropirately during node shutdown (#75750)

This PR fixes two situations where `NOT_STARTED` can appear as the shard migration status inappropriately:
1. When the node is actually shut down after having all the shards migrate away.
2. When a non-data-node is registered for shutdown.

It also adds tests to ensure these cases are handled correctly.

* Fix compilation for backport

* Fix compilation for backport
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged :Core/Infra/Node Lifecycle Node startup, bootstrapping, and shutdown >non-issue Team:Core/Infra Meta label for core/infra team v7.15.0 v8.0.0-alpha2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants