Skip to content

Conversation

@henningandersen
Copy link
Contributor

@henningandersen henningandersen commented Jun 6, 2019

This commit modifies InternalTestCluster to allow using client() and
other operations inside a RestartCallback (onStoppedNode typically).
Restarting nodes are now removed from the map and thus all
methods now return the state as if the restarting node does not exist.

This avoids various exceptions stemming from accessing the stopped
node(s).

Part of #42518

This commit modifies InternalTestCluster to allow using client() and
other operations inside a RestartCallback (onStoppedNode typically). It
goes in a direction of most methods returning the state as if the
restarting node did not exist, avoiding various exceptions stemming from
accessing the stopped node(s).
@henningandersen henningandersen added >test Issues or PRs that are addressing/adding tests :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. v8.0.0 v7.3.0 labels Jun 6, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

When explicitly asking for an instance from a node, return it also from
closed nodes.
@henningandersen henningandersen requested a review from ywelsch June 11, 2019 08:00
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a suggestion on how we could possibly do this in a simpler way


private synchronized NodeAndClient getRandomNodeAndClient(Predicate<NodeAndClient> predicate) {
private NodeAndClient getRandomNodeAndClient(Predicate<NodeAndClient> predicate) {
return getRandomNodeAndClientIncludingClosed(((Predicate<NodeAndClient>) nc -> nc.isClosed() == false).and(predicate));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we avoid the cast by writing this as predicate.and(nc -> nc.isClosed() == false)?

}
}

public boolean isClosed() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS you're using isClosed() == false everywhere. perhaps turn this into an isOpen method, and then just use NodeAndClient::isOpen as a predicate.

expectedNodes.add(getInstanceFromNode(ClusterService.class, nodeAndClient.node()).localNode());
}
Set<DiscoveryNode> expectedNodes =
nodes.values().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we temporarily remove the node from the nodes map when it is restarted during the close? This could possibly simplify things here, not requiring us to exclude closed nodes everywhere. WDYT?

Copy link
Contributor Author

@henningandersen henningandersen Jun 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally did not do so due to test failures. In hindsight, I think I agree, it will leave this in a much cleaner state. Will give it a go and see if I can fix the tests that needs fixing (at least GatewayIndexStateIT.testArchiveBrokenClusterSettings since it calls getInstance for a stopped node).

Changed strategy to simply remove the nodes from the nodes map during
restart. This simplifies this and cements that you cannot get a hold of
a restarting node in onStopped. Fixed the 3 violating cases to comply
with that.
@henningandersen
Copy link
Contributor Author

@elasticmachine update branch

@henningandersen
Copy link
Contributor Author

Thanks for reviewing @ywelsch , this is now much simpler and ready for another round.

@henningandersen henningandersen requested a review from ywelsch June 14, 2019 12:53
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}
}

if (activeDisruptionScheme != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this? Is it a NOOP?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now call publishNode, which both adds to the nodes map and applies active disruption scheme.

@henningandersen henningandersen merged commit 4fd7a22 into elastic:master Jun 17, 2019
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Jun 17, 2019
This commit modifies InternalTestCluster to allow using client() and
other operations inside a RestartCallback (onStoppedNode typically).
Restarting nodes are now removed from the map and thus all
methods now return the state as if the restarting node does not exist.

This avoids various exceptions stemming from accessing the stopped
node(s).
henningandersen added a commit that referenced this pull request Jun 17, 2019
This commit modifies InternalTestCluster to allow using client() and
other operations inside a RestartCallback (onStoppedNode typically).
Restarting nodes are now removed from the map and thus all
methods now return the state as if the restarting node does not exist.

This avoids various exceptions stemming from accessing the stopped
node(s).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. >test Issues or PRs that are addressing/adding tests v7.3.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants