MasterNodeChangePredicate should use the node instance to detect master change #25877

bleskes · 2017-07-25T10:09:57Z

This predicate is used to deal with the intricacies of detecting when a master is re-elected or a node rejoins an existing master. The current implementation is based on node IDs, which is fine if the master really changes. If the node ID is unchanged, the code falls back to detecting an increment in the cluster state version, which happens when a node is re-elected or when a node rejoins. Sadly, this doesn't cover the case where the same node is elected after a full restart of all master nodes. In that case we recover the cluster state from disk, but the version is reset back to 0. To fix this, the check should be done based on ephemeral IDs, which are reset on restart.

Fixes #25471
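
The check described above can be sketched as follows, modelled on the diff fragments quoted in the review below. `Node`, `State`, `build` and its key-function parameter are hypothetical stand-ins, not Elasticsearch's `DiscoveryNode`/`ClusterState` API; the key function exists only so the old (node ID) and new (ephemeral ID) checks can be run side by side on the full-restart scenario.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Sketch only: Node and State are hypothetical stand-ins for DiscoveryNode and
// ClusterState, reduced to the fields the predicate reads.
final class MasterChangeSketch {

    record Node(String nodeId, String ephemeralId) {}

    // master is null when no node is currently elected
    record State(long version, Node master) {}

    // Builds a "has the master changed since `current`?" predicate. The key
    // function picks which ID identifies the master: Node::nodeId models the
    // old check, Node::ephemeralId the fixed one.
    static Predicate<State> build(State current, Function<Node, String> key) {
        final long currentVersion = current.version();
        // null-safe: the current state may have no master at all
        final String currentMasterId = current.master() == null ? null : key.apply(current.master());
        return newState -> {
            final Node newMaster = newState.master();
            if (newMaster == null) {
                return false; // still no master, keep waiting
            } else if (key.apply(newMaster).equals(currentMasterId) == false) {
                return true; // different ID: treat as a new master
            } else {
                return newState.version() > currentVersion; // same ID: re-election or rejoin
            }
        };
    }

    public static void main(String[] args) {
        State before = new State(42, new Node("node-1", "eph-A"));
        // Full restart of all master nodes: same node ID, new ephemeral ID,
        // cluster state recovered from disk with the version back at 0.
        State afterRestart = new State(0, new Node("node-1", "eph-B"));
        System.out.println(build(before, Node::nodeId).test(afterRestart));      // false: change missed
        System.out.println(build(before, Node::ephemeralId).test(afterRestart)); // true: change detected
    }
}
```

Running `main` prints `false` then `true`: keyed on the node ID, the recovered state looks like an older state of the same master, while the fresh ephemeral ID makes the restart visible.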

ywelsch

NullPointerException

ywelsch · 2017-07-25T13:29:23Z

core/src/main/java/org/elasticsearch/cluster/MasterNodeChangePredicate.java

    public static Predicate<ClusterState> build(ClusterState currentState) {
        final long currentVersion = currentState.version();
-        final String currentMaster = currentState.nodes().getMasterNodeId();
+        final String currentMasterId = currentState.nodes().getMasterNode().getEphemeralId();


NullPointerException if getMasterNode() returns null?

ywelsch · 2017-07-25T13:29:53Z

core/src/main/java/org/elasticsearch/cluster/MasterNodeChangePredicate.java

+            final String newMasterId = newState.nodes().getMasterNode().getEphemeralId();
            final boolean accept;
-            if (newMaster == null) {
+            if (newMasterId == null) {


newMasterId (= getEphemeralId()) cannot be null, getMasterNode() can be, however.

Bad refactoring, yikes. Will fix and check the test coverage.

ywelsch · 2017-07-25T13:31:42Z

core/src/main/java/org/elasticsearch/cluster/MasterNodeChangePredicate.java

+            if (newMasterId == null) {
                accept = false;
-            } else if (newMaster.equals(currentMaster) == false){
+            } else if (newMasterId.equals(currentMasterId) == false){


maybe it's nicer to just capture the DiscoveryNode object here and use equals on that (internally it will use the ephemeral id for comparison anyhow).

That's how I had it before. I refactored it because I prefer to have explicit references in the places that depend on the ephemeral ID. I know we use the DiscoveryNode objects themselves in many places, but I don't want to add another one.
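
The two options discussed here give the same answer; they differ only in where the dependency on the ephemeral ID is visible. A minimal sketch, assuming a hypothetical `Node` class whose `equals()` compares ephemeral IDs (as the reviewer says `DiscoveryNode.equals` does internally); `Node`, `changedByEquals` and `changedByEphemeralId` are illustrative names, not Elasticsearch code.

```java
// Sketch only: Node is a hypothetical stand-in for DiscoveryNode whose
// equals()/hashCode() are based on the ephemeral ID alone.
final class NodeEqualitySketch {

    static final class Node {
        final String nodeId;
        final String ephemeralId;

        Node(String nodeId, String ephemeralId) {
            this.nodeId = nodeId;
            this.ephemeralId = ephemeralId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            return ephemeralId.equals(((Node) o).ephemeralId);
        }

        @Override
        public int hashCode() {
            return ephemeralId.hashCode();
        }
    }

    // Reviewer's suggestion: capture the node and rely on equals().
    // newMaster is assumed non-null (the no-master case is handled earlier).
    static boolean changedByEquals(Node currentMaster, Node newMaster) {
        return newMaster.equals(currentMaster) == false;
    }

    // Author's choice: name the ephemeral ID explicitly at the comparison site.
    static boolean changedByEphemeralId(Node currentMaster, Node newMaster) {
        final String currentMasterId = currentMaster == null ? null : currentMaster.ephemeralId;
        return newMaster.ephemeralId.equals(currentMasterId) == false;
    }

    public static void main(String[] args) {
        Node before = new Node("node-1", "eph-A");
        Node restarted = new Node("node-1", "eph-B");
        System.out.println(changedByEquals(before, restarted));      // true
        System.out.println(changedByEphemeralId(before, restarted)); // true
    }
}
```

Because `equals()` already delegates to the ephemeral ID, the choice is about readability: the explicit version makes the reliance on ephemeral IDs searchable at the call site.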

bleskes · 2017-07-26T10:18:33Z

Thx @ywelsch. I addressed your feedback. Can you take another look, please?

ywelsch

LGTM (after addressing 3 nits)

ywelsch · 2017-07-26T11:44:43Z

core/src/main/java/org/elasticsearch/cluster/MasterNodeChangePredicate.java

            if (newMaster == null) {
                accept = false;
-            } else if (newMaster.equals(currentMaster) == false){
+            } else if (newMaster.getEphemeralId().equals(currentMasterId) == false){


space missing between closing and opening bracket: false){

ywelsch · 2017-07-26T11:45:36Z

core/src/test/java/org/elasticsearch/action/support/master/TransportMasterNodeActionTests.java


    public void testDelegateToFailingMaster() throws ExecutionException, InterruptedException {
-        boolean failsWithConnectTransportException = randomBoolean();
+        boolean failsWithConnectTransportException = true || randomBoolean();


You probably want to remove true || before pushing ;-)

ywelsch · 2017-07-26T11:47:11Z

core/src/test/java/org/elasticsearch/action/support/master/TransportMasterNodeActionTests.java

-            // reset the same state to increment a version simulating a join of an existing node
-            setState(clusterService, clusterService.state());
+            if (randomBoolean()) {
+                // simulate master node removal removal


removal removal removal

bleskes added :Distributed Coordination/Discovery-Plugins Anything related to our integration plugins with EC2, GCP and Azure >bug v5.6.0 v6.0.0 labels Jul 25, 2017

bleskes requested a review from ywelsch July 25, 2017 10:09

ywelsch suggested changes Jul 25, 2017

bleskes added 2 commits July 26, 2017 11:28

Merge remote-tracking branch 'upstream/master' into master_change_master_restarts

8c9f665

feedback

81f83dc

ywelsch approved these changes Jul 26, 2017

bleskes added 2 commits July 26, 2017 13:52

Merge branch 'master' into master_change_master_restarts

41dea32

feddback

fed572e

bleskes merged commit 03eb146 into elastic:master Jul 26, 2017

bleskes deleted the master_change_master_restarts branch July 26, 2017 15:03

colings86 added v6.0.0-beta1 and removed v6.0.0 labels Jul 31, 2017

3 participants
