-
Notifications
You must be signed in to change notification settings - Fork 25.6k
MasterNodeChangePredicate should use the node instance to detect master change #25877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MasterNodeChangePredicate should use the node instance to detect master change #25877
Conversation
…er change This predicate is used to deal with the intricacies of detecting when a master is reelected/nodes rejoins an existing master. The current implementation is based on nodeIds, which is fine if the master really change. If the nodeId is equal the code falls back to detecting an increment in the cluster state version which happens when a node is re-elected or when the node rejoins. Sadly this doesn't cover the case where the same node is elected after a full restart of all master nodes. In that case we recover the cluster state from disk but the version is reset back to 0. To fix this, the check should be done based on ephemeral IDs which are reset on restart.
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NullPointerException
| public static Predicate<ClusterState> build(ClusterState currentState) { | ||
| final long currentVersion = currentState.version(); | ||
| final String currentMaster = currentState.nodes().getMasterNodeId(); | ||
| final String currentMasterId = currentState.nodes().getMasterNode().getEphemeralId(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NullPointerException if getMasterNode() returns null?
| final String newMasterId = newState.nodes().getMasterNode().getEphemeralId(); | ||
| final boolean accept; | ||
| if (newMaster == null) { | ||
| if (newMasterId == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newMasterId (= getEphemeralId()) cannot be null, getMasterNode() can be, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bad refactoring. yikes. will fix and check for the testing coverage.
| if (newMasterId == null) { | ||
| accept = false; | ||
| } else if (newMaster.equals(currentMaster) == false){ | ||
| } else if (newMasterId.equals(currentMasterId) == false){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's nicer to just capture the DiscoveryNode object here and use equals on that (internally it will use the ephemeral id for comparison anyhow).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's I had it before. I refactored it as I prefer to have explicit references to places that depends on the ephemeral id. I know we use the disco nodes themselves in many places but I don't want to add another one.
|
Thx @ywelsch . I addressed your feedback. Can you take another look please? |
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (after addressing 3 nits)
| if (newMaster == null) { | ||
| accept = false; | ||
| } else if (newMaster.equals(currentMaster) == false){ | ||
| } else if (newMaster.getEphemeralId().equals(currentMasterId) == false){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space missing between closing and opening bracket: false){
|
|
||
| public void testDelegateToFailingMaster() throws ExecutionException, InterruptedException { | ||
| boolean failsWithConnectTransportException = randomBoolean(); | ||
| boolean failsWithConnectTransportException = true || randomBoolean(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want to remove true || before pushing ;-)
| // reset the same state to increment a version simulating a join of an existing node | ||
| setState(clusterService, clusterService.state()); | ||
| if (randomBoolean()) { | ||
| // simulate master node removal removal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removal removal removal
…er change (#25877) This predicate is used to deal with the intricacies of detecting when a master is reelected/nodes rejoins an existing master. The current implementation is based on nodeIds, which is fine if the master really change. If the nodeId is equal the code falls back to detecting an increment in the cluster state version which happens when a node is re-elected or when the node rejoins. Sadly this doesn't cover the case where the same node is elected after a full restart of all master nodes. In that case we recover the cluster state from disk but the version is reset back to 0. To fix this, the check should be done based on ephemeral IDs which are reset on restart. Fixes #25471
…er change (#25877) This predicate is used to deal with the intricacies of detecting when a master is reelected/nodes rejoins an existing master. The current implementation is based on nodeIds, which is fine if the master really change. If the nodeId is equal the code falls back to detecting an increment in the cluster state version which happens when a node is re-elected or when the node rejoins. Sadly this doesn't cover the case where the same node is elected after a full restart of all master nodes. In that case we recover the cluster state from disk but the version is reset back to 0. To fix this, the check should be done based on ephemeral IDs which are reset on restart. Fixes #25471
…er change (#25877) This predicate is used to deal with the intricacies of detecting when a master is reelected/nodes rejoins an existing master. The current implementation is based on nodeIds, which is fine if the master really change. If the nodeId is equal the code falls back to detecting an increment in the cluster state version which happens when a node is re-elected or when the node rejoins. Sadly this doesn't cover the case where the same node is elected after a full restart of all master nodes. In that case we recover the cluster state from disk but the version is reset back to 0. To fix this, the check should be done based on ephemeral IDs which are reset on restart. Fixes #25471
…er change (#25877) This predicate is used to deal with the intricacies of detecting when a master is reelected/nodes rejoins an existing master. The current implementation is based on nodeIds, which is fine if the master really change. If the nodeId is equal the code falls back to detecting an increment in the cluster state version which happens when a node is re-elected or when the node rejoins. Sadly this doesn't cover the case where the same node is elected after a full restart of all master nodes. In that case we recover the cluster state from disk but the version is reset back to 0. To fix this, the check should be done based on ephemeral IDs which are reset on restart. Fixes #25471
This predicate is used to deal with the intricacies of detecting when a master is reelected/nodes rejoins an existing master. The current implementation is based on nodeIds, which is fine if the master really change. If the nodeId is equal the code falls back to detecting an increment in the cluster state version which happens when a node is re-elected or when the node rejoins. Sadly this doesn't cover the case where the same node is elected after a full restart of all master nodes. In that case we recover the cluster state from disk but the version is reset back to 0. To fix this, the check should be done based on ephemeral IDs which are reset on restart.
Fixes #25471