=cluster fix double-handshakes and double-cluster events #654
There will still be failures; I'm taking them on one by one.
```swift
eventsOnFirstSub.shouldContain(.membershipChange(.init(node: second.cluster.node, fromStatus: nil, toStatus: .joining)))
eventsOnFirstSub.shouldContain(.membershipChange(.init(node: first.cluster.node, fromStatus: .joining, toStatus: .up)))
eventsOnFirstSub.shouldContain(.membershipChange(.init(node: second.cluster.node, fromStatus: .joining, toStatus: .up)))
eventsOnFirstSub.shouldContain(.leadershipChange(Cluster.LeadershipChange(oldLeader: nil, newLeader: .init(node: first.cluster.node, status: .joining))!)) // !-safe, since new/old leader known to be different
```
It is not 100% certain that these are the events we'll get; it depends on when we subscribe. The `.joining` event could already be part of the initial snapshot, in which case the unknown -> joining change will not be emitted as a separate event, and so on.
The test already checks automatically that no event is signalled twice, though, so incorrect (duplicated) events would still be caught.
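To illustrate why the exact event list depends on subscription timing, here is a minimal sketch. It assumes hypothetical, simplified types (`MemberStatus`, `MembershipChange`, `EventRecorder`, `subscribe(to:alreadyApplied:)`) rather than the project's real cluster event API: changes that happened before subscribing are folded into a snapshot, and only later changes arrive as events, while a recorder rejects any event delivered twice.

```swift
// Hypothetical, simplified stand-ins for the real cluster event types.
enum MemberStatus: Hashable {
    case joining, up, down
}

struct MembershipChange: Hashable {
    let node: String
    let fromStatus: MemberStatus?  // nil means "previously unknown"
    let toStatus: MemberStatus
}

/// Records what one subscriber receives and refuses to record an event twice,
/// mirroring the test's "no event is signalled twice" check.
struct EventRecorder {
    private(set) var events: [MembershipChange] = []
    private var seen: Set<MembershipChange> = []

    /// Returns `false` (and records nothing) if `event` was already delivered.
    mutating func record(_ event: MembershipChange) -> Bool {
        guard seen.insert(event).inserted else { return false }
        events.append(event)
        return true
    }
}

/// Simulates a subscriber that joins after `alreadyApplied` changes happened:
/// those changes are folded into the initial snapshot, and only the remaining
/// changes are delivered as individual events.
func subscribe(to changes: [MembershipChange], alreadyApplied: Int)
    -> (snapshot: [String: MemberStatus], recorder: EventRecorder) {
    var snapshot: [String: MemberStatus] = [:]
    for change in changes.prefix(alreadyApplied) {
        snapshot[change.node] = change.toStatus
    }
    var recorder = EventRecorder()
    for change in changes.dropFirst(alreadyApplied) {
        let isNew = recorder.record(change)
        precondition(isNew, "event signalled twice: \(change)")
    }
    return (snapshot, recorder)
}
```

A subscriber that arrives after the `nil -> .joining` change sees `.joining` in its snapshot and then only the `.joining -> .up` event, which is why asserting on an exact event list is timing-dependent while a "no duplicates" check stays reliable.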
```swift
state.log.debug("Association already allocated for remote: \(reflecting: remoteNode), existing association: [\(existingAssociation)]")
switch existingAssociation.state {
case .associating:
    ()
```
This was wrong (after the recent rework).
Retrying needs to be handled more carefully: today we hammer on retries, but we should back off and eventually give up. That will be done separately (there are tickets for it).
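The retry behavior described here (back off between attempts, give up eventually) can be sketched as follows. This is a hypothetical illustration, not the project's implementation: `BackoffPolicy`, `associate(policy:attemptHandshake:sleep:)` and all parameter values are assumptions, and `sleep` is injected so the sketch runs without real waiting.

```swift
/// Hypothetical retry policy: exponential backoff, capped, with a limit after which we give up.
struct BackoffPolicy {
    let initialDelay: Double  // seconds before the first retry
    let multiplier: Double    // growth factor between consecutive retries
    let maxDelay: Double      // upper bound for any single delay, in seconds
    let maxRetries: Int       // give up after this many retries

    /// Delay before retry number `retry` (1-based), or `nil` once we should give up.
    func delay(beforeRetry retry: Int) -> Double? {
        guard retry >= 1, retry <= maxRetries else { return nil }
        var delay = min(initialDelay, maxDelay)
        for _ in 1..<retry {  // empty range for the first retry
            delay = min(delay * multiplier, maxDelay)
        }
        return delay
    }
}

/// Attempts a handshake, backing off between failures, until it succeeds or the
/// policy gives up. `sleep` is injected so the sketch can run without real waiting.
func associate(policy: BackoffPolicy,
               attemptHandshake: () -> Bool,
               sleep: (Double) -> Void) -> Bool {
    var retry = 0
    while !attemptHandshake() {
        retry += 1
        guard let delay = policy.delay(beforeRetry: retry) else {
            return false  // give up instead of hammering the remote node forever
        }
        sleep(delay)
    }
    return true
}
```

With `initialDelay: 0.5, multiplier: 2, maxDelay: 4, maxRetries: 5`, a remote node that never answers gets six handshake attempts separated by delays of 0.5, 1, 2, 4 and 4 seconds before the association attempt is abandoned.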
…up_ensureAllSubscribersGetMovingUpEvents
test_sendingMessageToNotYetAssociatedNode_mustCauseAssociationAttempt OK
that some actors kick off another outgoing handshake still
…pdatesAutomatically: the listing may arrive a moment later; we may see an empty listing at first
Co-authored-by: Yim Lee <[email protected]>
The commits resolve the individual issues one by one; I'm going to merge this and keep hardening the remaining things.
The double events issue: Resolves #606
This was a bug in the handshakes: they would wrongly continue handshaking even when a handshake had already completed. The mistake was a single missing `return .same`.
Motivation:
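The handshake bug comes down to one missing early return, which let a request for an already-associated node fall through into starting another handshake. A minimal sketch of that control flow, assuming hypothetical types (`AssociationState`, `HandshakeDecision`, `AssociationRegistry`) and returning plain values where the real code returns `.same` from an actor behavior:

```swift
/// Hypothetical, simplified association bookkeeping for a single local node.
enum AssociationState {
    case associating  // a handshake with the remote node is in flight
    case associated   // a handshake has completed
}

enum HandshakeDecision: Equatable {
    case startHandshake          // no association yet: initiate one
    case awaitInFlightHandshake  // one is already running: do not start another
    case alreadyAssociated       // completed earlier: nothing to do
}

struct AssociationRegistry {
    private var associations: [String: AssociationState] = [:]

    mutating func handshakeRequested(with node: String) -> HandshakeDecision {
        if let existing = associations[node] {
            switch existing {
            case .associating:
                return .awaitInFlightHandshake
            case .associated:
                // The early return is the crucial part: with an empty case body
                // instead, control would fall through and start a second handshake
                // for a node we are already associated with ("double handshake").
                return .alreadyAssociated
            }
        }
        associations[node] = .associating
        return .startHandshake
    }

    mutating func handshakeCompleted(with node: String) {
        associations[node] = .associated
    }
}
```

Each repeated handshake in the buggy version could emit its own membership transitions, which is one plausible way the duplicated cluster events surfaced.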
Modifications:
Result:
How associations and retries are stored:
The gossip changes:
Have not been failing for a long time:
Test fixup: