Skip to content

Conversation

@ktoso
Copy link
Member

@ktoso ktoso commented Jun 4, 2020

This was a bug in the handshakes; they would wrongly continue handshaking even if we already had a handshake completed -- mistake was one missing return .same

Motivation:

  • event handshakes caused twice

Modifications:

Result:

How associations and retries are stored:

The gossip changes:

Have not been failing a long time:

Test fixup:

@ktoso
Copy link
Member Author

ktoso commented Jun 4, 2020

There will still be failures; I'm taking them on one by one.

eventsOnFirstSub.shouldContain(.membershipChange(.init(node: second.cluster.node, fromStatus: nil, toStatus: .joining)))
eventsOnFirstSub.shouldContain(.membershipChange(.init(node: first.cluster.node, fromStatus: .joining, toStatus: .up)))
eventsOnFirstSub.shouldContain(.membershipChange(.init(node: second.cluster.node, fromStatus: .joining, toStatus: .up)))
eventsOnFirstSub.shouldContain(.leadershipChange(Cluster.LeadershipChange(oldLeader: nil, newLeader: .init(node: first.cluster.node, status: .joining))!)) // !-safe, since new/old leader known to be different
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not 100% specific/certain that's the events we'll get; it depends on timing when we subscribe; there could already be the .joining event in there which means the unknown -> joining will not be emitted as event since it was in the snapshot etc.

The test already automatically checks that no event is signalled twice though, so we won't get incorrect things.

state.log.debug("Association already allocated for remote: \(reflecting: remoteNode), existing association: [\(existingAssociation)]")
switch existingAssociation.state {
case .associating:
()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was wrong (after the recent rework);

Proper retrying must be handled more properly (today we hammer on retries, but should do backoffs and giving up too) -- will be done separately (has tickets)

ktoso added 17 commits June 15, 2020 11:46
test_sendingMessageToNotYetAssociatedNode_mustCauseAssociationAttempt OK
that some actors kick off another outgoing handshake still
…pdatesAutomatically

The listing may happen a moment after; we can see a listing that is
empty at first after all
@ktoso
Copy link
Member Author

ktoso commented Jun 16, 2020

Commits have the individual issues resolved one by one, going to merge and keep hardening remaining things.

@ktoso ktoso merged commit 5de9887 into apple:master Jun 16, 2020
@ktoso ktoso deleted the fix-test_up_ensureAllSubscribersGetMovingUpEvents branch June 16, 2020 00:44
@ktoso
Copy link
Member Author

ktoso commented Jul 3, 2020

The double events was:

Resolves #606

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment