Closed
Found during "real cluster" runs and some tests, after the Membership became more stable thanks to the seen tables in #376.
There are a number of cases here that I think were not covered:
- if we try .connect and the connection fails, we never reply back to SWIM, so it has no chance to mark the node .unreachable
  - it never stores such a member in its members to ping either, so even if the node is completely killed, SWIM never tries to keep pinging it and never issues unreachable
  - since it might not issue unreachable, downing never has a chance to trigger and nodes never get removed
- if we don't notice that a node is unreachable, but other nodes tell us in gossip that it is
  - our gossip instance applies the change to its membership but, it seems, does not notify the cluster about the -> unreachable transition 🤔
  - this again leads to not issuing down
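
To make the two cases above concrete, here is a minimal Python sketch of the intended behavior. It is a toy model, not the project's code: `Membership`, `FailureDetector`, `handle_join`, `merge_gossip`, and the "down as soon as unreachable" strategy are all hypothetical names and simplifications made up for illustration. The point is only that a failed connect should still leave the node tracked and reported as unreachable, and that a status learned from gossip should go through the same publishing path as one we detected ourselves.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    ALIVE = 0
    UNREACHABLE = 1
    DOWN = 2


@dataclass
class Membership:
    """Toy membership view; every name here is hypothetical."""
    members: dict = field(default_factory=dict)  # node -> Status
    events: list = field(default_factory=list)   # published (node, Status) changes

    def mark(self, node, status):
        """Apply a change and publish it, but only if it moves the status forward."""
        current = self.members.get(node)
        if current is not None and status <= current:
            return False
        self.members[node] = status
        self.events.append((node, status))
        if status is Status.UNREACHABLE:
            self.on_unreachable(node)
        return True

    def on_unreachable(self, node):
        """Stand-in downing strategy (assumption): down unreachable nodes at once."""
        self.mark(node, Status.DOWN)


class FailureDetector:
    """Toy stand-in for the SWIM side; `connect` is an assumed callable(node) -> bool."""

    def __init__(self, membership, connect):
        self.membership = membership
        self.connect = connect
        self.ping_targets = set()

    def handle_join(self, node):
        # Track the node before connecting, so a failed connect still leaves
        # it in the set of members we keep pinging.
        self.ping_targets.add(node)
        ok = self.connect(node)
        # Report the outcome either way instead of dropping a failure silently.
        self.membership.mark(node, Status.ALIVE if ok else Status.UNREACHABLE)

    def merge_gossip(self, remote_view):
        # Route every gossiped status through mark(), so changes we only
        # learned from other nodes are published too.
        for node, status in remote_view.items():
            self.ping_targets.add(node)
            self.membership.mark(node, status)
```

With a `connect` that always fails, `handle_join("node-b")` leaves `node-b` in `ping_targets` and publishes both the unreachable and the down event. Likewise, gossiping `{"node-c": Status.UNREACHABLE}` about a node we believe is alive publishes the transition instead of only updating the local view.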
Uncovering this depends on a bunch of stuff from the hardening work, so I will fix this in separate, specific commits, but likely as part of the larger #376 PR.