SWIM: when pinged node is immediately dead, we sometimes don't issue .unreachable

Found while running a "real cluster" and in some tests, after the membership became more stable thanks to the seen tables in https://github.com/apple/swift-distributed-actors/pull/376

I think there are a number of cases here which were not covered:

- if we try .connect and the connection fails, we never reply back to SWIM, so it has no chance to mark the node .unreachable
  - it never stored such a member in its members to ping either, so even if the node is completely killed, SWIM would never keep pinging it and would never issue .unreachable
  - since it might not issue .unreachable, downing never has a chance to trigger and nodes never get removed
- if we don't notice a node is unreachable ourselves, but other nodes tell us in gossip that it is
  - our gossip instance applies the change to its membership, but it seems it does not notify the cluster about the transition to .unreachable 🤔
  - this again leads to not issuing down
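
To make the two cases above concrete, here is a minimal sketch of the behavior we would want instead. Everything in it is hypothetical and only for illustration: `FailureDetectorSketch`, `Status`, `ClusterEvent` and all method names are made up and are not the swift-distributed-actors API. It also assumes that a failed connect is recorded as a suspect member, which is just one possible way to keep the node in the ping rotation.

```swift
// Hypothetical model for illustration only; none of these types or
// methods are the actual swift-distributed-actors API.
enum Status: Equatable {
    case alive
    case suspect
    case unreachable
}

enum ClusterEvent: Equatable {
    case reachabilityChanged(node: String, status: Status)
}

struct FailureDetectorSketch {
    private(set) var members: [String: Status] = [:]
    private(set) var emitted: [ClusterEvent] = []

    /// Case 1: the connect result is always reported back, and a node
    /// whose connect failed is still stored (assumption: as .suspect),
    /// so it keeps being pinged.
    mutating func onConnectResult(node: String, succeeded: Bool) {
        mark(node, as: succeeded ? Status.alive : Status.suspect)
    }

    /// A tracked node that does not answer pings becomes unreachable.
    /// An untracked node is ignored, which is exactly why a node that
    /// was never stored could never become unreachable.
    mutating func onPingTimeout(node: String) {
        guard members[node] != nil else { return }
        mark(node, as: Status.unreachable)
    }

    /// Case 2: a status learned via gossip goes through the same path
    /// as a local observation, so the cluster is notified either way.
    mutating func onGossip(node: String, status: Status) {
        mark(node, as: status)
    }

    /// Single place where membership changes, and the only place that
    /// emits the "became unreachable" notification.
    private mutating func mark(_ node: String, as status: Status) {
        let wasUnreachable = (members[node] == Status.unreachable)
        members[node] = status
        if status == Status.unreachable && !wasUnreachable {
            emitted.append(.reachabilityChanged(node: node, status: status))
        }
    }
}

var detector = FailureDetectorSketch()
detector.onConnectResult(node: "node-b", succeeded: false) // dead on first contact
detector.onPingTimeout(node: "node-b")                      // still tracked, so this is noticed
detector.onGossip(node: "node-c", status: .unreachable)     // learned from a peer
print(detector.emitted)
```

The point of routing everything through one `mark` function is that local observations and gossip cannot diverge in whether they notify the cluster, so downing gets a chance to trigger in both cases.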

Uncovering this depends on a bunch of stuff from the hardening work, so I will fix it in separate, specific commits, but likely as part of the larger #376 PR.

SWIM: when pinged node is immediately dead, we sometimes don't issue .unreachable #397