+swim #401 complete unreachable state transitions and reachability events logic #403
| ## | ||
| ## This source file is part of the Swift Distributed Actors open source project | ||
| ## | ||
| ## Copyright (c) 2018-2019 Apple Inc. and the Swift Distributed Actors project authors |
TODO: 2020
|
Whoops, missed a warning; it does mean the build finally checks for them :) 📗 fixed...
yim-lee
left a comment
Sorry, I can't review the intricate parts of the logic too well, but there seem to be good test cases to cover the bases. 👍
IntegrationTests/tests_04_cluster/it_Clustered_swim_suspension_reachability/main.swift
Indeed, it's quite tricky, and one has to know the paper and our cluster internals well. Thanks for having a look @yim-lee!
|
Hmm, the integration test is not happy on Linux, it seems
|
So doing |
|
Failure on the known |
|
@swift-server-bot test this please |
|
Debugging the integration test getting stuck on CI with no logs printed from some point... |
|
The problem was using SIGTSTP rather than SIGSTOP to suspend the process, which made the integration test fail on CI. There's another task to complete the more "unit/integration" style test
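For context on the signal difference: SIGSTOP cannot be caught, blocked, or ignored, while SIGTSTP is the terminal stop signal, which a process may handle or ignore and which POSIX does not apply to processes in an orphaned process group. The thread does not say which of these bit on CI. A minimal Swift sketch of a reliable suspend and resume is below; the `suspend` and `resume` helpers and the `/bin/sleep` stand-in process are assumptions made for illustration and are not taken from the project's integration test scripts.

```swift
import Foundation

// Hypothetical helpers for illustration; not taken from the project's test scripts.

/// Suspends a process with SIGSTOP, which cannot be caught, blocked, or ignored.
func suspend(_ pid: pid_t) -> Bool {
    return kill(pid, SIGSTOP) == 0
}

/// Resumes a previously stopped process with SIGCONT.
func resume(_ pid: pid_t) -> Bool {
    return kill(pid, SIGCONT) == 0
}

// Stand-in for the node under test; assumes /bin/sleep exists on the host.
let child = Process()
child.executableURL = URL(fileURLWithPath: "/bin/sleep")
child.arguments = ["30"]
try child.run()

let didSuspend = suspend(child.processIdentifier)
// While the process is stopped, its peers would be expected to mark it unreachable.
let didResume = resume(child.processIdentifier)

// Clean up the stand-in process.
child.terminate()
child.waitUntilExit()
```

Replacing `SIGSTOP` with `SIGTSTP` in `suspend` would leave the outcome dependent on the child's signal handling and on how the process was launched, which is the kind of environment-specific behaviour a CI run can expose.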
| } | ||
| } | ||
| } | ||
|
|
Follow up ticket #405
Implement missing SWIM functionality to move across the unreachable -> alive edge and signal such transitions to the ClusterShell.
Motivation:
Without this the unreachable state was really terminal, and a node would never become reachable again.
Realistically, we'll indeed want to signal unreachable and immediately signal down in the downing strategy; however, that is not how the state is designed -- it shall allow moving back to alive, as only the dead state is "terminal".
More to be done here soon, but this unlocks the "big hardening PR", which was failing because of some SWIM problems that the previous PR (#400, which was important) and this PR address.
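The state design described above can be sketched roughly as follows. This is a simplified, hypothetical model: `MemberStatus`, `MarkResult`, `Membership`, and `reachabilityChanged` are illustrative names and not the project's actual types, and SWIM incarnation numbers are left out entirely. It only shows that unreachable may move back to alive, that dead is terminal, and that marking reports whether a change was effective so that an event is emitted only when it was.

```swift
// Hypothetical, simplified model; names are illustrative and incarnation numbers are omitted.
enum MemberStatus: Equatable {
    case alive
    case suspect
    case unreachable
    case dead

    /// Only `dead` is terminal; `unreachable` may still move back to `alive`.
    var isTerminal: Bool { return self == .dead }

    /// Whether a member in this status counts as reachable.
    var isReachable: Bool { return self == .alive || self == .suspect }
}

enum MarkResult: Equatable {
    /// Nothing changed, so there is nothing to signal.
    case ignored
    /// The status really changed from `previous` to `current`.
    case applied(previous: MemberStatus, current: MemberStatus)
}

struct Membership {
    private(set) var statuses: [String: MemberStatus]

    /// All known nodes start out alive in this sketch.
    init(nodes: [String]) {
        statuses = Dictionary(uniqueKeysWithValues: nodes.map { ($0, MemberStatus.alive) })
    }

    /// Marks `node` with `status` and reports whether the change was effective.
    mutating func mark(_ node: String, as status: MemberStatus) -> MarkResult {
        guard let previous = statuses[node], !previous.isTerminal, previous != status else {
            return .ignored
        }
        statuses[node] = status
        return .applied(previous: previous, current: status)
    }
}

/// True only when an effective change crosses the reachable/unreachable boundary.
func reachabilityChanged(_ result: MarkResult) -> Bool {
    guard case .applied(let previous, let current) = result else { return false }
    return previous.isReachable != current.isReachable
}

var membership = Membership(nodes: ["node-b"])
let wentUnreachable = membership.mark("node-b", as: .unreachable)
let repeatedMark = membership.mark("node-b", as: .unreachable)
let cameBack = membership.mark("node-b", as: .alive)
```

In this sketch `repeatedMark` is `.ignored`, so a caller would emit nothing for it, while `wentUnreachable` and `cameBack` both cross the reachability boundary and would each warrant an event.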
Modifications:
marking as "change" thanks to which we can emit an event to the cluster shell only if the change really was "effective".
I am struggling with making this test reproducible and clean in the SWIMShellClusteredTests 🆘
❗️ still a bit sucky, but managed to have it at least not time out due to Swift deciding to recompile all projects when swift run is hit in the integration test... 0c1df2d
// FIXME: Can't seem to implement a hardened test like this... func ignored_test_swim_shouldNotifyClusterAboutUnreachableNode_andThenReachableAgain() throws {
and need to revisit somehow OR decide that we do not test that specific dance without integration test? (would be a letdown)
Result: