Polished yet basic ProcessIsolated #40

ktoso · 2019-08-23T13:40:56Z

Motivation:

ProcessIsolated is likely going to be our main answer for isolation until we get better answers; as such, it has to be really polished.

In this PR I aim to explain how to use it and to provide tests for failures escalating to the guardians. A failure that reaches a guardian kills the process, which causes the servant supervision to kick in, and thus shows off real-life supervision scenarios with process isolation.

This can be adopted immediately in real systems to provide isolation, even with Swift not having any cool isolation features right now.

Modifications:

expose and polish supervision mode .escalate
document the isolation schemes
allow top level guardians to exit() or shutdown() when a fault reaches them; resolves "Crash process if failure escalates to a Guardian (esp. /system)" #31
TODO remove "leaking but surviving mode" ??? -act,fault #50 remove fault handling code and test #51
TODO cleanup

Result:

Production ready ProcessIsolated.

ktoso · 2019-08-27T12:33:54Z

Very weird dependency failure... resolves and runs locally 🤔

ktoso · 2019-08-27T12:34:41Z

Ah ofc, missed a rebase

ktoso · 2019-08-27T12:39:30Z

Heh ok, failure was that SwiftPrometheus changed API... gotta love non-reproducible builds ¯_(ツ)_/¯ Committing Package.resolved is a good thing IMHO :P

ktoso · 2019-08-27T12:52:14Z

Some nasty issues with supervision related to the escalate change; I'll get to the bottom of them, but likely tomorrow.

ktoso · 2019-08-28T13:08:31Z

This includes a cherry-picked #51, which should be merged first.

ktoso · 2019-08-28T13:43:56Z

Failure was #13 -- time to focus on it and fix it then 👍 I'll do so tomorrow, unless someone wants to pick it up :)

ktoso · 2019-08-29T15:27:53Z

Would welcome a review; it is best viewed in its entirety after reading the docs commit, which is f68f85b.

This also introduces .escalate, which was needed to tie together supervision trees with this entire idea of process isolation.

ktoso · 2019-08-29T15:28:44Z

Sources/DistributedActors/ActorShell.swift

-        self.notifyParentWeDied()
-        self.notifyWatchersWeDied()
+        self.notifyParentOfTermination()
+        self.notifyWatchersOfTermination()


The "we" was reading a bit weird

ktoso · 2019-08-29T15:31:21Z

All tests are integration tests; they can be run with:

$ ./IntegrationTests/run-tests.sh (-f for filtering) (-v for verbose)

...
Running test suite 'tests_02_process_isolated'
Skipping test 'test_01_kill_master_must_not_leave_orphans.sh'
Running test 'test_02_kill_servant_master_restarts_it.sh'... OK (11s)
Running test 'test_03_servant_spawning_not_leak_fds.sh'... OK (11s)
Running test 'test_04_failing_servants_to_cause_servant_respawn.sh'... OK (10s)
Running test 'test_05_failing_servant_to_cause_backoff_respawn.sh'... OK (11s)
OK (ran 4 tests successfully)
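
The respawn behaviour exercised by test_04 and test_05 can be sketched roughly as follows. This is only a minimal illustration, assuming a capped exponential backoff and a fixed restart limit; `RestartPolicy` and all of its fields are hypothetical names, not the ProcessIsolated API, and the real backoff strategy may well differ.

```swift
// Hypothetical sketch: NOT the ProcessIsolated implementation or API.
// Models "restart a failed servant at most N times, backing off between attempts".
struct RestartPolicy {
    let maxRestarts: Int          // assumed: configured restart limit
    let initialBackoffMillis: Int // assumed: delay before the first restart
    let multiplier: Int           // assumed: growth factor per attempt
    let maxBackoffMillis: Int     // assumed: upper bound on the delay

    /// Returns the delay to wait before restart attempt `attempt` (1-based),
    /// or nil when the attempt is out of range and the servant should stay down.
    func delayMillis(forAttempt attempt: Int) -> Int? {
        guard attempt >= 1, attempt <= maxRestarts else { return nil }
        var delay = min(initialBackoffMillis, maxBackoffMillis)
        // Grow the delay once per previous attempt, never exceeding the cap.
        for _ in 1..<attempt {
            delay = min(delay * multiplier, maxBackoffMillis)
        }
        return delay
    }
}

let policy = RestartPolicy(maxRestarts: 3,
                           initialBackoffMillis: 100,
                           multiplier: 2,
                           maxBackoffMillis: 300)

for attempt in 1...4 {
    if let delay = policy.delayMillis(forAttempt: attempt) {
        print("attempt \(attempt): respawn servant after \(delay) ms")
    } else {
        print("attempt \(attempt): restart limit reached, giving up")
    }
}
```

Keeping the decision a pure function of the attempt number makes it easy to check in isolation, separately from the actual process spawning that the shell tests above cover.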

ktoso · 2019-08-29T15:32:23Z

IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_escalatingWorkers/main.swift

+// finally, once prepared, you have to invoke the following:
+// which will BLOCK on the master process and use the main thread to
+// process any incoming process commands (e.g. spawn another servant)
+isolated.blockAndSuperviseServants()


hoping to get rid of this part eventually, but for now good enough 🤔

scripts/docs/generate_api.sh

yim-lee

Just feedback on the documentation changes

Docs/failure_handling.adoc

drexin

Just a few things. Overall looks very good.

Docs/failure_handling.adoc

Sources/DistributedActors/ActorShell.swift

drexin · 2019-08-29T17:54:16Z

Sources/DistributedActors/ActorSystem.swift

            if let cluster = self._cluster {
-                self._clusterEventStream = try! EventStream(self, name: "clusterEvents")
+                let clusterEvents = try! EventStream<ClusterEvent>(self, name: "clusterEvents")
+                self._clusterEvents = clusterEvents // TODO: why stored on self here?


Because it is required to create the ClusterControl which is created in a computed property in ActorSystem+Cluster

Hm, I guess this can be done either way, we create control here or the stream 🤔

Let's leave it be for now, though it feels like we should create control here 🤔

Sources/DistributedActors/Signals.swift

Sources/DistributedActors/Supervision.swift

drexin · 2019-08-29T18:34:17Z

Tests/DistributedActorsDocumentationTests/ProcessIsolatedDocExamples.swift

+// end::imports[]
+
+private struct BatSignal {
+    func becomeBatman() -> Batman {


Not sure this is a good idea, because copyright etc. Maybe use a different example.

Fixed :) abb9a44

Tests/DistributedActorsTests/SupervisionTests.swift

== Motivation

.escalate makes a failure escalate to the parent; it is escalated further automatically if the parent also uses .escalate. If one wants to isolate the failure, i.e. "stop it from bubbling up", keeping the supervision as .stop does the trick: the actor whose supervision is .stop stops, rather than escalating the issue.

Watching is inherently "same-ish" to this: watching is a superset of supervision escalate. A parent is always notified about child termination, but only watching a child makes the death pact active. Alternatively, one may want to supervise the spawned child with .escalate, meaning that we should get any failures of the child... but we will NOT get its .stop signals (!), which watching WOULD give us.
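
The bubbling described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the library's implementation: `Supervision`, `Node`, `Outcome` and `propagateFailure` are hypothetical names invented for this example, and the model assumes each actor carries the strategy it was spawned with, and that a failure reaching a parentless node (a guardian) is where the process would exit.

```swift
// Toy model only: these types are hypothetical and are NOT the DistributedActors API.
enum Supervision {
    case stop      // the failing actor stops; the failure goes no further
    case escalate  // the failure is handed to the parent, which then fails too
}

enum Outcome: Equatable {
    case containedAt(String)     // an actor with .stop absorbed the failure
    case reachedGuardian(String) // nothing contained it; the process would exit
}

final class Node {
    let name: String
    let supervision: Supervision
    let parent: Node? // nil marks a top-level guardian in this model

    init(name: String, supervision: Supervision, parent: Node? = nil) {
        self.name = name
        self.supervision = supervision
        self.parent = parent
    }
}

/// Walks up the tree from the failing node until some actor uses .stop,
/// or until a guardian (a node without a parent) is reached.
func propagateFailure(from failed: Node) -> Outcome {
    var current = failed
    while let parent = current.parent {
        if current.supervision == .stop {
            return .containedAt(current.name)
        }
        current = parent // .escalate: the parent now fails with the same failure
    }
    return .reachedGuardian(current.name)
}

let guardian = Node(name: "user", supervision: .escalate)
let isolating = Node(name: "isolatingParent", supervision: .stop, parent: guardian)
let escalating = Node(name: "escalatingParent", supervision: .escalate, parent: guardian)

let workerA = Node(name: "workerA", supervision: .escalate, parent: isolating)
let workerB = Node(name: "workerB", supervision: .escalate, parent: escalating)

print(propagateFailure(from: workerA)) // stops bubbling at isolatingParent
print(propagateFailure(from: workerB)) // bubbles all the way to the guardian
```

The model deliberately leaves out watching and the death pact; it only shows why a single .stop in the chain is enough to keep a failure from reaching the guardian.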

ktoso · 2019-08-30T03:39:55Z

Comments addressed in 5abe303 abb9a44

Any further review on the impl or should we merge?

drexin · 2019-08-30T04:47:52Z

I think we can merge.

* implement top level watch and ensure we can escalate
* =sup #31 Implements .escalate supervision scheme

  == Motivation

  .escalate makes failures be escalated to parent upon failure, these are escalated more if the parent also does .escalate automatically. If one wants to isolate the failure, i.e. "stop it from bubbling up", keeping the supervision as .stop does the trick -- the actor which is .stop supervising stops, rather than escalates the issue. Watching is inherently "same-ish" to this -- watching is a superset of supervision escalate. A parent always is notified about child termination. Only watching a child though makes the death pact active. Alternatively, one may want to supervise the spawned child with .escalate, meaning that any failures the child does we should get... but we will NOT get its .stop signals (!) which watching WOULD give us.
* Allow process.exit() when failure reaches guardian
* only escalation should cause escalated to be populated in signal
* -act,fault #50 remove fault handling code and test

  Remove all Fault handling code
* -act,fault #50 remove fault handling code and test
* Hardening process isolated, node watcher uses event to detect down
* Implemented that master process restarts only as many times as configured
* Ensuring that backoff restarts work in process supervision

  Minor bash fixups rebase fixups
* Documentation for ProcessIsolated
* can't yet validate docs on linux
* compilation fix, no getpid on examples
* one off error in test
* Address comments and reword deathwatch docs
* remove batman pun

yim-lee mentioned this pull request Aug 27, 2019

Don't insert space around either side of range operator #48

Closed

ktoso changed the title ~~[WORK IN PROGRESS] Towards prod-ready ProcessIsolated use-cases~~ Towards prod-ready basic ProcessIsolated Aug 29, 2019

ktoso changed the title ~~Towards prod-ready basic ProcessIsolated~~ Polished yet basic basic ProcessIsolated Aug 29, 2019

ktoso requested a review from drexin August 29, 2019 15:27

ktoso mentioned this pull request Aug 29, 2019

Reference documentation: Isolation should be a main topic #36

Closed

yim-lee reviewed Aug 29, 2019

drexin requested changes Aug 29, 2019

ktoso changed the title ~~Polished yet basic basic ProcessIsolated~~ Polished yet basic ProcessIsolated Aug 30, 2019

ktoso added 11 commits August 30, 2019 12:22

-act,fault #50 remove fault handling code and test

63605fd

Remove all Fault handling code

implement top level watch and ensure we can escalate

1ccc383

Allow process.exit() when failure reaches guardian

b1f0e7e

only escalation should cause escalated to be populated in signal

4411c31

-act,fault #50 remove fault handling code and test

bba25d2

Hardening process isolated, node watcher uses event to detect down

5e60a41

Implemented that master process restarts only as many times as config…

d82a583

…ured

Ensuring that backoff restarts work in process supervision

179db91

Minor bash fixups rebase fixups

Documentation for ProcessIsolated

cd2fc67

can't yet validate docs on linux

07fa71b

ktoso added 4 commits August 30, 2019 12:23

compilation fix, no getpid on examples

446ccb7

one off error in test

28c4694

Address comments and reword deathwatch docs

5abe303

remove batman pun

abb9a44

drexin approved these changes Aug 30, 2019

drexin merged commit f697132 into apple:master Aug 30, 2019

ktoso pushed a commit that referenced this pull request Aug 31, 2019

Make adapted actor refs watchable #480

eadc9c7

* Make adapted actor refs watchable #480 * PR also resolves #40
