ktoso (Member) commented Aug 23, 2019

Motivation:

ProcessIsolated is likely to be our main answer for isolation until better options exist, so it has to be really polished.

This PR explains how to use it and adds tests for failures escalating to the guardians. An escalated failure kills the process, which causes servant supervision to kick in, showing off real-life supervision scenarios with process isolation.

This can be adopted immediately in real systems to provide isolation, even though Swift has no cool isolation features of its own right now.

Modifications:

Result:

Production-ready ProcessIsolated.

ktoso (Member Author) commented Aug 27, 2019

Very weird dependency failure... resolves and runs locally 🤔

ktoso (Member Author) commented Aug 27, 2019

Ah ofc, missed a rebase

ktoso (Member Author) commented Aug 27, 2019

Heh ok, the failure was that SwiftPrometheus changed its API... gotta love non-reproducible builds ¯\_(ツ)_/¯ Committing Package.resolved is a good thing IMHO :P

ktoso (Member Author) commented Aug 27, 2019

Some nasty issues with supervision related to the escalate change; I'll get to the bottom of them, but likely tomorrow.

ktoso (Member Author) commented Aug 28, 2019

This includes a cherry picked #51 which should be merged first.

ktoso (Member Author) commented Aug 28, 2019

Failure was #13 -- time to focus on it and fix it then 👍 I'll do so tomorrow, unless someone wants to pick it up :)

ktoso changed the title from "[WORK IN PROGRESS] Towards prod-ready ProcessIsolated use-cases" to "Towards prod-ready basic ProcessIsolated" on Aug 29, 2019
ktoso changed the title from "Towards prod-ready basic ProcessIsolated" to "Polished yet basic basic ProcessIsolated" on Aug 29, 2019
ktoso (Member Author) commented Aug 29, 2019

Would welcome a review; it is best viewed in its entirety after reading the docs commit, f68f85b.

This also introduces .escalate, which was needed to tie supervision trees together with this entire idea of process isolation.

ktoso requested a review from drexin on August 29, 2019 15:27
- self.notifyParentWeDied()
- self.notifyWatchersWeDied()
+ self.notifyParentOfTermination()
+ self.notifyWatchersOfTermination()

Member Author

The "we" was reading a bit weird

ktoso (Member Author) commented Aug 29, 2019

All tests are integration tests; run them with:

$ ./IntegrationTests/run-tests.sh (-f for filtering) (-v for verbose)
...
Running test suite 'tests_02_process_isolated'
Skipping test 'test_01_kill_master_must_not_leave_orphans.sh'
Running test 'test_02_kill_servant_master_restarts_it.sh'... OK (11s)
Running test 'test_03_servant_spawning_not_leak_fds.sh'... OK (11s)
Running test 'test_04_failing_servants_to_cause_servant_respawn.sh'... OK (10s)
Running test 'test_05_failing_servant_to_cause_backoff_respawn.sh'... OK (11s)
OK (ran 4 tests successfully)
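
The backoff respawn exercised by test_05 can be pictured with a small sketch. This is only an illustration under assumptions: `ExponentialBackoff`, its fields, and the doubling rule are hypothetical names and choices made for this example, not the types or defaults used in this PR.

```swift
/// Hypothetical backoff policy (NOT the library's type): the delay doubles
/// on every restart, is capped, and the restart budget is limited.
struct ExponentialBackoff {
    let initialMillis: Int
    let capMillis: Int
    let maxAttempts: Int

    /// Delay before restart number `attempt` (1-based),
    /// or nil once the restart budget is spent.
    func delayMillis(forAttempt attempt: Int) -> Int? {
        guard attempt >= 1, attempt <= maxAttempts else { return nil }
        var delay = min(initialMillis, capMillis)
        // Double once per previous attempt, capping each step to avoid overflow.
        for _ in 1..<attempt {
            delay = min(delay * 2, capMillis)
        }
        return delay
    }
}

let backoff = ExponentialBackoff(initialMillis: 100, capMillis: 1000, maxAttempts: 5)
for attempt in 1...6 {
    if let delay = backoff.delayMillis(forAttempt: attempt) {
        print("restart \(attempt): wait \(delay)ms before respawning the servant")
    } else {
        print("restart \(attempt): budget exhausted, giving up")
    }
}
```

Keeping the policy a pure function of the attempt number makes it easy to check without spawning any processes.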

// finally, once prepared, you have to invoke the following:
// which will BLOCK on the master process and use the main thread to
// process any incoming process commands (e.g. spawn another servant)
isolated.blockAndSuperviseServants()

Member Author

hoping to get rid of this part eventually, but for now good enough 🤔
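
For readers unfamiliar with the pattern, a blocking supervision loop of this kind can be sketched roughly as follows. It is a toy under assumptions, not how ProcessIsolated is implemented: `ServantExit`, `superviseBlocking`, and the injected `runServant` closure are hypothetical stand-ins, and a real master would wait on an actual child process instead of calling a closure.

```swift
/// Hypothetical outcome of one servant run (not a type from this PR).
enum ServantExit: Equatable {
    case clean
    case crashed(status: Int32)
}

/// Blocks the calling thread: runs the servant, and re-runs it after each
/// crash until it exits cleanly or the restart budget is spent.
func superviseBlocking(
    maxRestarts: Int,
    runServant: () -> ServantExit
) -> (restarts: Int, last: ServantExit) {
    var restarts = 0
    while true {
        let exit = runServant() // stands in for "wait until the child process exits"
        switch exit {
        case .clean:
            return (restarts, exit)
        case .crashed:
            if restarts >= maxRestarts {
                return (restarts, exit) // budget spent, stop respawning
            }
            restarts += 1
        }
    }
}

// Simulated servant: crashes on its first two runs, then exits cleanly.
var runs = 0
let result = superviseBlocking(maxRestarts: 3) {
    runs += 1
    return runs <= 2 ? .crashed(status: 1) : .clean
}
print("servant ran \(runs) times, restarts: \(result.restarts), last exit: \(result.last)")
```

Injecting the servant as a closure is purely a testing convenience for the sketch; it keeps the restart logic separate from process spawning.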

yim-lee (Member) left a comment

Just feedback on the documentation changes

drexin (Member) left a comment

Just a few things. Overall looks very good.

  if let cluster = self._cluster {
-     self._clusterEventStream = try! EventStream(self, name: "clusterEvents")
+     let clusterEvents = try! EventStream<ClusterEvent>(self, name: "clusterEvents")
+     self._clusterEvents = clusterEvents // TODO: why stored on self here?

Member

Because it is required to create the ClusterControl, which is created in a computed property in ActorSystem+Cluster.

Member Author

Hm, I guess this can be done either way: we create either the control here or the stream 🤔

Let's leave it be for now, though it feels like we should create the control here 🤔

Member Author

#74

// end::imports[]

private struct BatSignal {
func becomeBatman() -> Batman {

Member

Not sure this is a good idea, because copyright etc. Maybe use a different example.

Member Author

Huh, okay

Member Author

Fixed :) abb9a44

ktoso changed the title from "Polished yet basic basic ProcessIsolated" to "Polished yet basic ProcessIsolated" on Aug 30, 2019
ktoso added 11 commits August 30, 2019 12:22
Remove all Fault handling code
== Motivation

.escalate makes a failure escalate to the parent; if the parent is
also supervised with .escalate, the failure automatically escalates
further.

If one wants to isolate the failure, i.e. "stop it from bubbling up",
keeping the supervision as .stop does the trick -- the actor supervised
with .stop simply stops, rather than escalating the issue.

Watching is inherently "same-ish" -- it is a superset of .escalate
supervision. A parent is always notified about child termination, but
only watching the child makes the death pact active.

Alternatively, one may supervise the spawned child with .escalate,
meaning that we should get any failures of the child...
but we will NOT get its .stop signals (!), which watching WOULD give us.
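
To make the bubbling rules above concrete, here is a toy model. It is a simplified reading of the commit message and nothing more: `Supervision`, `ToyActor`, and `fail()` are hypothetical names invented for this sketch, and the model ignores watching, signals, and restarts entirely.

```swift
/// Hypothetical directive, mirroring only the two names used in the text.
enum Supervision { case stop, escalate }

/// Toy actor: `supervision` describes how THIS actor's failures are handled.
final class ToyActor {
    let name: String
    let supervision: Supervision
    let parent: ToyActor?
    private(set) var isStopped = false

    init(_ name: String, supervision: Supervision, parent: ToyActor? = nil) {
        self.name = name
        self.supervision = supervision
        self.parent = parent
    }

    /// Fails this actor. Returns true if the failure bubbled past the top of
    /// the tree, i.e. reached what the text calls the guardian.
    func fail() -> Bool {
        isStopped = true
        switch supervision {
        case .stop:
            return false // failure is isolated here
        case .escalate:
            guard let parent = parent else { return true }
            return parent.fail() // parent fails in turn
        }
    }
}

let top = ToyActor("top", supervision: .escalate)
let barrier = ToyActor("barrier", supervision: .stop, parent: top)
let worker = ToyActor("worker", supervision: .escalate, parent: barrier)

let reachedGuardian = worker.fail()
print("worker stopped: \(worker.isStopped), barrier stopped: \(barrier.isStopped), top stopped: \(top.isStopped)")
print("reached guardian: \(reachedGuardian)")
```

In this model the `.stop` actor in the middle acts as a bulkhead: it stops along with the failing worker, while everything above it is untouched. Replace it with `.escalate` and the failure travels all the way up.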

ktoso (Member Author) commented Aug 30, 2019

Comments addressed in 5abe303 abb9a44

Any further review on the impl or should we merge?

drexin (Member) commented Aug 30, 2019

I think we can merge.

drexin merged commit f697132 into apple:master on Aug 30, 2019
ktoso pushed a commit that referenced this pull request Aug 31, 2019
* Make adapted actor refs watchable #480

* PR also resolves #40
ktoso added a commit that referenced this pull request Aug 31, 2019
* implement top level watch and ensure we can escalate

* =sup #31 Implements .escalate supervision scheme

* Allow process.exit() when failure reaches guardian

* only escalation should cause escalated to be populated in signal

* -act,fault #50 remove fault handling code and test

Remove all Fault handling code

* Hardening process isolated, node watcher uses event to detect down

* Implemented that master process restarts only as many times as configured

* Ensuring that backoff restarts work in process supervision

Minor bash fixups

rebase fixups

* Documentation for ProcessIsolated

* can't yet validate docs on linux

* compilation fix, no getpid on examples

* one off error in test

* Address comments and reword deathwatch docs

* remove batman pun