diff --git a/Docs/SWIM.adoc b/Docs/SWIM.adoc index a4474e034..462c6decf 100644 --- a/Docs/SWIM.adoc +++ b/Docs/SWIM.adoc @@ -1,4 +1,4 @@ - +[[SWIM]] == SWIM Membership Protocol Swift Distributed Actors implements a variant of the https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf[SWIM Membership Protocol] diff --git a/Docs/actors.adoc b/Docs/actors.adoc index faaf14445..b7bb52425 100644 --- a/Docs/actors.adoc +++ b/Docs/actors.adoc @@ -448,6 +448,8 @@ section of the guide), or its `dispatcher`. [[suggested_props_pattern]] ==== Suggested Props Pattern +#TODO: deprecate this and replace with "shell" pattern I guess?# + Sometimes when implementing behaviors which may be spawned by other users, it may be useful to centralize the props creation along with its default "suggested" settings. The _Suggested Props_ pattern explains a common style in which this can be solved. diff --git a/Docs/examples.adoc b/Docs/examples.adoc index 21223e47f..62061054d 100644 --- a/Docs/examples.adoc +++ b/Docs/examples.adoc @@ -5,7 +5,7 @@ This section contains links and explanations of simple and more advanced examples. -INFO: Contributions are very welcome, please reach out if you'd like to show off an example app you have built. +TIP: Contributions are very welcome, please reach out if you'd like to show off an example app you have built. === Dining Philosophers diff --git a/Docs/failure_handling.adoc b/Docs/failure_handling.adoc index 579e6910e..2c6d59025 100644 --- a/Docs/failure_handling.adoc +++ b/Docs/failure_handling.adoc @@ -2,24 +2,31 @@ == Failure Handling > Failures in distributed systems are common place, and one has to design such systems to be resilient to failures. -> Thankfully actors make this task much simpler, as all kinds of failures are unified into supervision or termination events, -> to which one can react to. - -CAUTION: **The current supervision implementation** (of "faults" specifically) **is a Proof of Concept.** + - + - It "works, but..." 
will leak memory that the actor has allocated in case of - supervision handling a fault (e.g. a division by zero). The application process will remain alive, the fault - will be logged and handled by stopping or restarting the offending actor, however resources will leak. + - This behavior can be controlled with the `supervisionMode` setting on api:ActorSystemSettings[struct]. + - + - Supervision also handles `Error` throws which is safe in the same way throws in normal Swift applications are. +> Thankfully actors make this task much simpler, as all kinds of failures are unified into termination events, to which one can react. + +=== Let it crash! + +The simplest way to explain failure handling with Actors is the _"Let it Crash!"_ motto. It means that rather than trying to +continue running in a (potentially) faulty state, the actor that ended up in some "wrong" state is meant to crash, +and the system shall react to this by providing compensating actions. This model is simple at its core, but can be +surprisingly powerful and liberating when applied to concurrent or distributed systems. + +NOTE: Failure handling in actors lends itself more towards handling "panics" (drives, connections or databases failing) + rather than errors which may be used for flow control, such as validation errors and similar. + + + + Continue using familiar Swift mechanisms such as `try/throws`, `Error` and `Result` to handle errors which can be accounted for, + and rely on actor supervision and isolation for "unrecoverable" or "unpredicted" failures. + +In this chapter we will learn about <> and the various <> offered by this library. + +- By using <> it is possible to automate some of the otherwise mundane tasks of restarting with fresh state or backing off a few seconds before attempting to resume processing of messages.
+- And by utilizing some of the built-in stronger isolation capabilities it is possible to isolate and deal with even faults such as fatalErrors, or node failures in a clustered environment. [[supervision_vocabulary]] === Failure Vocabulary -Before we explain the mechanisms available to you to handle failures in Swift Distributed Actors, we first need to define a shared vocabulary -that we will use while discussing these. Since there are various "types" of failure conditions, we want to be as specific -as possible when discussing them. +Before we explain the mechanisms available to handle failures, we first need to define a shared vocabulary that we will use while discussing these. +Since there are various "types" of failure conditions, we want to be as specific as possible when discussing them. * *Error* as in "throw an error" - the classic meaning of "error" in Swift, which is bound to the `Error` type. ** Examples: any `Error` that is `throw`n from inside of a Swift function. @@ -38,7 +45,8 @@ Further more, you can expect see phases involving the word "crash". We use this handling or a failure causing the stopping of an entire node. The wording should usually classify what the subject of the crash was, e.g. _"the system crashed"_ or _"the request handler (actor) crashed."_ -=== Supervision +[[actor_supervision]] +=== Actor Supervision > Supervision allows actors to isolate failures and optionally restart by clearing their state and continuing to process their mailbox while retaining their identity. @@ -103,7 +111,6 @@ include::{dir_sact_doc_tests}/SupervisionDocExamples.swift[tag=supervise_full_us The above snippet would yield the following log output (with logLevel set to `INFO`): -// we have to sed inline mode as the overflowing lines otherwise don't aling with the line numbers :pygments-linenums-mode: inline [source] ---- @@ -134,94 +141,81 @@ philosophy. 
We focus on crashing and recovering, rather than continuing in "mayb #TODO: it did not contain which message caused the failure, which should be added automatically by Swift Distributed Actors in case it is not obvious# -==== Supervision: Simple Fault Supervision Example +[[failure_isolation]] +=== Failure Isolation Modes -Having analysed the error handling example, we can now slightly modify the example to showcase _fault handling_ as well. -We do so by changing our greeter to one which knows which fruit a person likes, and for any incoming name it prints -this persons favourite fruit. +Actor systems allow you to isolate failures at various levels, differing in both "effort" and the amount of isolation / safety achieved. -The actor performs the fruit name lookup in an unsafe way however, by force unwrapping the result of the subscript. -In reality this would be most likely a bug, and perhaps a harder to spot one like displayed here, but the example -is good enough for being able to showcase the fault handling capabilities of supervision: +NOTE: Failure isolation across nodes is simple, as such processes by design do not share memory and only communicate +using messages over the network; the failure of one component is already isolated thanks to the network boundary. + + + + We do however have the ability to detect and react to node failures, which we'll discuss in the <> section of the guide.
-.Modified behavior, now unsafely force unwrapping fruit names: -[source] ---- -include::{dir_sact_doc_tests}/SupervisionDocExamples.swift[tag=supervise_fault_example] ---- +==== Thread Isolation (Default) -.Spawning and interacting with the actor remains unchanged: -[source] ---- -include::{dir_sact_doc_tests}/SupervisionDocExamples.swift[tag=supervise_fault_usage] ---- +[cols="15,10,75"] +|=== -Before analyzing the log output of this application snippet, let us focus on the fact that while the actor now performs -unsafe actions which result in a fault (which usually would bring down an entire Swift process), the use-site did not -really change that much. h| When to use 2.+| Always; Always-on and complementary to other modes. -The expectation is, that regardless of the actor performing possibly faulty operations (such as force unwrapping, -index out of bounds, or similar fatal errors), the actor shall be restarted and remain usable -- without further -intervention of the caller. In other words, the faults of the fruit actor, are isolated within itself, and do not -affect neither the entire process not the sending side of the messages. .2+<.^h| Isolation s| Failure | Errors are handled by the offending actor's supervision strategy, or its parent if escalated. Faults are not isolated. -The log output obtained from running the above fault handling application snippet would look similar to the following: s| Memory | By convention actors _SHOULD NOT_ share any mutable state with each other as it may result in concurrent access violations. -WARNING: **WARNING:** The following shows the output of the "hacky signal handlers" supervision in action. We would like - to obtain proper unwind support in Swift such that we could implement more correctly (i.e. no leaks, better - backtrace perhaps, esp on linux). + - + - Having that said, let's look at how the current implementation handles faults.
+h| Effort 2.+| None +|=== -:pygments-linenums-mode: inline -[source] ----- -[INFO][main.swift:31][/user/favFruit#2041854707] Alice likes [Apples]! -Fatal error: Unexpectedly found nil while unwrapping an Optional value -[WARN][Mailbox.swift:179][/user/favFruit#2041854707] Supervision: Actor has FAULTED [fault(Actor faulted while processing message '[Boom!]:String', with backtrace)]:Failure while interpreting MailboxRunPhase.processingUserMessages, handling with RestartingSupervisor(initialBehavior: receive((Function)), strategy: RestartDecisionLogic(maxRestarts: 5, within: Optional(DistributedActors.TimeAmount(nanoseconds: 1000000000)), restartsWithinCurrentPeriod: 0, restartsPeriodDeadline: 0), canHandle: AllFailures); Failure details: fault(Actor faulted while processing message '[Boom!]:String': -0 Swift Distributed ActorsBenchmarks 0x000000010eb0a9a4 sact_get_backtrace + 52 -1 Swift Distributed ActorsBenchmarks 0x000000010eb0aec8 sact_sighandler + 88 -2 libsystem_platform.dylib 0x00007fff5a7d5b3d _sigtramp + 29 -3 ??? 0x0000000000000000 0x0 + 0 -4 libswiftCore.dylib 0x000000010f105e63 $Ss18_fatalErrorMessage__4file4line5flagss5NeverOs12StaticStringV_A2HSus6UInt32VtF + 19 -5 Swift Distributed ActorsBenchmarks 0x000000010ece8188 $S24Swift Distributed ActorsBenchmarks22favouriteFruitBehaviory0A5Actor0F0OySSGSDyS2SGFAfC0G7ContextCySSG_SStcfU_ + 328 -6 Swift Distributed ActorsBenchmarks 0x000000010ece870d $S24Swift Distributed ActorsBenchmarks22favouriteFruitBehaviory0A5Actor0F0OySSGSDyS2SGFAfC0G7ContextCySSG_SStcfU_TA + 13 -... -[WARN][ActorCell.swift:323][/user/favFruit#2041854707] Restarting. -[INFO][main.swift:31][/user/favFruit#2041854707] Bob likes [Bananas]! 
-Fatal error: Unexpectedly found nil while unwrapping an Optional value -[WARN][Mailbox.swift:179][/user/favFruit#2041854707] Supervision: Actor has FAULTED [fault(Actor faulted while processing message '[Boom Boom!]:String', with backtrace)]:Failure while interpreting MailboxRunPhase.processingUserMessages, handling with RestartingSupervisor(initialBehavior: receive((Function)), strategy: RestartDecisionLogic(maxRestarts: 5, within: Optional(DistributedActors.TimeAmount(nanoseconds: 1000000000)), restartsWithinCurrentPeriod: 1, restartsPeriodDeadline: 312506020116583), canHandle: AllFailures); Failure details: fault(Actor faulted while processing message '[Boom Boom!]:String': -0 Swift Distributed ActorsBenchmarks 0x000000010eb0a9a4 sact_get_backtrace + 52 -1 Swift Distributed ActorsBenchmarks 0x000000010eb0aec8 sact_sighandler + 88 -2 libsystem_platform.dylib 0x00007fff5a7d5b3d _sigtramp + 29 -3 libsystem_malloc.dylib 0x00007fff5a79b4a4 tiny_malloc_from_free_list + 445 -4 libswiftCore.dylib 0x000000010f105e63 $Ss18_fatalErrorMessage__4file4line5flagss5NeverOs12StaticStringV_A2HSus6UInt32VtF + 19 -5 Swift Distributed ActorsBenchmarks 0x000000010ece8188 $S24Swift Distributed ActorsBenchmarks22favouriteFruitBehaviory0A5Actor0F0OySSGSDyS2SGFAfC0G7ContextCySSG_SStcfU_ + 328 -6 Swift Distributed ActorsBenchmarks 0x000000010ece870d $S24Swift Distributed ActorsBenchmarks22favouriteFruitBehaviory0A5Actor0F0OySSGSDyS2SGFAfC0G7ContextCySSG_SStcfU_TA + 13 -... -[WARN][ActorCell.swift:323][/user/favFruit#2041854707] Restarting. -[INFO][main.swift:31][/user/favFruit#2041854707] Caplin likes [Cucumbers]! ----- +By design an actor is always ensured to "own" a thread during its execution, however no further isolation (of heap, nor otherwise) is currently provided. 
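The convention above — communicate only via (immutable) value-type messages — can be illustrated with plain Swift value semantics. This is a toy sketch (not this library's API): sending a `struct` hands the recipient an independent copy, so neither side can mutate the other's state.

```swift
// Value-type "message": each recipient gets its own copy on send.
struct GreetingsCount {
    var count: Int
}

// Simulate "sending" a message to another actor: value types are copied,
// so the "recipient" mutates only its own copy.
func send(_ message: GreetingsCount) -> GreetingsCount {
    var received = message // independent copy, de-facto state isolation
    received.count += 1
    return received
}

var mine = GreetingsCount(count: 0)
let theirs = send(mine)

// Mutating the sender's copy cannot affect the recipient's copy:
mine.count = 100
```

Had `GreetingsCount` been a `class` (a reference type), both "actors" would share the same underlying storage, which is exactly the concurrent-access hazard the convention warns about.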
+ +By convention, actors MUST NOT share mutable state between one another, as this breaks the underlying core assumption that an actor "owns" all of its state, +and communicates with others only via (immutable) messages. It is highly recommended to communicate using immutable (value type) messages only, achieving de facto isolation of state. + +NOTE: Swift currently does not allow for stronger forms of isolation of faults. + + But if it did, the actor model's failure handling extends nicely to isolating faults "in a specific actor." + +==== (Managed) Process Isolation + +[cols="15,10,75"] +|=== + +h| When to use 2.+| To isolate groups of actors from faults, including Swift fatal errors, or even unsafe (objective-)C code interop. + +.2+<.^h| Isolation s| Failure | Errors and Faults (!) are isolated at the process boundary. Errors may be escalated to Guardian actors and cause a system re-spawn when needed. + + s| Memory | Same as parent/child-process isolation. Memory leaks or corruption within child do not affect parent or other nodes directly (though may cause resource exhaustion of host). + +h| Effort 2.+| Low/Medium; The bootstrap of your application MUST be handled within <>. See #TODO NIO example# how to use with Swift NIO. + +h| Security note 2.+| *This is NOT a security feature.* No protection from actively malicious code is achieved by this approach. + It can protect from faulty code, memory leaks, and issues in unsafe code, but not from malicious code. +|=== + +Managed process isolation, using <>, allows the actor system to automatically manage processes on your behalf, +and enables deploying actors onto specific sub-processes forming failure domains, as well as allowing for automatic supervision +of those processes. See <> for an in-depth guide for setting up a process isolated actor system. + +==== Node Isolation + +[cols="15,10,75"] +|=== + +h| When to use 2.+| To be resilient against node failure, or scale out; Isolate nodes by means of network between them.
-So not only did the supervision indeed keep the process alive, dump a backtrace of the fault, but also did the supervisor -properly decide that the actor shall be restarted. From a functional perspective this did not differ much from handling -an error using supervision. Section <> dives deeper into analysing backtraces. +.2+<.^h| Isolation s| Failure | Complete; A separate node (e.g. on different host) is completely isolated from any other node, and cannot cause issues on other nodes, other than being detected as down. -At the same time however, **faults should not be taken lightly as they most likely indicate a bug -in the program**. Supervision enables easier location, debugging and potentially replicating scenarios (messages) -which cause the fault condition, such that it is easier to fix the application and deploy a fixed version as soon as possible. + s| Memory | Complete (by means of network) -TIP: Supervision is not a replacement for fixing faults. Faults should be taken seriously, as they usually indicate a - programming error. + - + - Supervision does however help locate the fault by helping to narrow it down: - - to specific actor (e.g. representing a specific user for which this fault occurs), - - to message (or signal) which caused the fault, - - by offering a backtrace for each occurring fault. +h| Effort 2.+| Low/Medium; The bootstrap of your application MUST be handled within <>. See #TODO NIO example# how to use with Swift NIO. +|=== -#TODO cleanup logs a bit, the supervisor is quite large in the printout# +By scaling out an actor system to multiple hosts, one is able to build resilient, fault-tolerant systems that can continue to function even if an entire node terminates. +The downside is the operational cost of having to worry about multiple hosts, however this cost can often be mitigated by automating host or node restarts with external cluster orchestrators such as Kubernetes or similar.
-=== Supervision strategies +Failure detection in clustered mode is handled by a distributed failure detector (see <>) and has an inherent timing aspect to it. +However, even though nodes may be distributed, all the same <> mechanisms that work locally work exactly the same way within the clustered system. + +[[supervision_strategies]] +=== Supervision Strategies As shown in the previous section, actors can be supervised by applying specific props settings to them at spawn-time. In a later section <> is introduced, which allows for more freedom with regards to observing actors for failure, @@ -229,7 +223,8 @@ however does not allow for identity preserving restarts of failing actors. Swift Distributed Actors provides a number of built in supervision strategies to choose from (see api:SupervisionStrategy[enum]). -==== Supervision Strategy: .stop +[[supervision_stop]] +==== Supervision Strategy: `.stop` > Logs failure and crashes the given actor immediately. @@ -238,22 +233,19 @@ stops the failing actor unconditionally in case of any failure. For now let's see how we could configure an actor to restart a few times, if it encountered a failure: -==== Supervision Strategy: .restart(atMost:within:) +[[supervision_restart]] +==== Supervision Strategy: `.restart(atMost:within:)` > Aggressively restarts actor, up until `atMost` restarts `within` a time period. -WARNING: PoC Limitation: the current _fault_ handling mechanisms rely on POSIX http://man7.org/linux/man-pages/man7/signal.7.html[signal handlers], - and as such may not mix well with other libraries which rely on this functionality.+ - + - Our goal is to move away from this implementation detail eventually, however we can not commit at this point yet - as to when other `unwind` mechanisms would arrive in Swift itself. These would in turn allow implementing these - in a more compatible fashion.
+#TODO: better docs or point to API docs?# -==== Supervision Strategy: .restart(atMost:within:backoff) +[[supervision_restart_backoff]] +==== Supervision Strategy: `.restart(atMost:within:backoff)` > Restarts the actor, applying backoff pauses between the restart attempts -#TODO: Not implemented yet and may affect API or merge into .restart entirely, where we pass in a backoff strategy etc.# +#TODO: better docs or point to API docs?# One of the more advanced strategies is restarting with backoff. The term backoff here means that after a failure, the actor will remain "alive" however it will not be restarted until a certain amount of time has passed -- the backoff time. @@ -265,9 +257,122 @@ potentially even be harmful to the system. An example of where supervision _without_ backoff is even harmful includes situations where e.g. a database goes down, and all actors crash and repeatedly try to re-establish the connection. +[[supervision_escalate]] +==== Supervision Strategy: `.escalate` + +#TODO: better docs or point to API docs?# + + +[[supervision_tree]] +=== Supervision Hierarchies + +#TODO: write this# + +Actors live and die in a strict hierarchy, the so-called _Actor Hierarchy_ or _Supervision Tree_, which groups actors using parent-child relationships. +Parent actors enjoy some special privileges with regards to being able to select and react to their children's failures, +including (when using `.escalate` supervision) being able to inspect the cause of a child actor's failure. + +Other actors - be it siblings or other completely unrelated actors - may only _watch_ a given actor for termination. + +This leads to the formation of so-called supervision hierarchies or supervision "trees". +It also allows us to structure our applications in ways that represent their _fault domains_.
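The way an unhandled failure travels up such a tree — each level either handling it or escalating further, until it would reach a guardian — can be sketched as a toy model in plain Swift. All names here are illustrative, not this library's API:

```swift
// Toy model of failure escalation up a parent chain (illustration only).
struct ChildFailure: Error {
    let failedPath: String
}

final class ToyActorNode {
    let path: String
    let parent: ToyActorNode?
    let handlesFailures: Bool

    init(path: String, parent: ToyActorNode?, handlesFailures: Bool) {
        self.path = path
        self.parent = parent
        self.handlesFailures = handlesFailures
    }

    // Returns the path of the node that handled the failure,
    // or nil if it escalated past the root entirely unhandled.
    func escalate(_ failure: ChildFailure) -> String? {
        if handlesFailures { return path }
        return parent?.escalate(failure)
    }
}

// A three-level chain: only the guardian elects to handle failures.
let guardian = ToyActorNode(path: "/user", parent: nil, handlesFailures: true)
let middle = ToyActorNode(path: "/user/parent", parent: guardian, handlesFailures: false)
let worker = ToyActorNode(path: "/user/parent/worker", parent: middle, handlesFailures: false)

// The worker's failure bubbles up past its parent to the guardian.
let handledAt = worker.escalate(ChildFailure(failedPath: worker.path))
```

In the real system, a failure reaching the guardian triggers the configured guardian failure action instead of a return value; the sketch only shows the "bubble up until handled" shape of escalation.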
+ +image::actor_tree.png[] + +#TODO: explain bulkheading and compartmentalization.# + +==== Escalating Failures + +The `.escalate` supervision strategy deserves a bit more discussion as it fulfils the useful pattern of escalating ("bubbling up") +failures up a supervision tree until an actor prevents it from escalating. + +WARNING: If such escalation is not stopped at any level and reaches a guardian actor (e.g. `/user`, or `/system`), + it will react by performing the action configured in api:ActorSystemSettings[struct]`.failure.onGuardianFailure`, + which may result in shutting down the actor system, or forcefully exiting the process. + + +[[death_watch]] +=== Death Watch and Terminated signals + +While supervision is very powerful, it is also (by design) limited to parent-child relationships. This is in order to simplify +supervision schemes and confine them to _supervision trees_, where parent actors define the supervision scheme of their children, +when they spawn them. In that sense, supervision strategies are part of the actor itself, rather than the parent. However, +the parent is the place where this supervision is selected, and any parent is automatically notified of the termination of its children. + +While supervision is enforced in _tree_ hierarchies, the _watch_ ("death watch") feature lifts this restriction, +and any actor may `context.watch()` any other actor it has a reference to. Death watch however does not enable the watcher +to take any part in the watched actor's lifecycle, other than being notified when the watched actor has terminated (either +by stopping gracefully, or crashing). + +Once an actor is watched, its termination will result in the watching actor receiving a api:Signals/Terminated[enum] +signal, which may be handled using `Behavior.receiveSignal` or `Behavior.receiveSpecificSignal` behaviors. + +TIP: Any actor can `watch` any other actor, if it has obtained its api:ActorRef[struct].
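The delivery semantics of watch — every watcher of a terminating actor receives one `Terminated` notification, never duplicated — can be sketched as a toy registry in plain Swift (an illustration of the semantics, not this library's implementation or API):

```swift
// Toy sketch of death-watch bookkeeping (illustration only).
struct ToyTerminated {
    let path: String // path of the actor that terminated
}

final class ToyWatchRegistry {
    // watched actor path -> set of watcher paths
    private var watchers: [String: Set<String>] = [:]

    func watch(watcher: String, watchee: String) {
        watchers[watchee, default: []].insert(watcher)
    }

    // Delivers a Terminated signal to every watcher exactly once and clears
    // the entry, modeling the "never duplicated or lost" guarantee.
    func terminate(_ path: String) -> [String: ToyTerminated] {
        let notified = watchers.removeValue(forKey: path) ?? []
        var delivered: [String: ToyTerminated] = [:]
        for watcher in notified {
            delivered[watcher] = ToyTerminated(path: path)
        }
        return delivered
    }
}

let registry = ToyWatchRegistry()
registry.watch(watcher: "/user/romeo", watchee: "/user/juliet")
registry.watch(watcher: "/user/audience", watchee: "/user/juliet")

// Both watchers are notified once...
let signals = registry.terminate("/user/juliet")
// ...and terminating again delivers nothing: the signal is not duplicated.
let again = registry.terminate("/user/juliet")
```

In the real system the `Terminated` signal arrives in the watcher's mailbox as a system message and is handled via `Behavior.receiveSignal`; the sketch only models who gets notified, and how many times.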
+ +==== Death Pact + +Upon _watching_ another actor, the watcher enters a so-called "death pact" with the watchee. In other words, if the watchee +terminates, the watcher will receive a api:Signals/Terminated[enum] signal, and if that signal is _not handled_ the watcher +will fail itself with a api:DeathPactError[enum]. + +This pattern is tremendously useful to bind lifecycles of multiple actors to one another. For example, if a `player` actor +signifies the presence of the player in a game match, and other actors represent its various actions, tasks, or units it is controlling, +it makes sense for all the auxiliary actors to only exist while the player remains active. This can be achieved by having each of +those actors api:ActorContext[class]`.watch` the `player` they belong to. Without any further modifications, if the player actor +terminates (for whatever reason), all of its actions, tasks and units would terminate automatically: + +[source] +---- +include::{dir_sact_doc_tests}/DeathWatchDocExamples.swift[tag=simple_death_watch] +---- +<1> Whenever creating a game unit, we pair it with its owning player; the unit immediately _watches_ the `player` upon starting +<2> Once setup is complete, we become the unit's main behavior, along with handling the player _termination signal_ +<3> By _not_ handling any termination signals, whenever the `player` terminates, the `gameUnit` will automatically fail with a death pact error; ensuring we won't leak the unit. + +The above scenario can be performed more gracefully as well. All actors watching the `player` could explicitly handle +the player termination signal and decide on how they want to deal with this individually. + +Perhaps a game match in a multiplayer game would want to wait for the player to reconnect for a few seconds, +before it indeed signals termination and concedes the game to the opponent.
Or perhaps, any ongoing workers related to a given player +should run their current work to completion, but not accept any new requests, and eventually terminate as well? + +In the example below, if the `player` terminates, the `GameMatch` gives it a few seconds to reconnect to the game, and otherwise we terminate the match, +effectively giving up the match by withdrawal of one of the players. + +[source] +---- +include::{dir_sact_doc_tests}/DeathWatchDocExamples.swift[tag=handling_termination_deathwatch] +---- + +TIP: Use death pacts to bind lifecycles of various actors to one another. + A death pact error is thrown whenever an api:Signals/Terminated[enum] is received but left `.unhandled`. + +==== Death Watch Guarantees + +For the sake of describing those guarantees let us assume we have two actors, Romeo and Juliet. +Romeo performs a `context.watch(juliet)` during its `setup`. + +It is guaranteed by Swift Distributed Actors that: + +- if Juliet terminated, Romeo will receive a `.terminated` signal which it can handle by using a `Behavior.receiveSignal`, +- the `.terminated` message will never be duplicated or lost. As it goes with system messages, one can assume "exactly once" processing for them, +including in distributed settings (which is a stronger guarantee than given for plain user messages). + +Furthermore, if we imagine the above two actors to be in the actual play of Shakespeare, then there would also be an audience, +which would be watching both of these actors. This means that many actors (the audience) can be watching our "stage" actors. +For this situation the following is guaranteed: + +- if an actor (any actor) is watching any of the terminating actors it WILL receive a `.terminated` message, +- all of the actors watching a terminating actor WILL receive the `.terminated` message, +- even in distributed settings, if a watcher has "not noticed immediately" what was going on on stage (e.g.
due to lack of network connectivity), +once it is informed by other actors (handled internally via cluster gossip), it will receive the outstanding `.terminated` as if it had observed the death with its own eyes. + +#TODO: complete this section# + [[specific_failure_supervision]] -=== Supervising specific Failures +=== Selective Supervision of Specific Failures > It is also possible to selectively apply supervision depending on the type of the failure. @@ -310,137 +415,102 @@ NOTE: Supervision is not intended to replace `do/catch` blocks, and should be us e.g. by carrying trace or similar metadata which could help identify if the error exists only for a specific entity or situation. +#TODO: DOCUMENT MORE?# -[[failure_isolation]] -=== Failure isolation - -#TODO: write this# - -NOTE: Failure isolation across nodes is simple, as such processes by design do not share memory and only communicate - using messages over the network, the failure of one component is already isolated thanks to the network boundary. + - + - We do however have the ability to detect and react to node failures, which we'll discuss in the <> section - of the guide. - -[[supervision_tree]] -=== Supervision hierarchies - -#TODO: write this# +[[process_isolated]] +=== Fault Supervision by Process Isolation -Swift Distributed Actors enforces a strict parent-child relationship between actors. Only an actor's parent actor MAY supervise it. -Other actors - be it siblings or other completely unrelated actors - may only _watch_ a given actor for termination. +#TODO: document in depth# -This leads to the formation of so called supervision hierarchies or supervision "trees". -It also allows us to structure our applications in ways that represent their _fault domains_. +Process isolation can be seen as an extension of actor supervision, however at the process level. -TODO: explain bulk heading and compartmentalization.
+Segregating groups of actors into their own processes allows building semantically meaningful _failure domains_, +i.e. one might put all actors responsible for a specific batch job or specific work type into their own process, since if any +of those actors _fault_, all of them should be terminated as a whole. -=== A word of caution: fatal faults ==== Using `ProcessIsolated` -#TODO: complete this# +In order to discover semantic relationships of "failing together" one might keep the musketeers motto of "all for one, and one for all" in mind +when considering the following: when a given actor faults, which actors should also terminate as they are very likely in "bad state" as well. -Swift Distributed Actors does NOT allow recovering from "fatal faults" such as segmentation faults (segfaults) or similar faults. -Some faults are serious enough that continuing running after they have occurred is too risky and should not be taken lightly. +In a fully pure and isolated actor world the answer usually is only the actor itself, and potentially any of its watchers and its parent; +however in the face of unsafe code and faults (fatal errors), we might have to tread more carefully, and terminate an entire group of actors. +Such a discovered grouping of actors is a good approximation of a failure domain -- and you should put all those actors into the same servant process. -At the same time, bear in mind that while it may be possible to recover from some faults using Swift Distributed Actors, it may not be the best course of action, -sometimes faults are legitimate problems and can leave your system vulnerable. Make sure to always investigate fault crashes -of your actors and aim to build systems where faults do not occur on regular basis.
+Spawning actors in a specific process is done like this: -=== Failure Isolation [source] ---- include::{dir_sact_doc_tests}/ProcessIsolatedDocExamples.swift[tag=spawn_in_domain] ---- <1> Every process isolated application needs to start off with creating an isolated actor system of its own. + The returned `isolated` contains the actor system and additional control functions. +<2> In general, any code not enclosed in an `isolated.run` will execute on _any_ spawned process, including the master. Use this to prepare things common for master and servant processes. +<3> Any code enclosed in an `isolated.run(on: ProcessIsolated.Role)` will execute only on the given process role. Default roles are `.master` and `.servant`, however you can add additional roles. +<4> Inside the `.master` process, we spawn one servant process. This will execute "the same" application, however with different configuration, such that it will automatically connect to the master. +<5> We decide to _supervise_ the servant _process_ using a _respawn_ strategy with exponential backoff as well as at most 5 faults within its lifetime. Upon process fault, the master will re-spawn another process with the same configuration as the terminated process. +<6> We spawn an actor in the master node. As there is always only one master node, this actor will also _definitely_ only exist once on the master process. +<7> Finally, we run a block _only_ on servant processes, in which we spawn another actor. If we spawned more servant processes, each would get its own "alfredPennyworth" actor on its own process/node. +<8> Last but not least, we park the main thread of the application in a loop that is used for handling process spawning. #TODO this may be simplified in the future# -Actor systems allow you to isolate failure in various levels, of both "effort" and amount of achieved isolation / safety. +The above example shows how to use api:ProcessIsolated[class] to build a simple two-process application.
If either a failure +were to _escalate_ through alfred to the `/user` guardian, or the servant process were to encounter a _fault_ anywhere, +the parent (master) process would notice this immediately, and use its `.respawn` servant process supervision strategy +to respawn the child process, causing a new _alfred_ actor to be spawned. -==== Process Isolation +As for how _bruce_ and _alfred_ can communicate with one another, you should refer to documentation about the <>. -#TODO: DOCUMENT# +It is also worth exploring the various helper functions on the api:ProcessIsolated[class] class, as it offers some convenience utilities +for working with processes in various roles. -[[backtraces]] -=== Analysing Back Traces (Crash Logs) +Note also that file descriptors, or any other state, are _not_ passed to servant processes, so e.g. all files that the master +process had opened before spawning the child are closed when the servant is spawned. Any communication between the two +should be implemented the same way as if the processes were completely separate nodes in a cluster -- i.e. by using the receptionist, +or other mechanisms such as gossip. -When a fault occurs, Swift Distributed Actors will print a crash log. An excellent talk from WWDC 2018 titled -https://developer.apple.com/videos/play/wwdc2018/414/["Understanding Crashes and Crash Logs"] is available and explains -how to read and use crash logs (backtraces) to locate and isolate problems in your code. +TIP: Since actors spawned in different processes have to serialize messages when they communicate with each other, + you have to ensure that messages they exchange are serializable using <>. -Another useful resource to understand crash logs is https://developer.apple.com/library/archive/technotes/tn2151/_index.html[Understanding and Analyzing Application Crash Reports].
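The `.respawn` servant supervision described above — re-spawning a faulted servant process, with backoff between attempts and a bounded fault budget — can be sketched as plain decision logic in Swift. The names and parameters here are illustrative, not the library's API:

```swift
import Foundation

// Toy sketch of a respawn-supervision budget for servant processes.
struct ToyRespawnSupervision {
    let maxRespawns: Int       // e.g. "at most 5 faults within its lifetime"
    let initialBackoff: Double // seconds to wait before the first respawn
    let backoffFactor: Double  // exponential growth factor between respawns
    private(set) var respawnsSoFar = 0

    // Decide what to do when a servant process faults:
    // returns the backoff to wait before respawning, or nil to give up.
    mutating func onServantFault() -> Double? {
        guard respawnsSoFar < maxRespawns else { return nil }
        let backoff = initialBackoff * pow(backoffFactor, Double(respawnsSoFar))
        respawnsSoFar += 1
        return backoff
    }
}

var supervision = ToyRespawnSupervision(maxRespawns: 3, initialBackoff: 0.1, backoffFactor: 2.0)
var backoffs: [Double] = []
// Simulate a servant that keeps faulting until the budget is exhausted.
while let backoff = supervision.onServantFault() {
    backoffs.append(backoff) // 0.1s, then 0.2s, then 0.4s; then give up
}
```

In the real system the master process performs the actual waiting and process re-spawn; the sketch only captures the "exponential backoff until the fault budget is spent" decision.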
+==== One for all, and all for one
-==== Demangling backtraces
+Using the previous code snippet showing how to use `ProcessIsolated`, let us now discuss the various failure situations this
+setup is able to deal with. We will discuss 3 situations, roughly depicted by the following points of failure:
-#TODO This deserves a proper section about it. And likely our way isn't the best yet. Needs more work.#
+[.center]
+image::process_isolated_servants.png[]
-Swift applies name mangling to function names, and due to that raw backtraces may not be easy to read:
+===== Failure scenario a) actor failure
-[source,text,unnumbered]
----
-0 Swift Distributed ActorsBenchmarks 0x000000010eb0a9a4 sact_get_backtrace + 52
-1 Swift Distributed ActorsBenchmarks 0x000000010eb0aec8 sact_sighandler + 88
-2 libsystem_platform.dylib 0x00007fff5a7d5b3d _sigtramp + 29
-3 libsystem_malloc.dylib 0x00007fff5a79b4a4 tiny_malloc_from_free_list + 445
-4 libswiftCore.dylib 0x000000010f105e63 $Ss18_fatalErrorMessage__4file4line5flagss5NeverOs12StaticStringV_A2HSus6UInt32VtF + 19
-5 Swift Distributed ActorsBenchmarks 0x000000010ece8188 $S24Swift Distributed ActorsBenchmarks22favouriteFruitBehaviory0A5Actor0F0OySSGSDyS2SGFAfC0G7ContextCySSG_SStcfU_ + 328
-6 Swift Distributed ActorsBenchmarks 0x000000010ece870d $S24Swift Distributed ActorsBenchmarks22favouriteFruitBehaviory0A5Actor0F0OySSGSDyS2SGFAfC0G7ContextCySSG_SStcfU_TA + 13
-7 Swift Distributed ActorsBenchmarks 0x000000010ece8553 $S12DistributedActors0B7ContextCySSGSSAA8BehaviorOySSGs5Error_pIegggozo_ADSSAGsAH_pIeggnozo_TR + 51
-8 Swift Distributed ActorsBenchmarks 0x000000010ece876b $S12DistributedActors0B7ContextCySSGSSAA8BehaviorOySSGs5Error_pIegggozo_ADSSAGsAH_pIeggnozo_TRTA + 27
-9 Swift Distributed ActorsBenchmarks 0x000000010ec8bf44 $S12DistributedActors8BehaviorO16interpretMessage7context7message4file4lineACyxGAA0B7ContextCyxG_xs12StaticStringVSutKF + 916
----
+Actor failure shall (currently) be discussed in two categories: a _fault_
and an _error_.
-#TODO: Can we offer this in some nicer way?#
-#TODO: Can we offer this on Linux? (AFAIR it's more tricky)#
+TIP: In the future, we hope to unify these two failure handling schemes under a single one, the same which currently applies
+ to error supervision; however, right now this is not possible. In this scenario, let us focus on the escalating errors pattern;
+ the fault scenario will be analysed in depth in <>.
-Thankfully Swift also ships with a demangle function built right into the `swift` command line tool.
-Swift Distributed Actors offers #TODO likely we don't want to keep shipping this but offer something nicer# a simple script which allows
-pasting a mangled backtrace and get it back in an easier to read demangled format. Usage is fairly simple, and involves
-pasting a backtrace like the one above to the `scripts/sact_backtrace_demangle` script.
+When the actor throws an error which causes it to crash, its <> is applied.
+If this happens to result in an `.escalate` decision, the error is _bubbled up_ through the actor hierarchy, and if it reaches
+the `/user` guardian actor, it terminates the process, thus triggering the master's servant process supervision mechanism.
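To make the escalating-error path concrete, here is a minimal sketch modeled on the `it_ProcessIsolated_escalatingWorkers` integration test added in this change; `system` is assumed to be the servant process' `ActorSystem`, already in scope:

```swift
import DistributedActors

struct OnPurposeBoom: Error {}

// A top-level actor supervised with `.escalate`: when it throws, the failure
// is not merely logged -- it bubbles up to the `/user` guardian, terminating
// the process, which in turn triggers the master's servant supervision.
let failing = try system.spawn("failing", of: String.self,
    props: Props().supervision(strategy: .escalate),
    .receiveMessage { message in
        // any thrown error escalates through the hierarchy
        throw OnPurposeBoom()
    })
```

With a `.respawn` servant process supervision strategy in place on the master, the escalation results in a fresh servant process being spawned.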
-[source,text,unnumbered] ----- -$ pbpaste | ./scripts/sact_backtrace_demangle -0 Swift Distributed ActorsBenchmarks 0x000000010eb0a9a4 demangled:sact_get_backtrace + 52 -1 Swift Distributed ActorsBenchmarks 0x000000010eb0aec8 demangled:sact_sighandler + 88 -2 libsystem_platform.dylib 0x00007fff5a7d5b3d demangled:_sigtramp + 29 -3 libsystem_malloc.dylib 0x00007fff5a79b4a4 demangled:tiny_malloc_from_free_list + 445 -4 libswiftCore.dylib 0x000000010f105e63 demangled:Swift._fatalErrorMessage(_: Swift.StaticString, _: Swift.StaticString, file: Swift.StaticString, line: Swift.UInt, flags: Swift.UInt32) -> Swift.Never + 19 -5 Swift Distributed ActorsBenchmarks 0x000000010ece8188 demangled:closure #1 (DistributedActors.ActorContext, Swift.String) -> DistributedActors.Behavior in Swift Distributed ActorsBenchmarks.favouriteFruitBehavior([Swift.String : Swift.String]) -> DistributedActors.Behavior + 328 -6 Swift Distributed ActorsBenchmarks 0x000000010ece870d demangled:partial apply forwarder for closure #1 (DistributedActors.ActorContext, Swift.String) -> DistributedActors.Behavior in Swift Distributed ActorsBenchmarks.favouriteFruitBehavior([Swift.String : Swift.String]) -> DistributedActors.Behavior + 13 -7 Swift Distributed ActorsBenchmarks 0x000000010ece8553 demangled:reabstraction thunk helper from @escaping @callee_guaranteed (@guaranteed DistributedActors.ActorContext, @guaranteed Swift.String) -> (@owned DistributedActors.Behavior, @error @owned Swift.Error) to @escaping @callee_guaranteed (@guaranteed DistributedActors.ActorContext, @in_guaranteed Swift.String) -> (@owned DistributedActors.Behavior, @error @owned Swift.Error) + 51 -8 Swift Distributed ActorsBenchmarks 0x000000010ece876b demangled:partial apply forwarder for reabstraction thunk helper from @escaping @callee_guaranteed (@guaranteed DistributedActors.ActorContext, @guaranteed Swift.String) -> (@owned DistributedActors.Behavior, @error @owned Swift.Error) to @escaping @callee_guaranteed (@guaranteed 
DistributedActors.ActorContext, @in_guaranteed Swift.String) -> (@owned DistributedActors.Behavior, @error @owned Swift.Error) + 27
----
+[[process_isolated_scenario_b]]
+===== Failure scenario b) Fault in .servant process
-[[death_watch]]
-=== Death Watch and Terminated signals
+Whenever a _fault_ (such as `fatalError`, a division by zero, or a more serious issue, such as a segmentation fault or similar)
+happens in a servant process, it shall be immediately terminated (printing a backtrace if configured to do so, which it is by default).
-While supervision is very powerful, it is also (by design) limited to parent-child relationships. This is to guide system
-implementors towards structuring systems as supervision _trees_ where each parent makes an informed decision about how its
-children should be supervised.
+All actors hosted in this servant process are terminated, and if any other actor _watched_ any of them on another process/node,
+it will immediately be notified with an api:Signals/Terminated[enum] signal. You can read more about terminated messages
+and watch semantics in <>.
-This however also means that this form of supervision is not dynamic -- i.e. it is not possible to supervise an actor
-started by a different parent, nor is it possible to supervise remote actors. This is a deliberate design choice based
-on years of experience with Akka based systems and a (now deprecated) feature called "remote deployment" which indeed
-allowed such remote supervision, however at an huge complexity -- both implementation and understandability -- cost.
+===== Failure scenario c) Fault or .escalation in .master process
-Swift Distributed Actors instead enforces the importance of "watch" mechanisms, which are strongly inspired by the `watch` mechanism
-in Akka and `link` mechanism in Erlang.
+Each time a servant process terminates, the master will apply the supervision strategy tied to the now-terminated process.
+Such a strategy MAY yield an `.escalate` decision, which means that the given servant's termination should escalate and also cause the _master_
+to terminate.
-TIP: Any actor can `watch` any other actor, if it has obtained its api:ActorRef[struct].
+If the master terminates, it _causes all of its servants to terminate as well_. This is done unconditionally in order to prevent
+lingering "orphan" processes. This termination is guaranteed even if the master process is terminated forcefully, e.g. by killing it with a `SIGKILL` signal.
-
-==== Death Watch Guarantees
-
-For the sake of describing those guarantees let us assume we have two actors, Romeo and Juliet.
-Romeo performs a `context.watch(juliet)` during its `setup`.
-
-It is by Swift Distributed Actors guaranteed that:
-
-- if Juliet terminated Romeo will receive a `.terminated` signal which it can handle by using an `Behavior.receiveSignal`,
-- the `.terminated` message will never be duplicated not lost. As it goes with system messages, one can assume "exactly once" processing for them,
-including in distributed settings (which is a stronger guarantee than given for plain user messages),
--
-
-Furthermore, if we imagine the above two actors be in the actual play of Shakespeare, then there would also be an audience,
-which would be watching both of these actors. This means that many actors (the audience) can be watching out "stage" actors.
-For this situation the following is guaranteed:
-
-- if an actor (any actor) is watching any of the terminating actors it WILL receive an `.terminated` message,
-- all of the actors watching a terminating actor WILL receive the `.terminated` message,
- - even in distributed settings, if a watcher has "not noticed immediately" what was going on on stage (e.g. due to lack of network connectivity),
- once it is informed by other actors (handled internally via cluster gossip), it will receive the outstanding `.terminated` as-if it had observed the death with its own eyes.
-#TODO: complete this section#
+NOTE: While this method is effective in isolating faults, including serious ones like segfaults or similar, in
+ the servant (child) processes, it comes at a price: maintaining many processes is expensive and communication between them
+ is less efficient than communicating between actors located in the same actor system (as serialization needs to be involved).
diff --git a/Docs/images/actor_tree.graffle/data.plist b/Docs/images/actor_tree.graffle/data.plist
new file mode 100644
index 000000000..0a16def13
Binary files /dev/null and b/Docs/images/actor_tree.graffle/data.plist differ
diff --git a/Docs/images/actor_tree.graffle/image1.pdf b/Docs/images/actor_tree.graffle/image1.pdf
new file mode 100644
index 000000000..4426da872
Binary files /dev/null and b/Docs/images/actor_tree.graffle/image1.pdf differ
diff --git a/Docs/images/actor_tree.png b/Docs/images/actor_tree.png
new file mode 100644
index 000000000..540c79052
Binary files /dev/null and b/Docs/images/actor_tree.png differ
diff --git a/Docs/images/process_isolated_servants.graffle/data.plist b/Docs/images/process_isolated_servants.graffle/data.plist
new file mode 100644
index 000000000..8817ab973
Binary files /dev/null and b/Docs/images/process_isolated_servants.graffle/data.plist differ
diff --git a/Docs/images/process_isolated_servants.graffle/image1.pdf b/Docs/images/process_isolated_servants.graffle/image1.pdf
new file mode 100644
index 000000000..4426da872
Binary files /dev/null and b/Docs/images/process_isolated_servants.graffle/image1.pdf differ
diff --git a/Docs/images/process_isolated_servants.png b/Docs/images/process_isolated_servants.png
new file mode 100644
index 000000000..1eca1b702
Binary files /dev/null and b/Docs/images/process_isolated_servants.png differ
diff --git a/Docs/serialization.adoc b/Docs/serialization.adoc
index 7d7fa6f9b..e7f7ebbef 100644
--- a/Docs/serialization.adoc
+++ b/Docs/serialization.adoc
@@ -1,4 +1,4 @@
-
+[[serialization]]
 == Serialization
> Swift Distributed Actors offers a serialization layer which allows you to decouple where messages are sent and received, diff --git a/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_backoffRespawn/main.swift b/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_backoffRespawn/main.swift new file mode 100644 index 000000000..e8e4bc052 --- /dev/null +++ b/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_backoffRespawn/main.swift @@ -0,0 +1,70 @@ +//===----------------------------------------------------------------------===// +// +// This source file is part of the Swift Distributed Actors open source project +// +// Copyright (c) 2018-2019 Apple Inc. and the Swift Distributed Actors project authors +// Licensed under Apache License v2.0 +// +// See LICENSE.txt for license information +// See CONTRIBUTORS.md for the list of Swift Distributed Actors project authors +// +// SPDX-License-Identifier: Apache-2.0 +// +//===----------------------------------------------------------------------===// + +#if os(OSX) +import Darwin.C +#else +import Glibc +#endif + +import DistributedActors + +let isolated = ProcessIsolated { boot in + boot.settings.defaultLogLevel = .info + boot.runOn(role: .servant) { + boot.settings.failure.onGuardianFailure = .systemExit(-1) + } + return ActorSystem(settings: boot.settings) +} + +pprint("Started process: \(getpid()) with roles: \(isolated.roles)") + +struct OnPurposeBoom: Error {} + +isolated.run(on: .master) { + isolated.spawnServantProcess(supervision: + .respawn( + atMost: 5, within: nil, + backoff: Backoff.exponential( + initialInterval: .milliseconds(100), + multiplier: 1.5, + randomFactor: 0 + ) + ) + ) +} + +try isolated.run(on: .servant) { + isolated.system.log.info("ISOLATED RUNNING: \(CommandLine.arguments)") + + _ = try isolated.system.spawn("failed", of: String.self, + props: Props().supervision(strategy: .escalate), + .setup { context in + context.log.info("Spawned \(context.path) on 
servant node it will fail soon...") + context.timers.startSingle(key: "explode", message: "Boom", delay: .seconds(1)) + + return .receiveMessage { message in + context.log.error("Time to crash with: fatalError") + // crashes process since we do not isolate faults + fatalError("FATAL ERROR ON PURPOSE") + } + }) +} + +// finally, once prepared, you have to invoke the following: +// which will BLOCK on the master process and use the main thread to +// process any incoming process commands (e.g. spawn another servant) +isolated.blockAndSuperviseServants() + +// ~~~ unreachable ~~~ diff --git a/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_escalatingWorkers/main.swift b/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_escalatingWorkers/main.swift new file mode 100644 index 000000000..172bc8d63 --- /dev/null +++ b/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_escalatingWorkers/main.swift @@ -0,0 +1,73 @@ +//===----------------------------------------------------------------------===// +// +// This source file is part of the Swift Distributed Actors open source project +// +// Copyright (c) 2018-2019 Apple Inc. 
and the Swift Distributed Actors project authors +// Licensed under Apache License v2.0 +// +// See LICENSE.txt for license information +// See CONTRIBUTORS.md for the list of Swift Distributed Actors project authors +// +// SPDX-License-Identifier: Apache-2.0 +// +//===----------------------------------------------------------------------===// + +#if os(OSX) +import Darwin.C +#else +import Glibc +#endif + +import DistributedActors + +let isolated = ProcessIsolated { boot in + boot.settings.defaultLogLevel = .info + boot.runOn(role: .servant) { + boot.settings.failure.onGuardianFailure = .systemExit(-1) + } + return ActorSystem(settings: boot.settings) +} + +pprint("Started process: \(getpid()) with roles: \(isolated.roles)") + +struct OnPurposeBoom: Error {} + +isolated.run(on: .master) { + isolated.spawnServantProcess(supervision: .respawn(atMost: 1, within: nil), args: ["fatalError"]) + isolated.spawnServantProcess(supervision: .respawn(atMost: 1, within: nil), args: ["escalateError"]) +} + +try isolated.run(on: .servant) { + isolated.system.log.info("ISOLATED RUNNING: \(CommandLine.arguments)") + + // TODO: assert command line arguments are the expected ones + + _ = try isolated.system.spawn("failed", of: String.self, + props: Props().supervision(strategy: .escalate), + .setup { context in + context.log.info("Spawned \(context.path) on servant node it will fail soon...") + context.timers.startSingle(key: "explode", message: "Boom", delay: .seconds(1)) + + return .receiveMessage { message in + if CommandLine.arguments.contains("fatalError") { + context.log.error("Time to crash with: fatalError") + // crashes process since we do not isolate faults + fatalError("FATAL ERROR ON PURPOSE") + } else if CommandLine.arguments.contains("escalateError") { + context.log.error("Time to crash with: throwing an error, escalated to top level") + // since we .escalate and are a top-level actor, this will cause the process to die as well + throw OnPurposeBoom() + } else { + 
context.log.error("MISSING FAILURE MODE ARGUMENT!!! Test is constructed not properly, or arguments were not passed properly. \(CommandLine.arguments)") + fatalError("MISSING FAILURE MODE ARGUMENT!!! Test is constructed not properly, or arguments were not passed properly. \(CommandLine.arguments)") + } + } + }) +} + +// finally, once prepared, you have to invoke the following: +// which will BLOCK on the master process and use the main thread to +// process any incoming process commands (e.g. spawn another servant) +isolated.blockAndSuperviseServants() + +// ~~~ unreachable ~~~ diff --git a/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_noLeaking/main.swift b/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_noLeaking/main.swift index ccff41124..3f2c4923b 100644 --- a/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_noLeaking/main.swift +++ b/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_noLeaking/main.swift @@ -31,10 +31,8 @@ let isolated = ProcessIsolated { boot in pprint("Started process: \(getpid()) with roles: \(isolated.roles)") -let workersKey = Receptionist.RegistrationKey(String.self, id: "workers") - // though one can ensure to only run if in a process of a given role: -try isolated.run(on: .master) { +isolated.run(on: .master) { // open some fds, hope to not leak them into children! var fds: [Int] = [] for i in 1 ... 
1000 { @@ -45,7 +43,7 @@ try isolated.run(on: .master) { /// spawn a servant - isolated.spawnServantProcess(supervision: .restart(atMost: 100, within: .seconds(1)), args: ["ALPHA"]) + isolated.spawnServantProcess(supervision: .respawn(atMost: 100, within: .seconds(1)), args: ["ALPHA"]) } // finally, once prepared, you have to invoke the following: diff --git a/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_respawnsServants/main.swift b/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_respawnsServants/main.swift index 5830eedc5..f8afef33c 100644 --- a/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_respawnsServants/main.swift +++ b/IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_respawnsServants/main.swift @@ -49,7 +49,7 @@ try isolated.run(on: .master) { }) // should we allow anyone to issue this, or only on master? we could `runOnMaster { control` etc - isolated.spawnServantProcess(supervision: .restart(atMost: 100, within: .seconds(1)), args: ["ALPHA"]) + isolated.spawnServantProcess(supervision: .respawn(atMost: 100, within: .seconds(1)), args: ["ALPHA"]) } // Notice that master has no workers, just the pool... 
diff --git a/IntegrationTests/tests_02_process_isolated/shared.sh b/IntegrationTests/tests_02_process_isolated/shared.sh index 503f049a3..01b9ff672 100644 --- a/IntegrationTests/tests_02_process_isolated/shared.sh +++ b/IntegrationTests/tests_02_process_isolated/shared.sh @@ -13,6 +13,13 @@ ## ##===----------------------------------------------------------------------===## +RED='\033[0;31m' +RST='\033[0m' + +function echoerr() { + echo "${RED}$@${RST}" 1>&2; +} + function _killall() { set +e local killall_app_name="$1" diff --git a/IntegrationTests/tests_02_process_isolated/test_01_kill_master_no_orphans.sh b/IntegrationTests/tests_02_process_isolated/test_01_kill_master_must_not_leave_orphans.sh similarity index 100% rename from IntegrationTests/tests_02_process_isolated/test_01_kill_master_no_orphans.sh rename to IntegrationTests/tests_02_process_isolated/test_01_kill_master_must_not_leave_orphans.sh diff --git a/IntegrationTests/tests_02_process_isolated/test_02_kill_servant_master_restarts_it.sh b/IntegrationTests/tests_02_process_isolated/test_02_kill_servant_master_restarts_it.sh index ae77190e9..473677fb9 100755 --- a/IntegrationTests/tests_02_process_isolated/test_02_kill_servant_master_restarts_it.sh +++ b/IntegrationTests/tests_02_process_isolated/test_02_kill_servant_master_restarts_it.sh @@ -58,7 +58,8 @@ await_n_processes "$app_name" 2 if [[ $(ps aux | awk '{print $2}' | grep ${pid_servant} | grep -v 'grep' | wc -l) -ne 0 ]]; then echo "ERROR: Seems the servant was not killed!!!" 
- exit -2 + _killall ${app_name} + exit -1 fi await_n_processes "$app_name" 2 diff --git a/IntegrationTests/tests_02_process_isolated/test_03_not_leak_fds.sh b/IntegrationTests/tests_02_process_isolated/test_03_servant_spawning_not_leak_fds.sh similarity index 96% rename from IntegrationTests/tests_02_process_isolated/test_03_not_leak_fds.sh rename to IntegrationTests/tests_02_process_isolated/test_03_servant_spawning_not_leak_fds.sh index 976f96f1e..0e09613e0 100755 --- a/IntegrationTests/tests_02_process_isolated/test_03_not_leak_fds.sh +++ b/IntegrationTests/tests_02_process_isolated/test_03_servant_spawning_not_leak_fds.sh @@ -16,9 +16,6 @@ set -e #set -x # verbose -declare -r RED='\033[0;31m' -declare -r RST='\033[0m' - declare -r my_path="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" declare -r root_path="$my_path/.." @@ -52,7 +49,9 @@ for pid_servant in $pid_servants; do if [[ $(lsof -p $pid_servant | wc -l) -gt 100 ]]; then lsof -p $pid_servant printf "${RED}ERROR: Seems the servant [${pid_servant}] has too many FDs open, did the masters FD leak?${RST}\n" - exit -2 + + _killall ${app_name} + exit -1 fi done diff --git a/IntegrationTests/tests_02_process_isolated/test_04_failing_servants_to_cause_servant_respawn.sh b/IntegrationTests/tests_02_process_isolated/test_04_failing_servants_to_cause_servant_respawn.sh new file mode 100755 index 000000000..117054ddf --- /dev/null +++ b/IntegrationTests/tests_02_process_isolated/test_04_failing_servants_to_cause_servant_respawn.sh @@ -0,0 +1,84 @@ +#!/bin/bash +##===----------------------------------------------------------------------===## +## +## This source file is part of the Swift Distributed Actors open source project +## +## Copyright (c) 2018-2019 Apple Inc. 
and the Swift Distributed Actors project authors
+## Licensed under Apache License v2.0
+##
+## See LICENSE.txt for license information
+## See CONTRIBUTORS.md for the list of Swift Distributed Actors project authors
+##
+## SPDX-License-Identifier: Apache-2.0
+##
+##===----------------------------------------------------------------------===##
+
+set -e
+#set -x # verbose
+
+declare -r my_path="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+declare -r root_path="$my_path/.."
+
+declare -r app_name='it_ProcessIsolated_escalatingWorkers'
+
+cd ${root_path}
+
+source ${my_path}/shared.sh
+
+_killall ${app_name}
+
+# ====------------------------------------------------------------------------------------------------------------------
+# MARK: the app has workers which fail so hard that the failures reach the top level actors which then terminate the system
+# when the system terminates we kill the process; once the process terminates, the servant supervision kicks in and
+# restarts the entire process; layered supervision for the win!
+
+swift build # synchronously ensure built
+
+declare -r log_file="/tmp/${app_name}.log"
+rm -f ${log_file}
+swift run ${app_name} > ${log_file} &
+
+declare -r supervision_respawn_grep_txt='supervision: RESPAWN'
+declare -r supervision_stop_grep_txt='supervision: STOP'
+
+# we want to wait until 2 STOPs are found in the logs; then we can check if the other conditions are as we expect
+echo "Waiting for servant to RESPAWN a few times..."
+spin=1 # spin counter
+max_spins=20
+while [[ $(cat ${log_file} | grep "${supervision_stop_grep_txt}" | wc -l) -ne 2 ]]; do
+    sleep 1
+    spin=$((spin+1))
+    if [[ ${spin} -eq ${max_spins} ]]; then
+        echoerr "Never saw enough '${supervision_stop_grep_txt}' in logs."
+ cat ${log_file} + exit -1 + fi +done + +echo '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' +cat ${log_file} | grep "${supervision_respawn_grep_txt}" +echo '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' + +echo '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' +cat ${log_file} | grep "${supervision_stop_grep_txt}" +echo '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' + +if [[ $(cat ${log_file} | grep "${supervision_respawn_grep_txt}" | wc -l) -ne 2 ]]; then + echoerr "ERROR: We expected 2 servants to only respawn once, yet other number of respawns was detected!" + cat ${log_file} + + _killall ${app_name} + exit -1 +fi + +if [[ $(cat ${log_file} | grep "${supervision_stop_grep_txt}" | wc -l) -ne 2 ]]; then + echoerr "ERROR: Expected the servants to STOP after they are replaced once!" + cat ${log_file} + + _killall ${app_name} + exit -2 +fi + +# === cleanup ---------------------------------------------------------------------------------------------------------- + +_killall ${app_name} diff --git a/IntegrationTests/tests_02_process_isolated/test_05_failing_servant_to_cause_backoff_respawn.sh b/IntegrationTests/tests_02_process_isolated/test_05_failing_servant_to_cause_backoff_respawn.sh new file mode 100755 index 000000000..f136a93a3 --- /dev/null +++ b/IntegrationTests/tests_02_process_isolated/test_05_failing_servant_to_cause_backoff_respawn.sh @@ -0,0 +1,82 @@ +#!/bin/bash +##===----------------------------------------------------------------------===## +## +## This source file is part of the Swift Distributed Actors open source project +## +## Copyright (c) 2018-2019 Apple Inc. 
and the Swift Distributed Actors project authors
+## Licensed under Apache License v2.0
+##
+## See LICENSE.txt for license information
+## See CONTRIBUTORS.md for the list of Swift Distributed Actors project authors
+##
+## SPDX-License-Identifier: Apache-2.0
+##
+##===----------------------------------------------------------------------===##
+
+set -e
+#set -x # verbose
+
+declare -r my_path="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+declare -r root_path="$my_path/.."
+
+declare -r app_name='it_ProcessIsolated_backoffRespawn'
+
+cd ${root_path}
+
+source ${my_path}/shared.sh
+
+_killall ${app_name}
+
+# ====------------------------------------------------------------------------------------------------------------------
+# MARK: the app has workers which fail so hard that the failures reach the top level actors which then terminate the system
+# when the system terminates we kill the process; once the process terminates, the servant supervision kicks in and
+# restarts the entire process; layered supervision for the win!
+
+swift build # synchronously ensure built
+
+declare -r log_file="/tmp/${app_name}.log"
+rm -f ${log_file}
+swift run ${app_name} > ${log_file} &
+
+declare -r supervision_respawn_grep_txt='supervision: RESPAWN BACKOFF'
+
+# we want to wait until more than 2 RESPAWN BACKOFFs are found in the logs; then we can check if the other conditions are as we expect
+echo "Waiting for servants to RESPAWN BACKOFFs..."
+spin=1 # spin counter
+max_spins=20
+while [[ $(cat ${log_file} | grep "${supervision_respawn_grep_txt}" | wc -l) -le 2 ]]; do
+    sleep 1
+    spin=$((spin+1))
+    if [[ ${spin} -eq ${max_spins} ]]; then
+        echoerr "Never saw enough '${supervision_respawn_grep_txt}' in logs."
+ cat ${log_file} + exit -1 + fi +done + +echo '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' +cat ${log_file} | grep "${supervision_respawn_grep_txt}" +echo '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' + +if [[ $(cat ${log_file} | grep "${supervision_respawn_grep_txt}" | wc -l) -lt 3 ]]; then + echoerr "ERROR: We expected servant to respawn many times..." + cat ${log_file} + + _killall ${app_name} + exit -1 +fi + +if [[ $(cat ${log_file} | grep "restartsWithinCurrentPeriod: 1" | wc -l) -ne 1 ]]; then + echoerr "Expected the backoff supervision to have logged: restartsWithinCurrentPeriod: 1" +fi +if [[ $(cat ${log_file} | grep "restartsWithinCurrentPeriod: 2" | wc -l) -ne 1 ]]; then + echoerr "Expected the backoff supervision to have logged: restartsWithinCurrentPeriod: 2" +fi +if [[ $(cat ${log_file} | grep "restartsWithinCurrentPeriod: 3" | wc -l) -ne 1 ]]; then + echoerr "Expected the backoff supervision to have logged: restartsWithinCurrentPeriod: 3" +fi + + +# === cleanup ---------------------------------------------------------------------------------------------------------- + +_killall ${app_name} diff --git a/Package.swift b/Package.swift index d8332aab1..04881ec3e 100644 --- a/Package.swift +++ b/Package.swift @@ -64,6 +64,13 @@ let targets: [PackageDescription.Target] = [ // ==== ------------------------------------------------------------------------------------------------------------ // MARK: Integration Tests - `it_` prefixed + .target( + name: "it_ProcessIsolated_escalatingWorkers", + dependencies: [ + "DistributedActors", + ], + path: "IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_escalatingWorkers" + ), .target( name: "it_ProcessIsolated_respawnsServants", dependencies: [ @@ -78,6 +85,13 @@ let targets: [PackageDescription.Target] = [ ], path: "IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_noLeaking" ), + .target( + name: "it_ProcessIsolated_backoffRespawn", + dependencies: [ + "DistributedActors", + ], + path: 
"IntegrationTests/tests_02_process_isolated/it_ProcessIsolated_backoffRespawn"
+    ),
 // ==== ----------------------------------------------------------------------------------------------------------------
 // MARK: Performance / Benchmarks
diff --git a/Sources/DistributedActors/ActorContext.swift b/Sources/DistributedActors/ActorContext.swift
index 75f4752f5..9bdb4c87c 100644
--- a/Sources/DistributedActors/ActorContext.swift
+++ b/Sources/DistributedActors/ActorContext.swift
@@ -199,7 +199,10 @@ public class ActorContext: ActorRefFactory {
     /// Spawn a child actor and start watching it to get notified about termination.
     ///
-    /// - SeeAlso: `spawn` and `watch`.
+    /// For a detailed explanation of both concepts, refer to the `spawn` and `watch` documentation.
+    ///
+    /// - SeeAlso: `spawn`
+    /// - SeeAlso: `watch`
     public func spawnWatch(_ naming: ActorNaming, of type: M.Type = M.self, props: Props = Props(), _ behavior: Behavior) throws -> ActorRef { return undefined() }
diff --git a/Sources/DistributedActors/ActorRefProvider.swift b/Sources/DistributedActors/ActorRefProvider.swift
index 286eee4bc..6c444d11c 100644
--- a/Sources/DistributedActors/ActorRefProvider.swift
+++ b/Sources/DistributedActors/ActorRefProvider.swift
@@ -25,6 +25,11 @@ internal protocol _ActorRefProvider: _ActorTreeTraversable {
     /// Spawn an actor with the passed in [Behavior] and return its [ActorRef].
     ///
     /// The returned actor ref is immediately valid and may have messages sent to.
+    ///
+    /// ### Lack of `.spawnWatch` on top level
+    /// Note that it is not possible to `.spawnWatch` top level actors. This is on purpose:
+    /// any stop would mean the system shuts down; if you really want this, implement your own top-level actor which spawns the children.
+    /// It is possible however to use `.supervision(strategy: .escalate)`, as failures bubbling up through the system may indeed be a reason to terminate.
func spawn( system: ActorSystem, behavior: Behavior, address: ActorAddress, @@ -140,10 +145,9 @@ internal struct LocalActorRefProvider: _ActorRefProvider { dispatcher: dispatcher ) - let refWithCell = actor._myCell - - refWithCell.sendSystemMessage(.start) + let cell = actor._myCell + cell.sendSystemMessage(.start) return actor } } diff --git a/Sources/DistributedActors/ActorShell.swift b/Sources/DistributedActors/ActorShell.swift index 56907709c..17bea65f2 100644 --- a/Sources/DistributedActors/ActorShell.swift +++ b/Sources/DistributedActors/ActorShell.swift @@ -138,11 +138,11 @@ internal final class ActorShell: ActorContext, AbstractActor { self.supervisor = Supervision.supervisorFor(system, initialBehavior: behavior, props: props.supervision) - if let failureDetectorRef = system._cluster?._nodeDeathWatcher { - self._deathWatch = DeathWatch(failureDetectorRef: failureDetectorRef) + if let nodeDeathWatcher = system._nodeDeathWatcher { + self._deathWatch = DeathWatch(nodeDeathWatcher: nodeDeathWatcher) } else { // FIXME; we could see if `myself` is the right one actually... rather than dead letters; if we know the FIRST actor ever is the failure detector one? 
- self._deathWatch = DeathWatch(failureDetectorRef: system.deadLetters.adapted()) + self._deathWatch = DeathWatch(nodeDeathWatcher: system.deadLetters.adapted()) } self.namingContext = ActorNamingContext() @@ -312,9 +312,23 @@ internal final class ActorShell: ActorContext, AbstractActor { case .terminated(let ref, let existenceConfirmed, let nodeTerminated): let terminated = Signals.Terminated(address: ref.address, existenceConfirmed: existenceConfirmed, nodeTerminated: nodeTerminated) try self.interpretTerminatedSignal(who: ref, terminated: terminated) - case .childTerminated(let ref): - let terminated = Signals.ChildTerminated(address: ref.address, error: nil) // TODO: what about the errors - try self.interpretChildTerminatedSignal(who: ref, terminated: terminated) + + case .childTerminated(let ref, let circumstances): + switch circumstances { + // escalation takes precedence over death watch in terms of how we report errors + case .escalating(let failure): + // we only populate `escalation` if the child is escalating + let terminated = Signals.ChildTerminated(address: ref.address, escalation: failure) + try self.interpretChildTerminatedSignal(who: ref, terminated: terminated) + + case .stopped: + let terminated = Signals.ChildTerminated(address: ref.address, escalation: nil) + try self.interpretChildTerminatedSignal(who: ref, terminated: terminated) + case .failed: + let terminated = Signals.ChildTerminated(address: ref.address, escalation: nil) + try self.interpretChildTerminatedSignal(who: ref, terminated: terminated) + } + case .nodeTerminated(let remoteNode): self.interpretNodeTerminated(remoteNode) @@ -357,7 +371,7 @@ internal final class ActorShell: ActorContext, AbstractActor { internal var continueRunning: Bool { switch self.behavior.underlying { case .suspended: return false - case .stop: return self.children.nonEmpty + case .stop, .failed: return self.children.nonEmpty default: return true } } @@ -377,22 +391,30 @@ internal final class ActorShell: 
ActorContext, AbstractActor { /// We only FORCE the sending of a tombstone if we know we have parked the thread because an actual failure happened, /// thus this run *will never complete* and we have to make sure that we run the cleanup that the tombstone causes. /// This means that while the current thread is parked forever, we will enter the mailbox with another last run (!), to process the cleanups. + @usableFromInline internal func fail(_ error: Error) { self._myCell.mailbox.setFailed() self.behavior = self.behavior.fail(cause: .error(error)) - // TODO: we could handle here "wait for children to terminate" - // we only finishTerminating() here and not right away in message handling in order to give the Mailbox - // a chance to react to the problem as well; I.e. 1) we throw 2) mailbox sets terminating 3) we get fail() 4) we REALLY terminate switch error { case DeathPactError.unhandledDeathPact(_, _, let message): self.log.error("\(message)") // TODO: configurable logging? in props? default: - self.log.error("Actor threw error, reason: [\(error)]:\(type(of: error)). Terminating.") // TODO: configurable logging? in props? + self.log.warning("Actor threw error, reason: [\(error)]:\(type(of: error)). Terminating.") // TODO: configurable logging? in props? } } + @usableFromInline + internal func _escalate(failure: Supervision.Failure) -> Behavior { + self.behavior = self.behavior.fail(cause: failure) + + // FIXME: should not be needed since we'll signal when we finishTerminating + // self._parent.ref.sendSystemMessage(.childTerminated(ref: self.asAddressable, .escalating(failure)), file: #file, line: #line) + + return self.behavior + } + /// Similar to `fail` however assumes that the current mailbox run will never complete, which can happen when we crashed, /// and invoke this function from a signal handler. 
public func reportCrashFail(cause: MessageProcessingFailure) { @@ -486,8 +508,8 @@ internal final class ActorShell: ActorContext, AbstractActor { // note that even though the parent can (and often does) `watch(child)`, we filter it out from // our `watchedBy` set, since otherwise we would have to filter it out when sending the terminated back. // correctness is ensured though, since the parent always receives the `ChildTerminated`. - self.notifyParentWeDied() - self.notifyWatchersWeDied() + self.notifyParentOfTermination() + self.notifyWatchersOfTermination() self.invokePendingDeferredClosuresWhileTerminating() @@ -504,6 +526,8 @@ internal final class ActorShell: ActorContext, AbstractActor { // become stopped, if not already switch self.behavior.underlying { + case .failed(_, let failure): + self.behavior = .stop(reason: .failure(failure)) case .stop(_, let reason): self.behavior = .stop(reason: reason) default: @@ -517,15 +541,26 @@ internal final class ActorShell: ActorContext, AbstractActor { } // Implementation note: bridge method so Mailbox can call this when needed - func notifyWatchersWeDied() { + func notifyWatchersOfTermination() { traceLog_DeathWatch("NOTIFY WATCHERS WE ARE DEAD self: \(self.address)") self.deathWatch.notifyWatchersWeDied(myself: self.myself) } - func notifyParentWeDied() { + func notifyParentOfTermination() { let parent: AddressableActorRef = self._parent traceLog_DeathWatch("NOTIFY PARENT WE ARE DEAD, myself: [\(self.address)], parent [\(parent.address)]") - parent.sendSystemMessage(.childTerminated(ref: self.myself.asAddressable())) + + guard case .failed(_, let failure) = self.behavior.underlying else { + // we are not failed, so no need to further check for .escalate supervision + return parent.sendSystemMessage(.childTerminated(ref: self.myself.asAddressable(), .stopped)) + } + + guard self.supervisor is EscalatingSupervisor else { + // NOT escalating + return parent.sendSystemMessage(.childTerminated(ref: 
self.myself.asAddressable(), .failed(failure))) + } + + parent.sendSystemMessage(.childTerminated(ref: self.myself.asAddressable(), .escalating(failure))) } func invokePendingDeferredClosuresWhileTerminating() { @@ -662,7 +697,7 @@ extension ActorShell { switch next.underlying { case .unhandled: throw DeathPactError.unhandledDeathPact(terminated: deadRef, myself: self.myself.asAddressable(), - message: "Death Pact error: [\(self.address)] has not handled [Terminated] signal received from watched [\(deadRef)] actor. " + + message: "DeathPactError: Unhandled [\(terminated)] signal about watched actor [\(deadRef.address)]. " + "Handle the `.terminated` signal in `.receiveSignal()` in order react to this situation differently than termination.") default: try self.becomeNext(behavior: next) // FIXME: make sure we don't drop the behavior...? @@ -674,7 +709,8 @@ extension ActorShell { /// Results in signaling `Terminated` for all of the locally watched actors on the (now terminated) node. /// This action is performed concurrently by all actors who have watched remote actors on given node, /// and no ordering guarantees are made about which actors will get the Terminated signals first. 
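From the user's side, the death pact described in the `DeathPactError` message above is opted out of by handling the `.terminated` signal. A hypothetical sketch, assuming the `spawnWatch`, `.receiveSignal`, and `Signals.Terminated` APIs as they appear in this diff:

```swift
import DistributedActors

// Hypothetical sketch: a parent watches a child and handles its Terminated
// signal explicitly, instead of failing with an unhandled death pact.
let parent: Behavior<String> = .setup { context in
    let child = try context.spawnWatch("child", of: Int.self, .receiveMessage { _ in .stop })
    child.tell(42) // the child stops after its first message

    return .receiveSignal { context, signal in
        if let terminated = signal as? Signals.Terminated {
            // handling `.terminated` here avoids the DeathPactError
            context.log.info("watched actor terminated: \(terminated.address)")
            return .stop // or `.same`, to keep running despite the termination
        }
        return .same
    }
}
```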
- @inlinable internal func interpretNodeTerminated(_ terminatedNode: UniqueNode) { + @inlinable + internal func interpretNodeTerminated(_ terminatedNode: UniqueNode) { #if SACT_TRACE_ACTOR_SHELL self.log.info("Received address terminated: \(terminatedNode)") #endif @@ -682,12 +718,14 @@ extension ActorShell { self.deathWatch.receiveNodeTerminated(terminatedNode, myself: self.asAddressable) } - @inlinable internal func interpretStop() throws { + @inlinable + internal func interpretStop() throws { self.children.stopAll() try self.becomeNext(behavior: .stop(reason: .stopByParent)) } - @inlinable internal func interpretChildTerminatedSignal(who terminatedRef: AddressableActorRef, terminated: Signals.ChildTerminated) throws { + @inlinable + internal func interpretChildTerminatedSignal(who terminatedRef: AddressableActorRef, terminated: Signals.ChildTerminated) throws { #if SACT_TRACE_ACTOR_SHELL self.log.info("Received \(terminated)") #endif @@ -711,9 +749,31 @@ extension ActorShell { // TODO: we always want to call "through" the supervisor, make it more obvious that that should be the case internal API wise? next = try self.supervisor.interpretSupervised(target: self.behavior, context: self, signal: terminated) } else { - // no signal handling installed is semantically equivalent to unhandled - // log.debug("No .signalHandling installed, yet \(message) arrived; Assuming .unhandled") - next = Behavior.unhandled + switch terminated.escalation { + case .some(let failure): + // the child actor decided to `.escalate` the error and thus we are notified about it; + // escalation differs from plain termination in that by default it DOES cause us to crash as well, + // causing a chain reaction of crashes until someone handles it, or the guardian receives it and shuts down the system. + self.log.warning("Failure escalated by [\(terminatedRef.path)] reached non-watching, non-signal handling parent, escalation will continue!
Failure was: \(failure)") + + next = try self.supervisor.interpretSupervised(target: .signalHandling( + handleMessage: self.behavior, + handleSignal: { _, _ in + switch failure { + case .error(let error): + throw error + case .fault(let errorRepr): + throw errorRepr + } + } + ), context: self, signal: terminated) + + case .none: + // the child actor has stopped without providing us with a reason // FIXME; does this need to carry manual stop as a reason? + // + // no signal handling installed is semantically equivalent to unhandled + next = Behavior.unhandled + } } try self.becomeNext(behavior: next) diff --git a/Sources/DistributedActors/ActorSystem.swift b/Sources/DistributedActors/ActorSystem.swift index 949327a19..07bb4d02e 100644 --- a/Sources/DistributedActors/ActorSystem.swift +++ b/Sources/DistributedActors/ActorSystem.swift @@ -91,8 +91,10 @@ public final class ActorSystem { // initialized during startup internal var _cluster: ClusterShell? - internal var _clusterEventStream: EventStream? + internal var _clusterEvents: EventStream? + internal var _nodeDeathWatcher: NodeDeathWatcherShell.Ref? + // ==== ---------------------------------------------------------------------------------------------------------------- // MARK: Logging public var log: Logger { @@ -199,8 +201,16 @@ public final class ActorSystem { do { // Cluster MUST be the last thing we initialize, since once we're bound, we may receive incoming messages from other nodes if let cluster = self._cluster { - self._clusterEventStream = try! EventStream(self, name: "clusterEvents") + let clusterEvents = try! 
EventStream(self, name: "clusterEvents") + self._clusterEvents = clusterEvents _ = try cluster.start(system: self, eventStream: self.clusterEvents) // only spawns when cluster is initialized + + // Node watcher MUST be started AFTER cluster and clusterEvents + self._nodeDeathWatcher = try self._spawnSystemActor( + NodeDeathWatcherShell.naming, + NodeDeathWatcherShell.behavior(clusterEvents: clusterEvents), + perpetual: true + ) } } catch { fatalError("Failed while starting cluster subsystem! Error: \(error)") @@ -291,30 +301,26 @@ extension ActorSystem: ActorRefFactory { /// - throws: when the passed behavior is not a legal initial behavior /// - throws: when the passed actor name contains illegal characters (e.g. symbols other than "-" or "_") public func spawn(_ naming: ActorNaming, of type: Message.Type = Message.self, props: Props = Props(), _ behavior: Behavior) throws -> ActorRef { - return try self._spawnUserActor(naming, behavior, props: props) - } - - internal func _spawnUserActor(_ naming: ActorNaming, _ behavior: Behavior, props: Props = Props()) throws -> ActorRef { - return try self._spawnActor(using: self.userProvider, behavior, name: naming, props: props) + return try self._spawn(using: self.userProvider, behavior, name: naming, props: props) } // Implementation note: - // `isWellKnown` here means that the actor always exists and must be addressable without receiving a reference / path to it. This is for example necessary + // `perpetual` here means that the actor always exists and must be addressable without receiving a reference / path to it. This is for example necessary // to discover the receptionist actors on all nodes in order to replicate state between them. The incarnation of those actors will be `ActorIncarnation.perpetual`. This // also means that there will only be one instance of that actor that will stay alive for the whole lifetime of the system. Appropriate supervision strategies // should be configured for these types of actors. 
internal func _spawnSystemActor(_ naming: ActorNaming, _ behavior: Behavior, props: Props = Props(), perpetual: Bool = false) throws -> ActorRef { - return try self._spawnActor(using: self.systemProvider, behavior, name: naming, props: props, isWellKnown: perpetual) + return try self._spawn(using: self.systemProvider, behavior, name: naming, props: props, isWellKnown: perpetual) } // Actual spawn implementation, minus the leading "$" check on names; - // spawnInternal is used by `spawn(.anonymous)` and others, which are privileged and may start with "$" - internal func _spawnActor(using provider: _ActorRefProvider, _ behavior: Behavior, name naming: ActorNaming, props: Props = Props(), isWellKnown: Bool = false) throws -> ActorRef { + internal func _spawn(using provider: _ActorRefProvider, _ behavior: Behavior, name naming: ActorNaming, props: Props = Props(), isWellKnown: Bool = false) throws -> ActorRef { try behavior.validateAsInitial() let incarnation: ActorIncarnation = isWellKnown ? 
.perpetual : .random() // TODO: lock inside provider, not here + // FIXME: protect the naming context access and name reservation; add a test let address: ActorAddress = try self.withNamingContext { namingContext in let name = naming.makeName(&namingContext) @@ -332,7 +338,7 @@ extension ActorSystem: ActorRefFactory { case .nio(let group): dispatcher = NIOEventLoopGroupDispatcher(group) default: - fatalError("not implemented yet, only default dispatcher and calling thread one work") + fatalError("selected dispatcher [\(props.dispatcher)] not implemented yet") // FIXME: remove any not implemented ones simply from API } return try provider.spawn( diff --git a/Sources/DistributedActors/ActorSystemSettings.swift b/Sources/DistributedActors/ActorSystemSettings.swift index e02228e4c..804e82e3a 100644 --- a/Sources/DistributedActors/ActorSystemSettings.swift +++ b/Sources/DistributedActors/ActorSystemSettings.swift @@ -37,6 +37,7 @@ public struct ActorSystemSettings { public var actor: ActorSettings = .default public var serialization: SerializationSettings = .default public var metrics: MetricsSettings = .default(rootName: nil) + public var failure: FailureSettings = .default public var cluster: ClusterSettings = .default { didSet { self.serialization.localNode = self.cluster.uniqueBindNode @@ -50,11 +51,37 @@ public struct ActorSystemSettings { public var threadPoolSize: Int = ProcessInfo.processInfo.activeProcessorCount } -public struct ActorSettings { - public static var `default`: ActorSettings { - return .init() - } +// ==== ---------------------------------------------------------------------------------------------------------------- +// MARK: Failure Settings + +public struct FailureSettings { + public static let `default` = FailureSettings() + + /// Determines what action should be taken when a failure is escalated to a top level guardian (e.g. `/user` or `/system`).
+ public var onGuardianFailure: GuardianFailureHandling = .shutdownActorSystem +} + +/// Configures what guardians should do when an error reaches them. +/// (Guardians are the top level actors, e.g. `/user` or `/system`). +public enum GuardianFailureHandling { + /// Shut down the actor system when an error is escalated to a guardian. + case shutdownActorSystem - // arbitrarily selected, we protect start() using it; we may lift this restriction if needed - public var maxBehaviorNestingDepth: Int = 128 + /// Immediately exit the process when an error is escalated to a guardian. + /// Best used with `ProcessIsolated` mode. + case systemExit(Int) +} + +// ==== ---------------------------------------------------------------------------------------------------------------- +// MARK: Actor Settings + +extension ActorSystemSettings { + public struct ActorSettings { + public static var `default`: ActorSettings { + return .init() + } + + // arbitrarily selected, we protect start() using it; we may lift this restriction if needed + public var maxBehaviorNestingDepth: Int = 128 + } } diff --git a/Sources/DistributedActors/Backoff.swift b/Sources/DistributedActors/Backoff.swift index 2a72ff2f4..b49f8ba5c 100644 --- a/Sources/DistributedActors/Backoff.swift +++ b/Sources/DistributedActors/Backoff.swift @@ -28,6 +28,8 @@ public protocol BackoffStrategy { mutating func reset() } +// TODO: make nicer for auto completion? (.constant) etc + // ==== ---------------------------------------------------------------------------------------------------------------- // MARK: Backoff Strategy implementations @@ -44,7 +46,7 @@ public enum Backoff { /// Backoff each time using the same, constant, time amount. 
/// /// See `ConstantBackoffStrategy` for details - static func constant(_ backoff: TimeAmount) -> ConstantBackoffStrategy { + public static func constant(_ backoff: TimeAmount) -> ConstantBackoffStrategy { return .init(timeAmount: backoff) } @@ -63,7 +65,7 @@ public enum Backoff { /// MUST be `>= initialInterval`. /// - randomFactor: A random factor of `0.5` results in backoffs between 50% below and 50% above the base interval. /// MUST be between: `<0; 1>` (inclusive) - static func exponential( + public static func exponential( initialInterval: TimeAmount = ExponentialBackoffStrategy.Defaults.initialInterval, multiplier: Double = ExponentialBackoffStrategy.Defaults.multiplier, maxInterval: TimeAmount = ExponentialBackoffStrategy.Defaults.capInterval, diff --git a/Sources/DistributedActors/Cluster/ActorSystem+Cluster.swift b/Sources/DistributedActors/Cluster/ActorSystem+Cluster.swift index fa6d1a32a..f249681d2 100644 --- a/Sources/DistributedActors/Cluster/ActorSystem+Cluster.swift +++ b/Sources/DistributedActors/Cluster/ActorSystem+Cluster.swift @@ -61,6 +61,6 @@ extension ActorSystem { } internal var clusterEvents: EventStream { - return self._clusterEventStream ?? EventStream(ref: self.deadLetters.adapted()) + return self._clusterEvents ?? EventStream(ref: self.deadLetters.adapted()) } } diff --git a/Sources/DistributedActors/Cluster/ClusterShell.swift b/Sources/DistributedActors/Cluster/ClusterShell.swift index 2b538568d..745a048e3 100644 --- a/Sources/DistributedActors/Cluster/ClusterShell.swift +++ b/Sources/DistributedActors/Cluster/ClusterShell.swift @@ -93,21 +93,6 @@ internal class ClusterShell { return it } - // ==== ------------------------------------------------------------------------------------------------------------ - // MARK: Node-Death Watcher - - // Implementation notes: The `_failureDetectorRef` has to remain internally accessible. 
- // This is in order to solve a chicken-and-egg problem that we face during spawning of - // the first system actor that is the *failure detector* so it cannot reach to the systems - // value before it started... - var _nodeDeathWatcher: NodeDeathWatcherShell.Ref? - var nodeDeathWatcher: NodeDeathWatcherShell.Ref { - guard let it = self._nodeDeathWatcher else { - return fatalErrorBacktrace("Accessing ClusterShell.nodeDeathWatcher failed, was nil! This should never happen as access should only happen after start() was invoked.") - } - return it - } - init() { self._associationsLock = Lock() self._associationsRegistry = [:] @@ -116,21 +101,13 @@ internal class ClusterShell { // the single thing in the class it will modify is the associations registry, which we do to avoid actor queues when // remote refs need to obtain those // - // TODO: see if we can restructure this to avoid these nil/then-set dance + // FIXME: see if we can restructure this to avoid this nil/then-set dance self._ref = nil - self._nodeDeathWatcher = nil } /// Actually starts the shell which kicks off binding to a port, and all further cluster work internal func start(system: ActorSystem, eventStream: EventStream) throws -> ClusterShell.Ref { self._serializationPool = try SerializationPool(settings: .default, serialization: system.serialization) - - self._nodeDeathWatcher = try system._spawnSystemActor( - NodeDeathWatcherShell.naming, - NodeDeathWatcherShell.behavior(), - perpetual: true - ) - self._events = eventStream // TODO: concurrency... lock the ref as others may read it? @@ -155,7 +132,7 @@ internal class ClusterShell { } // this is basically our API internally for this system - enum CommandMessage: NoSerializationVerification { + enum CommandMessage: NoSerializationVerification, SilentDeadLetter { case join(Node) case handshakeWith(Node, replyTo: ActorRef?)
@@ -197,7 +174,7 @@ internal class ClusterShell { private var props: Props = Props() - .addingSupervision(strategy: .stop) // always fail completely (may revisit this) // TODO: Escalate + .supervision(strategy: .escalate) // always fail completely } // ==== ---------------------------------------------------------------------------------------------------------------- @@ -236,8 +213,6 @@ extension ClusterShell { ) // TODO: configurable bind timeout? - - // TODO: crash everything, entire system, when bind fails return context.awaitResultThrowing(of: chanElf, timeout: .milliseconds(300)) { (chan: Channel) in context.log.info("Bound to \(chan.localAddress.map { $0.description } ?? "")") @@ -269,6 +244,7 @@ extension ClusterShell { return self.onReachabilityChange(context, state: state, node: node, reachability: reachability) case .unbind(let receptacle): + // TODO: should become shutdown return self.unbind(context, state: state, signalOnceUnbound: receptacle) case .downCommand(let node): @@ -279,7 +255,7 @@ extension ClusterShell { func receiveQuery(context: ActorContext, query: QueryMessage) -> Behavior { switch query { case .associatedNodes(let replyTo): - replyTo.tell(state.associatedAddresses()) // TODO: we'll want to put this into some nicer message wrapper? + replyTo.tell(state.associatedNodes()) // TODO: we'll want to put this into some nicer message wrapper? return .same case .currentMembership(let replyTo): replyTo.tell(state.membership) @@ -330,7 +306,7 @@ extension ClusterShell { if let existingAssociation = state.association(with: remoteNode) { // TODO: we maybe could want to attempt and drop the other "old" one? 
- state.log.warning("Attempted associating with already associated node: [\(remoteNode)], existing association: [\(existingAssociation)]") + state.log.debug("Attempted associating with already associated node: [\(remoteNode)], existing association: [\(existingAssociation)]") switch existingAssociation { case .associated(let associationState): replyTo?.tell(.success(associationState.remoteNode)) @@ -470,7 +446,7 @@ extension ClusterShell { case .initiated(var initiated): switch initiated.onHandshakeError(error) { case .scheduleRetryHandshake(let delay): - state.log.info("Schedule handshake retry to: [\(initiated.remoteNode)] delay: [\(delay)]") + state.log.debug("Schedule handshake retry to: [\(initiated.remoteNode)] delay: [\(delay)]") context.timers.startSingle( key: TimerKey("handshake-timer-\(remoteNode)"), message: .command(.retryHandshake(initiated)), @@ -504,7 +480,7 @@ extension ClusterShell { var state = state // local copy for mutation guard let completed = state.incomingHandshakeAccept(accept) else { - if state.associatedAddresses().contains(accept.from) { + if state.associatedNodes().contains(accept.from) { // this seems to be a re-delivered accept, we already accepted association with this node. 
return .ignore } else { @@ -620,7 +596,8 @@ extension ClusterShell { var state = state if let change = state.onMembershipChange(node, toStatus: .down) { - self.nodeDeathWatcher.tell(.forceDown(change.node)) + // self.nodeDeathWatcher.tell(.forceDown(change.node)) + self._events.publish(.membership(.memberDown(Member(node: change.node, status: .down)))) if let logChangeLevel = state.settings.logMembershipChanges { context.log.log(level: logChangeLevel, "Cluster membership change: \(reflecting: change), membership: \(state.membership)") @@ -700,27 +677,3 @@ extension ClusterShell { } } } - -// ==== ---------------------------------------------------------------------------------------------------------------- -// MARK: ActorSystem extensions - -extension ActorSystem { - internal var clusterShell: ActorRef { - return self._cluster?.ref ?? self.deadLetters.adapt(from: ClusterShell.Message.self) - } - - // TODO: not sure how to best expose, but for now this is better than having to make all internal messages public. - public func join(node: Node) { - self.clusterShell.tell(.command(.join(node))) - } - - // TODO: not sure how to best expose, but for now this is better than having to make all internal messages public. - public func _dumpAssociations() { - let ref: ActorRef> = try! 
self.spawn(.anonymous, .receive { context, nodes in - let stringlyNodes = nodes.map { String(reflecting: $0) }.joined(separator: "\n ") - context.log.info("~~~~ ASSOCIATED NODES ~~~~~\n \(stringlyNodes)") - return .stop - }) - self.clusterShell.tell(.query(.associatedNodes(ref))) - } -} diff --git a/Sources/DistributedActors/Cluster/ClusterShellState.swift b/Sources/DistributedActors/Cluster/ClusterShellState.swift index 59033d5b3..004cd460f 100644 --- a/Sources/DistributedActors/Cluster/ClusterShellState.swift +++ b/Sources/DistributedActors/Cluster/ClusterShellState.swift @@ -79,7 +79,7 @@ internal struct ClusterShellState: ReadOnlyClusterState { return self._associations[node] } - func associatedAddresses() -> Set { + func associatedNodes() -> Set { var set: Set = .init(minimumCapacity: self._associations.count) for asm in self._associations.values { diff --git a/Sources/DistributedActors/Cluster/NodeDeathWatcher.swift b/Sources/DistributedActors/Cluster/NodeDeathWatcher.swift index cd9277766..b210c7dc6 100644 --- a/Sources/DistributedActors/Cluster/NodeDeathWatcher.swift +++ b/Sources/DistributedActors/Cluster/NodeDeathWatcher.swift @@ -126,21 +126,30 @@ enum NodeDeathWatcherShell { /// it would be possible however to allow implementing the raw protocol by user actors if we ever see the need for it. internal enum Message { case remoteActorWatched(watcher: AddressableActorRef, remoteNode: UniqueNode) - case membershipSnapshot(Membership) - case membershipChange(MembershipChange) - case forceDown(UniqueNode) // TODO: this should go away with cluster events landing + case membershipSnapshot(Membership) // TODO: remove? + case membershipChange(MembershipChange) // TODO: remove as well } - static func behavior() -> Behavior { + static func behavior(clusterEvents: EventStream) -> Behavior { return .setup { context in - // WARNING: DO NOT TOUCH context.system.cluster; we are started potentially before the cluster (!) 
let instance = NodeDeathWatcherInstance(selfNode: context.system.settings.cluster.uniqueBindNode) + + context.system.cluster.events.subscribe(context.subReceive(ClusterEvent.self) { event in + switch event { + case .membership(.memberDown(let member)): + let change = MembershipChange(node: member.node, fromStatus: .none, toStatus: .down) + instance.handleAddressDown(change) + default: + () // ignore other changes, we only need to react on nodes becoming DOWN + } + }) + return NodeDeathWatcherShell.behavior(instance) } } static func behavior(_ instance: NodeDeathWatcherInstance) -> Behavior { - return .receive { _, message in + return .receiveMessage { message in let lastMembership: Membership = .empty // TODO: To be mutated based on membership changes @@ -157,11 +166,6 @@ enum NodeDeathWatcherShell { case .membershipChange(let change): _ = instance.onMembershipChanged(change) // TODO: return and interpret directives - - case .forceDown(let node): - // TODO: we'd get the change from subscribing to events and applying to local membership - let change = MembershipChange(node: node, fromStatus: .none, toStatus: .down) - instance.handleAddressDown(change) } return .same } diff --git a/Sources/DistributedActors/DeadLetters.swift b/Sources/DistributedActors/DeadLetters.swift index d01379ca4..cd74f9b40 100644 --- a/Sources/DistributedActors/DeadLetters.swift +++ b/Sources/DistributedActors/DeadLetters.swift @@ -225,6 +225,9 @@ final class DeadLetterOffice { // are inherently racy in the during actor system shutdown: let ignored = recipient == ActorAddress._cluster return ignored +// case .terminated, .childTerminated: +// // we ignore terminated messages in dead letter logging, as those are often harmless side effects of "everyone is shutting down" +// return true default: // ignore other messages, no special handling needed return false @@ -232,6 +235,11 @@ final class DeadLetterOffice { } } +// ==== 
---------------------------------------------------------------------------------------------------------------- +// MARK: Silent Dead Letter marker + +protocol SilentDeadLetter {} + // ==== ---------------------------------------------------------------------------------------------------------------- // MARK: Paths diff --git a/Sources/DistributedActors/DeathWatch.swift b/Sources/DistributedActors/DeathWatch.swift index 175cf6817..1f8a4f193 100644 --- a/Sources/DistributedActors/DeathWatch.swift +++ b/Sources/DistributedActors/DeathWatch.swift @@ -27,14 +27,14 @@ import NIO // Implementation notes: // Care was taken to keep this implementation separate from the ActorCell however not require more storage space. @usableFromInline -internal struct DeathWatch { // TODO: may want to change to a protocol +internal struct DeathWatch { private var watching = Set() private var watchedBy = Set() - private var failureDetectorRef: NodeDeathWatcherShell.Ref + private var nodeDeathWatcher: NodeDeathWatcherShell.Ref - init(failureDetectorRef: NodeDeathWatcherShell.Ref) { - self.failureDetectorRef = failureDetectorRef + init(nodeDeathWatcher: NodeDeathWatcherShell.Ref) { + self.nodeDeathWatcher = nodeDeathWatcher } // MARK: perform watch/unwatch @@ -160,7 +160,7 @@ internal struct DeathWatch { // TODO: may want to change to a protocol private func subscribeNodeTerminatedEvents(myself: ActorRef, node: UniqueNode?) 
{ if let remoteNode = node { - self.failureDetectorRef.tell(.remoteActorWatched(watcher: AddressableActorRef(myself), remoteNode: remoteNode)) + self.nodeDeathWatcher.tell(.remoteActorWatched(watcher: AddressableActorRef(myself), remoteNode: remoteNode)) } } } diff --git a/Sources/DistributedActors/Mailbox.swift b/Sources/DistributedActors/Mailbox.swift index ea4dff74e..a4191cbe3 100644 --- a/Sources/DistributedActors/Mailbox.swift +++ b/Sources/DistributedActors/Mailbox.swift @@ -176,7 +176,7 @@ internal final class Mailbox { }, fail: { [weak _shell = shell, path = self.address.path] error in traceLog_Mailbox(_shell?.path, "FAIL THE MAILBOX") switch _shell { - case .some(let cell): cell.fail(error) + case .some(let shell): shell.fail(error) case .none: pprint("Mailbox(\(path)) TRIED TO FAIL ON AN ALREADY DEAD CELL") } }) @@ -192,7 +192,7 @@ internal final class Mailbox { }, fail: { [weak _shell = shell, path = self.address.path] error in traceLog_Mailbox(_shell?.path, "FAIL THE MAILBOX") switch _shell { - case .some(let cell): cell.fail(error) + case .some(let shell): shell.fail(error) case .none: pprint("\(path) TRIED TO FAIL ON AN ALREADY DEAD CELL") } }) @@ -368,8 +368,6 @@ internal final class Mailbox { failedMessagePtr.initialize(to: nil) defer { failedMessagePtr.deallocate() } - var runPhase: SActMailboxRunPhase = .processingSystemMessages - // Run the mailbox: let mailboxRunResult: SActMailboxRunResult = cmailbox_run(mailbox, &cell, diff --git a/Sources/DistributedActors/ProcessIsolated/POSIXProcessUtils.swift b/Sources/DistributedActors/ProcessIsolated/POSIXProcessUtils.swift index 063cc35f9..fd5bbd2bd 100644 --- a/Sources/DistributedActors/ProcessIsolated/POSIXProcessUtils.swift +++ b/Sources/DistributedActors/ProcessIsolated/POSIXProcessUtils.swift @@ -68,6 +68,8 @@ internal enum POSIXProcessUtils { socketpair(AF_UNIX, Int32(SOCK_STREAM.rawValue), 0, &taskSocketPair) #endif + // ==== closing fds ------------------------------------------------ + // We 
close all file descriptors in the child process. posix_spawn_file_actions_init(&childFDActions) // closing fds ------------ diff --git a/Sources/DistributedActors/ProcessIsolated/ProcessCommander.swift b/Sources/DistributedActors/ProcessIsolated/ProcessCommander.swift index 1b53ae39c..6769e041e 100644 --- a/Sources/DistributedActors/ProcessIsolated/ProcessCommander.swift +++ b/Sources/DistributedActors/ProcessIsolated/ProcessCommander.swift @@ -14,22 +14,25 @@ /// EXPERIMENTAL. // Master (Process) and Commander (Actor): The Far Side of the World -public struct ProcessCommander { - public static let naming: ActorNaming = "processCommander" +internal struct ProcessCommander { public static let name: String = "processCommander" + public static let naming: ActorNaming = .unique(name) - public enum Command { + internal enum Command { case requestSpawnServant(ServantProcessSupervisionStrategy, args: [String]) -// case checkOnServantProcesses + case requestRespawnServant(ServantProcess, delay: TimeAmount?) 
} - private let funRemoveServantPid: (Int) -> Void + private let funRemoveServantByPID: (Int) -> Void private let funSpawnServantProcess: (ServantProcessSupervisionStrategy, [String]) -> Void + private let funRespawnServantProcess: (ServantProcess) -> Void - public init(funSpawnServantProcess: @escaping (ServantProcessSupervisionStrategy, [String]) -> Void, - funKillServantProcess: @escaping (Int) -> Void) { + init(funSpawnServantProcess: @escaping (ServantProcessSupervisionStrategy, [String]) -> Void, + funRespawnServantProcess: @escaping (ServantProcess) -> Void, + funKillServantProcess: @escaping (Int) -> Void) { self.funSpawnServantProcess = funSpawnServantProcess - self.funRemoveServantPid = funKillServantProcess + self.funRespawnServantProcess = funRespawnServantProcess + self.funRemoveServantByPID = funKillServantProcess } private var _servants: [Int: ServantProcess] = [:] @@ -46,30 +49,28 @@ public struct ProcessCommander { } var running: Behavior { - return .receive { context, message in - switch message { - case .requestSpawnServant(let supervision, let args): - context.log.info("Spawning new servant process; Supervision \(supervision), arguments: \(args)") - self.funSpawnServantProcess(supervision, args) - return .same + return .setup { context in + var _spawnServantTimerId = 0 + func nextSpawnServantTimerKey() -> TimerKey { + _spawnServantTimerId += 1 + return "spawnServant-\(_spawnServantTimerId)" + } -// case .checkOnServantProcesses: -// let res = POSIXProcessUtils.nonBlockingWaitPID(pid: 0) -// if res.pid > 0 { -// let node = self.lock.withLock { -// self._servants.removeValue(forKey: res.pid) -// } -// -// if let node = node { -// system.log.warning("Servant process died [\(res)], node: [\(node)]; Issuing a forced DOWN command.") -// self.system.cluster._shell.tell(.command(.down(node.node))) -// } -// -// // TODO spawn replacement configurable -// self.control.requestSpawnServant(args: []) -// -// return .same -// } + return .receiveMessage { 
message in + switch message { + case .requestSpawnServant(let supervision, let args): + context.log.info("Spawning new servant process; Supervision \(supervision), arguments: \(args)") + self.funSpawnServantProcess(supervision, args) + + case .requestRespawnServant(let servant, .some(let delay)): + context.log.info("Scheduling spawning of new servant process in [\(delay.prettyDescription)]; Servant to be replaced: \(servant)") + context.timers.startSingle(key: nextSpawnServantTimerKey(), message: .requestRespawnServant(servant, delay: nil), delay: delay) + case .requestRespawnServant(let servant, .none): + // restart immediately + context.log.info("Spawning replacement servant process; Supervision \(servant.supervisionStrategy), arguments: \(servant.args)") + self.funRespawnServantProcess(servant) + } + return .same } } } diff --git a/Sources/DistributedActors/ProcessIsolated/ProcessIsolated+Supervision.swift b/Sources/DistributedActors/ProcessIsolated/ProcessIsolated+Supervision.swift new file mode 100644 index 000000000..389318afb --- /dev/null +++ b/Sources/DistributedActors/ProcessIsolated/ProcessIsolated+Supervision.swift @@ -0,0 +1,108 @@ +//===----------------------------------------------------------------------===// +// +// This source file is part of the Swift Distributed Actors open source project +// +// Copyright (c) 2018-2019 Apple Inc.
and the Swift Distributed Actors project authors +// Licensed under Apache License v2.0 +// +// See LICENSE.txt for license information +// See CONTRIBUTORS.md for the list of Swift Distributed Actors project authors +// +// SPDX-License-Identifier: Apache-2.0 +// +//===----------------------------------------------------------------------===// + +import DistributedActorsConcurrencyHelpers + +// ==== ---------------------------------------------------------------------------------------------------------------- +// MARK: Servant Supervision + +/// Configures supervision for a specific servant process. +/// +/// Similar to `SupervisionStrategy` (which is for actors), however in effect for servant processes. +/// +/// - SeeAlso: `SupervisionStrategy` for detailed documentation on supervision and timing semantics. +public struct ServantProcessSupervisionStrategy { + internal let underlying: SupervisionStrategy + + /// Stopping supervision strategy, meaning that terminated servant processes will not get automatically spawned replacements. + /// + /// It is useful if you want to manually manage replacements and servant processes, however note that without restarting + /// servants, the system may end up in a state with no servants, and only the master running, so you should plan to take + /// action in case this happens (e.g. by terminating the master itself, and relying on a higher level orchestrator to restart + /// the entire system). + public static var stop: ServantProcessSupervisionStrategy { + return .init(underlying: .stop) + } + + /// Supervision strategy binding the lifecycle of the master process with the given servant process, + /// i.e. if a servant process supervised using this strategy terminates (exits, fails, for whatever reason), + /// the master parent will also terminate (with an error exit code). 
+ public static var escalate: ServantProcessSupervisionStrategy { + return .init(underlying: .escalate) + } + + /// The respawn strategy allows the supervised servant process to be restarted `atMost` times `within` a time period. + /// In addition, each subsequent restart _may_ be performed after a certain backoff. + /// + /// ### Servant Respawn vs. Actor Restart semantics + /// While servant `.respawn` supervision may, on the surface, seem identical to `restart` supervision of actors, + /// it differs in one crucial aspect: supervising actors with `.restart` allows them to retain the existing mailbox + /// and create a new instance of the initial behavior to continue serving the same mailbox, i.e. only a single message is lost upon restart. + /// A respawned servant process sadly cannot guarantee this, and all state in the servant process is lost, including all mailboxes, + /// thus explaining the slightly different naming and semantics implications of this supervision strategy. + /// + /// - SeeAlso: The actor `SupervisionStrategy` documentation, which explains the exact semantics of this supervision mechanism in-depth. + /// + /// - parameter atMost: number of restart attempts allowed within a single failure period (defined by the `within` parameter; MUST be > 0). + /// - parameter within: amount of time within which the `atMost` failures are allowed to happen. This defines the so-called "failure period", + /// which runs from the first failure encountered for `within` time, and if more than `atMost` failures happen in this time amount then + /// no restart is performed and the failure is escalated (and the actor terminates in the process). + /// - parameter backoff: strategy to be used for suspending the failed actor for a given (backoff) amount of time before completing the restart. + public static func respawn(atMost: Int, within: TimeAmount?, backoff: BackoffStrategy?
= nil) -> ServantProcessSupervisionStrategy { + return .init(underlying: .restart(atMost: atMost, within: within, backoff: backoff)) + } +} + +extension ProcessIsolated { + func monitorServants() { + let res = POSIXProcessUtils.nonBlockingWaitPID(pid: 0) + if res.pid > 0 { + let maybeServant = self.removeServant(pid: res.pid) + + guard var servant = maybeServant else { + self.system.log.warning("Unknown PID died, ignoring... PID was: \(res.pid)") + return + } + + // always DOWN the node that we know has terminated + self.system.cluster.down(node: servant.node) + // TODO: we could aggressively tell other nodes about the down rather than rely on the gossip...? + + // if we have a restart supervision logic, we should apply it. + guard let decision = servant.recordFailure() else { + self.system.log.info("Servant \(servant.node) (pid:\(res.pid)) has no supervision / restart strategy defined.") + return + } + + let messagePrefix = "Servant [\(servant.node) @ pid:\(res.pid)] supervision" + switch decision { + case .stop: + self.system.log.info("\(messagePrefix): STOP, as decided by: \(servant.restartLogic, orElse: ""); Servant process will not be respawned.") + + case .escalate: + self.system.log.info("\(messagePrefix): ESCALATE, as decided by: \(servant.restartLogic, orElse: "")") + self.system.cluster.down(node: self.system.cluster.node) + // TODO: ensure we exit the master process as well + + case .restartImmediately: + self.system.log.info("\(messagePrefix): RESPAWN, as decided by: \(servant.restartLogic, orElse: "")") + self.control.requestServantRestart(servant, delay: nil) + + case .restartBackoff(let delay): + self.system.log.info("\(messagePrefix): RESPAWN BACKOFF, as decided by: \(servant.restartLogic, orElse: "")") + self.control.requestServantRestart(servant, delay: delay) + } + } + } +} diff --git a/Sources/DistributedActors/ProcessIsolated/ProcessIsolated.swift b/Sources/DistributedActors/ProcessIsolated/ProcessIsolated.swift index 48bf67ca1..7c4a2c20d 100644 ---
a/Sources/DistributedActors/ProcessIsolated/ProcessIsolated.swift +++ b/Sources/DistributedActors/ProcessIsolated/ProcessIsolated.swift @@ -89,19 +89,25 @@ public class ProcessIsolated { bootSettings.settings.cluster.node = node } + if role == .servant { + bootSettings.settings.failure.onGuardianFailure = .systemExit(-1) + } let system = boot(bootSettings) - system.log.info("Configured ProcessIsolated(\(role), pid: \(getpid())), parent pid: \(POSIXProcessUtils.getParentPID()), with arguments: \(arguments)") + system.log.info("Configured ProcessIsolated(\(role), pid: \(getpid())), parentPID: \(POSIXProcessUtils.getParentPID()), with arguments: \(arguments)") self.control = IsolatedControl(system: system, roles: [role], masterNode: system.settings.cluster.uniqueBindNode) self.system = system self._lastAssignedServantPort = system.settings.cluster.node.port - if role.is("master") { + if role.is(.master) { let funSpawnServantProcess: (ServantProcessSupervisionStrategy, [String]) -> Void = { (supervision: ServantProcessSupervisionStrategy, args: [String]) in self.spawnServantProcess(supervision: supervision, args: args) } + let funRespawnServantProcess: (ServantProcess) -> Void = { (servant: ServantProcess) in + self.respawnServantProcess(servant) + } let funKillServantProcess: (Int) -> Void = { (pid: Int) in self.lock.withLockVoid { if let servant = self._servants[pid] { @@ -115,6 +121,7 @@ public class ProcessIsolated { let processCommander = ProcessCommander( funSpawnServantProcess: funSpawnServantProcess, + funRespawnServantProcess: funRespawnServantProcess, funKillServantProcess: funKillServantProcess ) self.processCommander = try! 
system._spawnSystemActor(ProcessCommander.naming, processCommander.behavior, perpetual: true) @@ -167,7 +174,7 @@ public class ProcessIsolated { /// /// ### Thread safety /// Thread safe, can be invoked from any thread (and any node, managed by the `ProcessIsolated` launcher) - public func spawnServantProcess(supervision: ServantProcessSupervisionStrategy, args: [String]) { + public func spawnServantProcess(supervision: ServantProcessSupervisionStrategy, args: [String] = []) { if self.control.hasRole(.master) { self.processSupervisorMailbox.enqueue(.spawnServant(supervision, args: args)) } else { @@ -176,7 +183,15 @@ public class ProcessIsolated { } } - // FIXME: this does not work have tests yet. + internal func respawnServantProcess(_ servant: ServantProcess, delay: TimeAmount? = nil) { + if self.control.hasRole(.master) { + self.processSupervisorMailbox.enqueue(.respawnServant(servant)) + } else { + // we either send like this, or we allow only the master to do this (can enforce getting a ref to spawnServant) + self.processCommander.tell(.requestRespawnServant(servant, delay: delay)) + } + } + /// Requests the spawning of a new servant process. /// In order for this to work, the master process MUST be running `blockAndSuperviseServants`. /// @@ -188,36 +203,53 @@ public class ProcessIsolated { } } - func removeServantPID(_ pid: Int) { - self.lock.withLockVoid { + /// + /// ### Thread safety + /// Thread safe, can be invoked from any thread (and any node, managed by the `ProcessIsolated` launcher) + internal func removeServant(pid: Int) -> ServantProcess? { + return self.lock.withLock { self._servants.removeValue(forKey: pid) } } +} + +// ==== ---------------------------------------------------------------------------------------------------------------- +// MARK: Role +extension ProcessIsolated { /// Role that a process isolated process can fulfil. 
/// Used by `isolated.runOn(role: ) public struct Role: Hashable, CustomStringConvertible { - let name: String + public let name: String init(_ name: String) { self.name = name } - func `is`(_ name: String) -> Bool { + public func `is`(_ name: String) -> Bool { return self.name == name } + public func `is`(_ role: Role) -> Bool { + return self == role + } + public var description: String { return "Role(\(self.name))" } } } +// ==== ---------------------------------------------------------------------------------------------------------------- +// MARK: ServantProcess + +/// Servant process representation owned by the supervising Master Process. +/// May be mutated when applying supervision decisions. internal struct ServantProcess { - let node: UniqueNode - let args: [String] + var node: UniqueNode + var args: [String] let supervisionStrategy: ServantProcessSupervisionStrategy - let restartLogic: RestartDecisionLogic? + var restartLogic: RestartDecisionLogic? init(node: UniqueNode, args: [String], supervisionStrategy: ServantProcessSupervisionStrategy) { self.node = node @@ -227,140 +259,108 @@ internal struct ServantProcess { switch supervisionStrategy.underlying { case .restart(let atMost, let within, let backoffStrategy): self.restartLogic = RestartDecisionLogic(maxRestarts: atMost, within: within, backoffStrategy: backoffStrategy) + case .escalate: + self.restartLogic = nil case .stop: self.restartLogic = nil } } -} - -// ==== ---------------------------------------------------------------------------------------------------------------- -// MARK: Servant Supervision -/// Configures supervision for a specific su -/// -/// Similar to `SupervisionStrategy` (which is for actors), however in effect for servant processes. -/// -/// - SeeAlso: `SupervisionStrategy` for detailed documentation on supervision and timing semantics. 
-public struct ServantProcessSupervisionStrategy { - fileprivate let underlying: SupervisionStrategy - - /// Stopping supervision strategy, meaning that terminated servant processes will not get automatically spawned replacements. - /// It is useful if you want to manually manage replacements and servant processes, however note that without restarting - /// servants, the system may end up in a state with no servants, and only the master running, so you should plan to take - /// action in case this happens (e.g. by terminating the master itself, and relying on a higher level orchestrator to restart - /// the entire system). - public static var stop: ServantProcessSupervisionStrategy { - return .init(underlying: .stop) + var command: String { + return self.args.first! // TODO: or safer somehow? } - /// The restarting strategy allows the supervised servant process to be restarted `atMost` times `within` a time period. - /// In addition, each subsequent restart _may_ be performed after a certain backoff. - /// - /// - SeeAlso: The actor `SupervisionStrategy` documentation, which explains the exact semantics of this supervision mechanism in-depth. - /// - /// - parameter atMost: number of attempts allowed restarts within a single failure period (defined by the `within` parameter. MUST be > 0). - /// - parameter within: amount of time within which the `atMost` failures are allowed to happen. This defines the so called "failure period", - /// which runs from the first failure encountered for `within` time, and if more than `atMost` failures happen in this time amount then - /// no restart is performed and the failure is escalated (and the actor terminates in the process). - /// - parameter backoff: strategy to be used for suspending the failed actor for a given (backoff) amount of time before completing the restart. - public static func restart(atMost: Int, within: TimeAmount?, backoff: BackoffStrategy? 
= nil) -> ServantProcessSupervisionStrategy { - return .init(underlying: .restart(atMost: atMost, within: within, backoff: backoff)) + /// Record a failure of the servant process, and decide if we should restart (spawn a replacement) it or not. + // TODO: should we reuse this supervision decision or use a new type; "restart" implies not losing the mailbox... here we DO lose mailboxes..." WDYT? + mutating func recordFailure() -> SupervisionDecision? { + if let decision = self.restartLogic?.recordFailure() { + return decision + } else { + return nil + } } } internal enum _ProcessSupervisorMessage { case spawnServant(ServantProcessSupervisionStrategy, args: [String]) + case respawnServant(ServantProcess) } extension ProcessIsolated { // Effectively, this is a ProcessFailureDetector internal func processMasterLoop() { - func monitorServants() { - let res = POSIXProcessUtils.nonBlockingWaitPID(pid: 0) - if res.pid > 0 { - let maybeServant = self.lock.withLock { - self._servants.removeValue(forKey: res.pid) - } - - guard let servant = maybeServant else { - // TODO: unknown PID died? - self.system.log.warning("Unknown PID died, ignoring... PID was: \(res.pid)") - return - } - - // always DOWN the node that we know has terminated - self.system.cluster.down(node: servant.node) - - // if we have a restart supervision logic, we should apply it. 
- guard var restartLogic = servant.restartLogic else { - self.system.log.info("Servant \(servant.node) (pid:\(res.pid)) has no supervision / restart strategy defined, NO replacement servant will be spawned in its place.") - return - } - - let messagePrefix = "Servant process [\(servant.node) @ pid:\(res.pid)] supervision" - switch restartLogic.recordFailure() { - case .stop: - self.system.log.info("\(messagePrefix): STOP, as decided by: \(restartLogic)") - case .escalate: - self.system.log.info("\(messagePrefix): ESCALATE, as decided by: \(restartLogic)") - case .restartImmediately: - self.system.log.info("\(messagePrefix): RESTART, as decided by: \(restartLogic)") - self.control.requestSpawnServant(supervision: servant.supervisionStrategy, args: servant.args) - case .restartBackoff: - // TODO: implement backoff for process isolated - fatalError("\(messagePrefix): BACKOFF NOT IMPLEMENTED YET") - } - } - } - while true { - monitorServants() + self.monitorServants() guard let message = self.processSupervisorMailbox.poll(.milliseconds(300)) else { continue // spin again } - self.receive(message) + // TODO: check for the self system to be terminating or not + + guard self.receive(message) else { + break + } } } - private func receive(_ message: _ProcessSupervisorMessage) { + private func receive(_ message: _ProcessSupervisorMessage) -> Bool { guard self.control.hasRole(.master) else { - return + return false } switch message { case .spawnServant(let supervision, let args): - let port = self.nextServantPort() - let nid = NodeID.random() + let node = self.makeServantNode() + + guard let command = CommandLine.arguments.first else { + fatalError("Unable to extract first argument of command line arguments (which is expected to be the application name); Args: \(CommandLine.arguments)") + } - let node = UniqueNode(systemName: "SERVANT", host: "127.0.0.1", port: port, nid: nid) + var effectiveArgs: [String] = [] + effectiveArgs.append(command) + 
effectiveArgs.append(KnownServantParameters.role.render(value: ProcessIsolated.Role.servant.name)) + effectiveArgs.append(KnownServantParameters.port.render(value: "\(node.port)")) + effectiveArgs.append(KnownServantParameters.masterNode.render(value: String(reflecting: self.system.settings.cluster.uniqueBindNode))) + effectiveArgs.append(contentsOf: args) let servant = ServantProcess( node: node, - args: args, + args: effectiveArgs, supervisionStrategy: supervision ) - guard let command = CommandLine.arguments.first else { - fatalError("Unable to extract first argument of command line arguments (which is expected to be the application name); Args: \(CommandLine.arguments)") + do { + let pid = try POSIXProcessUtils.spawn(command: servant.command, args: servant.args) + self.storeServant(pid: pid, servant: servant) + } catch { + self.system.log.error("Unable to spawn servant; Error: \(error)") } + return true - var args: [String] = [] - args.append(command) - args.append(KnownServantParameters.role.render(value: ProcessIsolated.Role.servant.name)) - args.append(KnownServantParameters.port.render(value: "\(port)")) - args.append(KnownServantParameters.masterNode.render(value: String(reflecting: self.system.settings.cluster.uniqueBindNode))) - args.append(contentsOf: args) + case .respawnServant(let terminated): + var replacement = terminated + + let replacementNode = self.makeServantNode() + replacement.node = replacementNode do { - let pid = try POSIXProcessUtils.spawn(command: command, args: args) - self.storeServant(pid: pid, servant: servant) + let pid = try POSIXProcessUtils.spawn(command: replacement.command, args: replacement.args) + self.storeServant(pid: pid, servant: replacement) } catch { - self.system.log.error("Unable to spawn servant; Error: \(error)") + self.system.log.error("Unable to restart servant [terminated: \(terminated)]; Error: \(error)") } + return true } } + + private func makeServantNode() -> UniqueNode { + let port = self.nextServantPort() 
+ let nid = NodeID.random() + + let node = UniqueNode(systemName: "SERVANT", host: "127.0.0.1", port: port, nid: nid) + return node + } } enum KnownServantParameters { @@ -449,6 +449,7 @@ public final class IsolatedControl { self.masterNode = masterNode } + /// Request spawning a new servant process. func requestSpawnServant(supervision: ServantProcessSupervisionStrategy, args: [String] = []) { precondition(self.hasRole(.master), "Only 'master' process can spawn servants. Was: \(self)") @@ -456,6 +457,17 @@ public final class IsolatedControl { self.system._resolve(context: context).tell(.requestSpawnServant(supervision, args: args)) } + /// Requests starting a replacement of given servant. + /// + /// Such restart does NOT preserve existing mailboxes of actors that lived in the given servant process, + /// they are lost forever. + func requestServantRestart(_ servant: ServantProcess, delay: TimeAmount?) { + precondition(self.hasRole(.master), "Only 'master' process can spawn servants. Was: \(self)") + + let context = ResolveContext(address: ActorAddress.ofProcessMaster(on: self.masterNode), system: self.system) + self.system._resolve(context: context).tell(.requestRespawnServant(servant, delay: delay)) + } + public func hasRole(_ role: ProcessIsolated.Role) -> Bool { return self.roles.contains(role) } diff --git a/Sources/DistributedActors/Props.swift b/Sources/DistributedActors/Props.swift index 11ff34971..7de51e41a 100644 --- a/Sources/DistributedActors/Props.swift +++ b/Sources/DistributedActors/Props.swift @@ -15,7 +15,7 @@ import NIO // ==== ---------------------------------------------------------------------------------------------------------------- -// MARK: Actor Props +// MARK: Props /// `Props` configure an Actors' properties such as mailbox, dispatcher as well as supervision semantics. 
/// diff --git a/Sources/DistributedActors/Refs.swift b/Sources/DistributedActors/Refs.swift index 83d0dadce..ad16edba6 100644 --- a/Sources/DistributedActors/Refs.swift +++ b/Sources/DistributedActors/Refs.swift @@ -409,13 +409,13 @@ internal class Guardian { @usableFromInline func trySendUserMessage(_ message: Any, file: String = #file, line: UInt = #line) { - self.deadLetters.tell(DeadLetter(message, recipient: self.address)) + self.deadLetters.tell(DeadLetter(message, recipient: self.address), file: file, line: line) } @usableFromInline func sendSystemMessage(_ message: SystemMessage, file: String = #file, line: UInt = #line) { switch message { - case .childTerminated(let ref): + case .childTerminated(let ref, let circumstances): self._childrenLock.synchronized { _ = self._children.removeChild(identifiedBy: ref.address) // if we are stopping and all children have been stopped, @@ -424,6 +424,36 @@ internal class Guardian { self.allChildrenRemoved.signalAll() } } + + switch circumstances { + case .escalating(let failure): + guard let system = self.system else { + // TODO: What else to do here? print to stderr? we are likely already shutting down or already shut down.") + return + } + switch system.settings.failure.onGuardianFailure { + case .shutdownActorSystem: + let message = "Escalated failure from [\(ref.address)] reached top-level guardian [\(self.address.path)], shutting down ActorSystem! Failure was: \(failure)" + system.log.error("\(message)", metadata: ["actorPath": "\(self.address.path)"]) + print(message) // TODO: to stderr + + _ = try! Thread { + system.shutdown() // so we don't block anyone who sent us this signal (as we execute synchronously in the guardian) + } + case .systemExit(let code): + let message = "Escalated failure from [\(ref.address)] reached top-level guardian [\(self.address.path)], exiting process (\(code))! 
Failure was: \(failure)" + system.log.error("\(message)", metadata: ["actorPath": "\(self.address.path)"]) + print(message) // TODO: to stderr + + POSIXProcessUtils._exit(Int32(code)) + } + + case .failed: + () // ignore, we only react to escalations + + case .stopped: + () // ignore, we only react to escalations + } default: CDistributedActorsMailbox.sact_dump_backtrace() fatalError("The \(self.address) actor MUST NOT receive any messages. Yet received \(message); Sent at \(file):\(line)") diff --git a/Sources/DistributedActors/Signals.swift b/Sources/DistributedActors/Signals.swift index 1d54369c0..5d964a156 100644 --- a/Sources/DistributedActors/Signals.swift +++ b/Sources/DistributedActors/Signals.swift @@ -62,8 +62,10 @@ public enum Signals { /// Signal sent to all watchers of an actor once the `watchee` has terminated. /// + /// The actual reason for the terminated message being sent may vary from the actor terminating, to the entire `Node` + /// hosting this actor having been marked as `.down` and thus any actors residing on it have to be assumed terminated. + /// /// - SeeAlso: `ChildTerminated` which is sent specifically to a parent-actor once its child has terminated. - /// - Warning: Do not inherit, as termination as well-defined and very specific meaning. public class Terminated: Signal, CustomStringConvertible { /// Address of the terminated actor. public let address: ActorAddress @@ -88,13 +90,42 @@ public enum Signals { } /// Signal sent to a parent actor when an actor it has spawned, i.e. its child, has terminated. + /// Upon processing this signal, the parent MAY choose to spawn another child with the _same_ name as the now terminated child -- + /// a guarantee which is not enjoyed by watching actors from any other actor. + /// + /// This signal is sent to the parent _always_, i.e. both for the child stopping naturally as well as failing. 
+ /// + /// ### Death Pacts with Children /// - /// This signal is sent and can be handled regardless if the child was watched (using `context.watch()`) or not. /// If the child is NOT being watched by the parent, this signal will NOT cause the parent (recipient of this signal) /// to kill itself by throwing a [DeathPactError], as this is reserved only for when a death pact is formed. /// In other words, if the parent spawns child actors but does not watch them, this is taken as not caring enough about /// their lifetime as to trigger termination itself if one of them terminates. /// + /// ### Failure Escalation + /// + /// It is possible, because of the special relationship parent-child actors enjoy, to spawn a child actor using the + /// `.escalate` strategy, which means that if the child fails, it will populate the `escalation` failure reason of + /// the `ChildTerminated` signal. Propagating failure reasons is not supported through `watch`-ed actors, and is only + /// available to parent-child pairs. + /// + /// This `escalation` failure can be used by the parent to manually decide if it should also fail, spawn a replacement child, + /// or perform any other action. Note that spawning another actor in response to `ChildTerminated` means losing + /// the child's mailbox; unlike using the `.restart` supervision strategy, which keeps the mailbox, but instantiates + /// a new instance of the child behavior. + /// + /// It is NOT recommended to perform deep inspection of the escalated failure to perform complex logic, however it + /// may be used to determine if a specific error is "very bad" or "not bad enough" and we should start a replacement child. + /// + /// #### "Bubbling-up" Escalated Failures + /// + /// Escalated failures which are not handled will cause the parent to crash as well (!).
+ /// This enables spawning a hierarchy of actors, all of which use the `.escalate` strategy, meaning that the entire + /// section of the tree will be torn down upon failure of one of the workers. A higher level supervisor may then decide to + /// restart one of the higher actors, causing a "sub tree" to be restarted in response to a worker failure. Alternatively, + /// this pattern is useful when one wants to bubble up failures all the way to the guardian actors (`/user`, or `/system`), + /// in which case the system will issue a configured termination action (see `ActorSystemSettings.guardianFailureHandling`). + /// /// - Note: Note that `ChildTerminated` IS-A `Terminated` so unless you need to specifically react to a child terminating, /// you may choose to handle all `Terminated` signals the same way. /// @@ -104,17 +135,17 @@ public enum Signals { /// This kind of information is only known to the parent, which may decide to perform /// some action based on the error, i.e. proactively stop other children or spawn another worker /// targeting a different resource URI (e.g. if error indicates that the previously used resource is too busy). - public let cause: Error? + public let escalation: Supervision.Failure? - public init(address: ActorAddress, error: Error?) { - self.cause = error + public init(address: ActorAddress, escalation: Supervision.Failure?) 
{ + self.escalation = escalation super.init(address: address, existenceConfirmed: true) } public override var description: String { let reason: String - if case .some(let r) = self.cause { - reason = ", cause: \(r)" + if case .some(let r) = self.escalation { + reason = ", escalation: \(r)" } else { reason = "" } diff --git a/Sources/DistributedActors/Supervision.swift b/Sources/DistributedActors/Supervision.swift index ef81c127c..1da9f9985 100644 --- a/Sources/DistributedActors/Supervision.swift +++ b/Sources/DistributedActors/Supervision.swift @@ -49,9 +49,9 @@ public extension Props { /// - Parameters: /// - strategy: supervision strategy to apply for the given class of failures /// - forErrorType: error type selector, determining for what type of error the given supervisor should perform its logic. - static func addingSupervision(strategy: SupervisionStrategy, forErrorType errorType: Error.Type) -> Props { + static func supervision(strategy: SupervisionStrategy, forErrorType errorType: Error.Type) -> Props { var props = Props() - props.addSupervision(strategy: strategy, forErrorType: errorType) + props.supervise(strategy: strategy, forErrorType: errorType) return props } @@ -62,8 +62,8 @@ public extension Props { /// - Parameters: /// - strategy: supervision strategy to apply for the given class of failures /// - forAll: failure type selector, working as a "catch all" for the specific types of failures. 
- static func addingSupervision(strategy: SupervisionStrategy, forAll selector: Supervise.All = .failures) -> Props { - return self.addingSupervision(strategy: strategy, forErrorType: Supervise.internalErrorTypeFor(selector: selector)) + static func supervision(strategy: SupervisionStrategy, forAll selector: Supervise.All = .failures) -> Props { + return self.supervision(strategy: strategy, forErrorType: Supervise.internalErrorTypeFor(selector: selector)) } /// Creates a new `Props` appending an supervisor for the selected `Error` type, useful for setting a few options in-line when spawning actors. @@ -73,9 +73,9 @@ public extension Props { /// - Parameters: /// - strategy: supervision strategy to apply for the given class of failures /// - forErrorType: error type selector, determining for what type of error the given supervisor should perform its logic. - func addingSupervision(strategy: SupervisionStrategy, forErrorType errorType: Error.Type) -> Props { + func supervision(strategy: SupervisionStrategy, forErrorType errorType: Error.Type) -> Props { var props = self - props.addSupervision(strategy: strategy, forErrorType: errorType) + props.supervise(strategy: strategy, forErrorType: errorType) return props } @@ -86,8 +86,8 @@ public extension Props { /// - Parameters: /// - strategy: supervision strategy to apply for the given class of failures /// - forAll: failure type selector, working as a "catch all" for the specific types of failures. 
- func addingSupervision(strategy: SupervisionStrategy, forAll selector: Supervise.All = .failures) -> Props { - return self.addingSupervision(strategy: strategy, forErrorType: Supervise.internalErrorTypeFor(selector: selector)) + func supervision(strategy: SupervisionStrategy, forAll selector: Supervise.All = .failures) -> Props { + return self.supervision(strategy: strategy, forErrorType: Supervise.internalErrorTypeFor(selector: selector)) } /// Adds another supervisor for the selected `Error` type to the chain of existing supervisors in this `Props`. @@ -97,7 +97,7 @@ public extension Props { /// - Parameters: /// - strategy: supervision strategy to apply for the given class of failures /// - forErrorType: failure type selector, working as a "catch all" for the specific types of failures. - mutating func addSupervision(strategy: SupervisionStrategy, forErrorType errorType: Error.Type) { + mutating func supervise(strategy: SupervisionStrategy, forErrorType errorType: Error.Type) { self.supervision.add(strategy: strategy, forErrorType: errorType) } @@ -108,8 +108,8 @@ public extension Props { /// - Parameters: /// - strategy: supervision strategy to apply for the given class of failures /// - forAll: failure type selector, working as a "catch all" for the specific types of failures. - mutating func addSupervision(strategy: SupervisionStrategy, forAll selector: Supervise.All = .failures) { - self.addSupervision(strategy: strategy, forErrorType: Supervise.internalErrorTypeFor(selector: selector)) + mutating func supervise(strategy: SupervisionStrategy, forAll selector: Supervise.All = .failures) { + self.supervise(strategy: strategy, forErrorType: Supervise.internalErrorTypeFor(selector: selector)) } } @@ -189,6 +189,27 @@ public enum SupervisionStrategy { /// The actor's mailbox remains untouched by default, and it would continue processing it from where it left off before the crash; /// the message which caused a failure is NOT processed again. 
For retrying the processing of such messages, higher level mechanisms should be used. case restart(atMost: Int, within: TimeAmount?, backoff: BackoffStrategy?) // TODO: would like to remove the `?` and model more properly + + /// WARNING: Purposefully ESCALATES the failure to the parent of the spawned actor, even if it has not watched the child. + /// + /// This allows for constructing "fail the parent" even if the parent is not under our control. + /// This strategy should not be overused, as normally the parent should decide by itself if it wants to stop + /// or spawn a replacement child or something else; however, sometimes it is useful to allow providers of props + /// to configure a parent to be torn down when a specific child dies, e.g. when providing workers to a pool + /// and we want to enforce the pool dying if even a single child (or a special one) terminates. + /// + /// ### Escalating to guardians + /// Root guardians, such as `/user` or `/system`, take care of spawning children when `system.spawn` is invoked. + /// These guardians normally do not care about the termination of their children, as the `stop` supervision strategy + /// instructs them to. By spawning a top-level actor, e.g. under the `/user` guardian, and passing in the `.escalate` + /// strategy, it is possible to escalate failures to the guardians, which in turn will cause the system to terminate. + /// + /// This strategy is useful whenever the failure of some specific actor should be considered "fatal to the actor system", + /// yet we still want to perform a graceful shutdown, rather than an abrupt one (e.g. by calling `exit()`). 
+ /// + /// #### Inter-op with `ProcessIsolated` + /// It is worth pointing out that escalating failures to root guardians causes the actor system to terminate, + /// which, when running under `ProcessIsolated`, allows the supervising (master) process to react and respawn the servant process. + case escalate } public extension SupervisionStrategy { @@ -237,6 +258,8 @@ public struct Supervision { case .restart(let atMost, let within, let backoffStrategy): let strategy = RestartDecisionLogic(maxRestarts: atMost, within: within, backoffStrategy: backoffStrategy) return RestartingSupervisor(initialBehavior: initialBehavior, restartLogic: strategy, failureType: failureType) + case .escalate: + return EscalatingSupervisor(failureType: failureType) case .stop: return StoppingSupervisor(failureType: failureType) } @@ -338,7 +361,9 @@ public enum Supervise { } internal enum AllErrors: Error {} + internal enum AllFaults: Error {} + internal enum AllFailures: Error {} } @@ -470,9 +495,9 @@ internal class Supervisor { repeat { switch processingAction { case .closure(let closure): - context.log.warning("Actor has THROWN [\(errorToHandle)]:\(type(of: errorToHandle)) while interpreting [closure defined at \(closure.location)] , handling with \(self)") + context.log.warning("Actor has THROWN [\(errorToHandle)]:\(type(of: errorToHandle)) while interpreting [closure defined at \(closure.location)], handling with \(self.descriptionForLogs)") default: - context.log.warning("Actor has THROWN [\(errorToHandle)]:\(type(of: errorToHandle)) while interpreting \(processingAction), handling with \(self.descriptionForLogs)") } let directive: Directive @@ -489,8 +514,7 @@ internal class Supervisor { return .stop(reason: .failure(.error(error))) case .escalate(let failure): - // TODO: this is not really escalating (yet) - return .stop(reason: .failure(failure)) + return context._downcastUnsafe._escalate(failure: failure) case .restartImmediately(let replacement): try context._downcastUnsafe._restartPrepare() @@ 
-498,7 +522,6 @@ internal class Supervisor { case .restartDelayed(let delay, let replacement): try context._downcastUnsafe._restartPrepare() - return SupervisionRestartDelayedBehavior.after(delay: delay, with: replacement) } } catch { @@ -531,6 +554,10 @@ internal class Supervisor { func isSame(as other: Supervisor) -> Bool { return undefined() } + + var descriptionForLogs: String { + return "\(type(of: self))" + } } /// Supervisor equivalent to not having supervision enabled, since stopping is the default behavior of failing actors. @@ -544,12 +571,12 @@ final class StoppingSupervisor: Supervisor { } override func handleFailure(_ context: ActorContext, target: Behavior, failure: Supervision.Failure, processingType: ProcessingType) throws -> SupervisionDirective { - guard failure.shouldBeHandled(bySupervisorHandling: self.failureType) else { + if failure.shouldBeHandled(bySupervisorHandling: self.failureType) { // TODO: matters perhaps only for metrics where we'd want to "please count this specific type of error" so leaving this logic as-is return .stop + } else { + return .stop } - - return .stop } override func isSame(as other: Supervisor) -> Bool { @@ -563,6 +590,43 @@ final class StoppingSupervisor: Supervisor { override func canHandle(failure: Supervision.Failure) -> Bool { return failure.shouldBeHandled(bySupervisorHandling: self.failureType) } + + override var descriptionForLogs: String { + return "[.stop] supervision strategy" + } +} + +/// Escalates failure to parent, while failing the current actor. 
+final class EscalatingSupervisor: Supervisor { + internal let failureType: Error.Type + + internal init(failureType: Error.Type) { + self.failureType = failureType + } + + override func handleFailure(_ context: ActorContext, target: Behavior, failure: Supervision.Failure, processingType: ProcessingType) throws -> SupervisionDirective { + if failure.shouldBeHandled(bySupervisorHandling: self.failureType) { + return .escalate(failure) + } else { + return .stop + } + } + + override func isSame(as other: Supervisor) -> Bool { + if let other = other as? EscalatingSupervisor { + return self.failureType == other.failureType + } else { + return false + } + } + + override func canHandle(failure: Supervision.Failure) -> Bool { + return failure.shouldBeHandled(bySupervisorHandling: self.failureType) + } + + override var descriptionForLogs: String { + return "[.escalate<\(self.failureType)>] supervision strategy" + } } // There are a few ways we could go about implementing this, we currently do a simple scan for "first one that handles", @@ -589,7 +653,9 @@ final class CompositeSupervisor: Supervisor { } override func canHandle(failure: Supervision.Failure) -> Bool { - return self.supervisors.contains { $0.canHandle(failure: failure) } + return self.supervisors.contains { + $0.canHandle(failure: failure) + } } } @@ -663,14 +729,15 @@ internal struct RestartDecisionLogic { if let backoffAmount = self.backoffStrategy?.next() { return .restartBackoff(delay: backoffAmount) } else { - // TODO: or plain stop? now they are the same though... - // we stop/escalate since the strategy decided we've been trying again enough and it is time to stop - return .escalate + // we stop since the strategy decided we've been trying again enough and it is time to stop + // TODO: could be configurable to escalate once restarts exhausted + return .stop } } else { // e.g. 
total time within which we are allowed to back off has been exceeded etc - return .escalate + // TODO: could be configurable to escalate once restarts exhausted + return .stop } } @@ -747,6 +814,10 @@ final class RestartingSupervisor: Supervisor { public override func isSame(as other: Supervisor) -> Bool { return other is RestartingSupervisor } + + override var descriptionForLogs: String { + return "[.restart(\(self.restartDecider))] supervision strategy" + } } /// Behavior used to suspend after a `restartPrepare` has been issued by an `restartDelayed`. diff --git a/Sources/DistributedActors/SystemMessages.swift b/Sources/DistributedActors/SystemMessages.swift index 2bd1f0d28..170516513 100644 --- a/Sources/DistributedActors/SystemMessages.swift +++ b/Sources/DistributedActors/SystemMessages.swift @@ -58,7 +58,8 @@ internal enum SystemMessage: Equatable { case terminated(ref: AddressableActorRef, existenceConfirmed: Bool, addressTerminated: Bool) // TODO: more additional info? // TODO: send terminated PATH, not ref, sending to it does not make sense after all /// Child actor has terminated. This system message by itself does not necessarily cause a DeathPact and termination of the parent. - case childTerminated(ref: AddressableActorRef) + /// If the message carries an `escalated` failure, the failure should apply to the parent as well, potentially tearing it down as well. + case childTerminated(ref: AddressableActorRef, TerminationCircumstances) /// Node has terminated, and all actors of this node shall be considered as terminated. /// This system message does _not_ have a direct counter part as `Signal`, and instead results in the sending of multiple @@ -86,6 +87,20 @@ internal enum SystemMessage: Equatable { case tombstone } +/// The circumstances under which a child actor has terminated. +public enum TerminationCircumstances { + /// The actor stopped naturally, by becoming `.stop` + case stopped + /// The actor has failed during message processing. 
+ case failed(Supervision.Failure) + /// The actor has failed and requests to escalate this failure. + /// Even if the parent did not watch the child, this failure should be taken as one that the parent is at least partially responsible for. + /// If nothing else, the parent may want to "bubble up" the failure either by throwing or if it was configured with `SupervisionStrategy.escalate` itself. + /// + /// Escalating takes precedence over `.failed`, in case the child was both watched and configured with `.escalate` supervision. + case escalating(Supervision.Failure) +} + internal extension SystemMessage { @inlinable static func terminated(ref: AddressableActorRef) -> SystemMessage { @@ -112,10 +127,13 @@ extension SystemMessage { return lWatchee.address == rWatchee.address && lWatcher.address == rWatcher.address case (.unwatch(let lWatchee, let lWatcher), .unwatch(let rWatchee, let rWatcher)): return lWatchee.address == rWatchee.address && lWatcher.address == rWatcher.address - case (.terminated(let lRef, let lExisted, let lAddrTerminated), .terminated(let rRef, let rExisted, let rAddrTerminated)): - return lRef.address == rRef.address && lExisted == rExisted && lAddrTerminated == rAddrTerminated - case (.childTerminated(let lPath), .childTerminated(let rPath)): - return lPath.address == rPath.address + + case (.terminated(let lRef, let lExisted, let lNodeTerminated), .terminated(let rRef, let rExisted, let rNodeTerminated)): + return lRef.address == rRef.address && lExisted == rExisted && lNodeTerminated == rNodeTerminated + + case (.childTerminated(let lRef, _), .childTerminated(let rRef, _)): + return lRef.address == rRef.address // enough since address is an unique identifier + case (.nodeTerminated(let lAddress), .nodeTerminated(let rAddress)): return lAddress == rAddress diff --git a/Sources/DistributedActorsTestKit/ActorTestKit.swift b/Sources/DistributedActorsTestKit/ActorTestKit.swift index c1c122dd8..942915528 100644 --- 
a/Sources/DistributedActorsTestKit/ActorTestKit.swift +++ b/Sources/DistributedActorsTestKit/ActorTestKit.swift @@ -64,7 +64,7 @@ public struct ActorTestKitSettings { public extension ActorTestKit { /// Spawn an `ActorTestProbe` which offers various assertion methods for actor messaging interactions. - func spawnTestProbe(name naming: ActorNaming? = nil, expecting type: M.Type = M.self, file: StaticString = #file, line: UInt = #line) -> ActorTestProbe { + func spawnTestProbe(_ naming: ActorNaming? = nil, expecting type: M.Type = M.self, file: StaticString = #file, line: UInt = #line) -> ActorTestProbe { self.spawnProbesLock.lock() defer { self.spawnProbesLock.unlock() } // we want to use our own sequence number for the naming here, so we make it here rather than let the @@ -89,6 +89,7 @@ public extension ActorTestKit { } } +// ==== ---------------------------------------------------------------------------------------------------------------- // MARK: Eventually public extension ActorTestKit { diff --git a/Tests/DistributedActorsDocumentationTests/DeathWatchDocExamples.swift b/Tests/DistributedActorsDocumentationTests/DeathWatchDocExamples.swift new file mode 100644 index 000000000..e433d639e --- /dev/null +++ b/Tests/DistributedActorsDocumentationTests/DeathWatchDocExamples.swift @@ -0,0 +1,87 @@ +//===----------------------------------------------------------------------===// +// +// This source file is part of the Swift Distributed Actors open source project +// +// Copyright (c) 2018-2019 Apple Inc. 
and the Swift Distributed Actors project authors +// Licensed under Apache License v2.0 +// +// See LICENSE.txt for license information +// See CONTRIBUTORS.md for the list of Swift Distributed Actors project authors +// +// SPDX-License-Identifier: Apache-2.0 +// +//===----------------------------------------------------------------------===// + +// tag::imports[] + +import DistributedActors + +// end::imports[] + +struct Player { + typealias Command = String +} + +struct GameUnit { + enum Command { + case player(ActorRef) + case otherCommand + } +} + +struct GameMatch { + enum Command { + case playerConnected(ActorRef) + case disconnectedPleaseStop + } +} + +class DeathWatchDocExamples { + func unitReady() -> Behavior { + return .ignore + } + + func simple_watch() throws { + // tag::simple_death_watch[] + func gameUnit(player: ActorRef) -> Behavior { + return .setup { context in + context.watch(player) // <1> + + return .receiveMessage { _ in // <2> + // perform some game logic... + .same + } // <3> + } + } + // end::simple_death_watch[] + } + + func schedule_event() throws { + func isPlayer(_: Any) -> Bool { + return false + } + // tag::handling_termination_deathwatch[] + let concedeTimer: TimerKey = "concede-timer" + + Behavior.receive { context, command in + switch command { + case .playerConnected(let player): + context.timers.cancel(for: concedeTimer) + context.watch(player) + return .same + + case .disconnectedPleaseStop: + context.log.info("Stopping since player remained not connected for a while...") + return .stop + } + }.receiveSpecificSignal(Signals.Terminated.self) { context, terminated in + guard isPlayer(terminated) else { + return .unhandled + } + + context.timers.startSingle(key: concedeTimer, message: .disconnectedPleaseStop, delay: .seconds(1)) + return .same + } + // end::handling_termination_deathwatch[] + } +} diff --git a/Tests/DistributedActorsDocumentationTests/ProcessIsolatedDocExamples.swift 
b/Tests/DistributedActorsDocumentationTests/ProcessIsolatedDocExamples.swift new file mode 100644 index 000000000..28147ea2c --- /dev/null +++ b/Tests/DistributedActorsDocumentationTests/ProcessIsolatedDocExamples.swift @@ -0,0 +1,68 @@ +//===----------------------------------------------------------------------===// +// +// This source file is part of the Swift Distributed Actors open source project +// +// Copyright (c) 2018-2019 Apple Inc. and the Swift Distributed Actors project authors +// Licensed under Apache License v2.0 +// +// See LICENSE.txt for license information +// See CONTRIBUTORS.md for the list of Swift Distributed Actors project authors +// +// SPDX-License-Identifier: Apache-2.0 +// +//===----------------------------------------------------------------------===// + +// tag::imports[] + +import DistributedActors + +// end::imports[] + +private struct WorkRequest {} + +private struct Requests {} + +class ProcessIsolatedDocExamples { + func x() throws { + // tag::spawn_in_domain[] + let isolated = ProcessIsolated { boot in // <1> + + // optionally configure nodes by changing the provided settings + boot.settings.defaultLogLevel = .info + + // always create the actor system based on the provided boot settings, customized if needed + return ActorSystem(settings: boot.settings) + } + + // ~~~ The following code will execute on any process ~~~ // <2> + + // ... 
+ + // executes only on .master process ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + try isolated.run(on: .master) { // <3> + // spawn a servant process + isolated.spawnServantProcess( // <4> + supervision: .respawn( // <5> + atMost: 5, within: nil, + backoff: Backoff.exponential(initialInterval: .milliseconds(100), multiplier: 1.5, randomFactor: 0) + ) + ) + + // spawn an actor on the master node <6> + try isolated.system.spawn("bruce", of: WorkRequest.self, .receiveMessage { _ in + // do something with the `work` + .same + }) + } + // end of executes only on .master process ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + // executes only on .servant process ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + try isolated.run(on: .servant) { // <7> + try isolated.system.spawn("alfred", of: Requests.self, .ignore) + } + // end of: executes only on .servant process ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + isolated.blockAndSuperviseServants() // <8> + // end::spawn_in_domain[] + } +} diff --git a/Tests/DistributedActorsDocumentationTests/SupervisionDocExamples.swift b/Tests/DistributedActorsDocumentationTests/SupervisionDocExamples.swift index 284221cba..8cff75153 100644 --- a/Tests/DistributedActorsDocumentationTests/SupervisionDocExamples.swift +++ b/Tests/DistributedActorsDocumentationTests/SupervisionDocExamples.swift @@ -27,7 +27,7 @@ class SupervisionDocExamples { // tag::supervise_props[] let props = Props() // <1> - .addingSupervision(strategy: .restart(atMost: 2, within: .seconds(1))) // <2> + .supervision(strategy: .restart(atMost: 2, within: .seconds(1))) // <2> // potentially more props configuration here ... 
let greeterRef = try context.spawn( @@ -45,7 +45,7 @@ class SupervisionDocExamples { // tag::supervise_inline[] let greeterRef = try context.spawn("greeter", - props: .addingSupervision(strategy: .restart(atMost: 2, within: .seconds(1))), // <1> + props: .supervision(strategy: .restart(atMost: 2, within: .seconds(1))), // <1> greeterBehavior) // end::supervise_inline[] _ = greeterRef @@ -75,7 +75,7 @@ class SupervisionDocExamples { let greeterRef: ActorRef = try system.spawn( "greeter", - props: .addingSupervision(strategy: .restart(atMost: 5, within: .seconds(1))), + props: .supervision(strategy: .restart(atMost: 5, within: .seconds(1))), greeterBehavior(friends: friends) ) @@ -112,7 +112,7 @@ class SupervisionDocExamples { ] let greeterRef = try system.spawn("favFruit", - props: .addingSupervision(strategy: .restart(atMost: 5, within: .seconds(1))), + props: .supervision(strategy: .restart(atMost: 5, within: .seconds(1))), favouriteFruitBehavior(whoLikesWhat)) greeterRef.tell("Alice") // ok! @@ -142,16 +142,16 @@ class SupervisionDocExamples { let thrower = try system.spawn( "thrower", props: Props() - .addingSupervision(strategy: .restart(atMost: 10, within: .seconds(5)), forErrorType: CatchThisError.self), // <2> - // .addSupervision(strategy: .stop, forAll: .failures) // (implicitly appended always) // <3> + .supervision(strategy: .restart(atMost: 10, within: nil), forErrorType: CatchThisError.self), // <2> + // .supervision(strategy: .stop, forAll: .failures) // (implicitly appended always) // <3> throwerBehavior ) - // Starting... + // Logs: [info] Starting... thrower.tell(CatchThisError()) // will crash and restart - // Starting... + // Logs: [info] Starting... thrower.tell(CatchThisError()) // again - // Starting... + // Logs: [info] Starting... 
thrower.tell(NotIntendedToBeCaught()) // crashes the actor for good // further messages sent to it will end up in `system.deadLetters` diff --git a/Tests/DistributedActorsTestKitTests/ActorTestProbeTests.swift b/Tests/DistributedActorsTestKitTests/ActorTestProbeTests.swift index 2daf1c2cf..e9d88b558 100644 --- a/Tests/DistributedActorsTestKitTests/ActorTestProbeTests.swift +++ b/Tests/DistributedActorsTestKitTests/ActorTestProbeTests.swift @@ -36,7 +36,7 @@ class ActorTestProbeTests: XCTestCase { #endif _ = "Skipping test \(#function), can't test the 'test assertions' being emitted; To see it crash run with `-D SACT_TESTS_CRASH`" - let probe = self.testKit.spawnTestProbe(name: "p1", expecting: String.self) + let probe = self.testKit.spawnTestProbe("p1", expecting: String.self) try probe.expectMessage("awaiting-forever") } @@ -48,7 +48,7 @@ class ActorTestProbeTests: XCTestCase { #endif _ = "Skipping test \(#function), can't test the 'test assertions' being emitted; To see it crash run with `-D SACT_TESTS_CRASH`" - let probe = self.testKit.spawnTestProbe(name: "p2", expecting: String.self) + let probe = self.testKit.spawnTestProbe("p2", expecting: String.self) probe.tell("one") @@ -56,7 +56,7 @@ class ActorTestProbeTests: XCTestCase { } func test_maybeExpectMessage_shouldReturnTheReceivedMessage() throws { - let probe = self.testKit.spawnTestProbe(name: "p2", expecting: String.self) + let probe = self.testKit.spawnTestProbe("p2", expecting: String.self) probe.tell("one") @@ -64,7 +64,7 @@ class ActorTestProbeTests: XCTestCase { } func test_maybeExpectMessage_shouldReturnNilIfTimeoutExceeded() throws { - let probe = self.testKit.spawnTestProbe(name: "p2", expecting: String.self) + let probe = self.testKit.spawnTestProbe("p2", expecting: String.self) probe.tell("one") @@ -72,7 +72,7 @@ class ActorTestProbeTests: XCTestCase { } func test_expectNoMessage() throws { - let p = self.testKit.spawnTestProbe(name: "p3", expecting: String.self) + let p = 
self.testKit.spawnTestProbe("p3", expecting: String.self) try p.expectNoMessage(for: .milliseconds(100)) p.stop() diff --git a/Tests/DistributedActorsTests/ActorIsolationFailureHandlingTests.swift b/Tests/DistributedActorsTests/ActorIsolationFailureHandlingTests.swift index 37bf36845..8acfd0e03 100644 --- a/Tests/DistributedActorsTests/ActorIsolationFailureHandlingTests.swift +++ b/Tests/DistributedActorsTests/ActorIsolationFailureHandlingTests.swift @@ -77,8 +77,8 @@ final class ActorIsolationFailureHandlingTests: XCTestCase { } func test_worker_crashOnlyWorkerOnPlainErrorThrow() throws { - let pm: ActorTestProbe = self.testKit.spawnTestProbe(name: "testProbe-master-1") - let pw: ActorTestProbe = self.testKit.spawnTestProbe(name: "testProbeForWorker-1") + let pm: ActorTestProbe = self.testKit.spawnTestProbe("testProbe-master-1") + let pw: ActorTestProbe = self.testKit.spawnTestProbe("testProbeForWorker-1") let healthyMaster: ActorRef = try system.spawn("healthyMaster", self.healthyMasterBehavior(pm: pm.ref, pw: pw.ref)) diff --git a/Tests/DistributedActorsTests/ActorLifecycleTests.swift b/Tests/DistributedActorsTests/ActorLifecycleTests.swift index f7a77af06..d8c3d95ff 100644 --- a/Tests/DistributedActorsTests/ActorLifecycleTests.swift +++ b/Tests/DistributedActorsTests/ActorLifecycleTests.swift @@ -95,7 +95,7 @@ class ActorLifecycleTests: XCTestCase { // MARK: Stopping actors func test_stopping_shouldDeinitTheBehavior() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "p1") + let p: ActorTestProbe = self.testKit.spawnTestProbe("p1") let chattyAboutLifecycle = try system.spawn("deinitLifecycleActor", .class { LifecycleDeinitClassBehavior(p.ref) }) diff --git a/Tests/DistributedActorsTests/ActorLoggingTests.swift b/Tests/DistributedActorsTests/ActorLoggingTests.swift index 9241bce33..230846f19 100644 --- a/Tests/DistributedActorsTests/ActorLoggingTests.swift +++ b/Tests/DistributedActorsTests/ActorLoggingTests.swift @@ -46,8 +46,8 @@ class 
ActorLoggingTests: XCTestCase { } func test_actorLogger_shouldIncludeActorPath() throws { - let p = self.testKit.spawnTestProbe(name: "p", expecting: String.self) - let r = self.testKit.spawnTestProbe(name: "r", expecting: Rendered.self) + let p = self.testKit.spawnTestProbe("p", expecting: String.self) + let r = self.testKit.spawnTestProbe("r", expecting: Rendered.self) let ref: ActorRef = try system.spawn("myName", .setup { context in // ~~~~~~~ (imagine as) set by swift-distributed-actors library internally ~~~~~~~~~~ @@ -71,8 +71,8 @@ class ActorLoggingTests: XCTestCase { } func test_actorLogger_shouldNotRenderLazyMetadataIfLogIsUnderDefinedLogLevel() throws { - let p = self.testKit.spawnTestProbe(name: "p2", expecting: String.self) - let r = self.testKit.spawnTestProbe(name: "r2", expecting: Rendered.self) + let p = self.testKit.spawnTestProbe("p2", expecting: String.self) + let r = self.testKit.spawnTestProbe("r2", expecting: Rendered.self) let ref: ActorRef = try system.spawn("myName", .setup { context in // ~~~~~~~ (imagine as) set by swift-distributed-actors library internally ~~~~~~~~~~ @@ -97,8 +97,8 @@ class ActorLoggingTests: XCTestCase { } func test_actorLogger_shouldNotRenderALazyValueIfWeOverwriteItUsingLocalMetadata() throws { - let p = self.testKit.spawnTestProbe(name: "p2", expecting: String.self) - let r = self.testKit.spawnTestProbe(name: "r2", expecting: Rendered.self) + let p = self.testKit.spawnTestProbe("p2", expecting: String.self) + let r = self.testKit.spawnTestProbe("r2", expecting: Rendered.self) let ref: ActorRef = try system.spawn("myName", .setup { context in // ~~~~~~~ (imagine as) set by swift-distributed-actors library internally ~~~~~~~~~~ diff --git a/Tests/DistributedActorsTests/ActorRefAdapterTests.swift b/Tests/DistributedActorsTests/ActorRefAdapterTests.swift index cb7e5ac43..33feb2551 100644 --- a/Tests/DistributedActorsTests/ActorRefAdapterTests.swift +++ b/Tests/DistributedActorsTests/ActorRefAdapterTests.swift @@ -149,7 
+149,7 @@ class ActorRefAdapterTests: XCTestCase { } } - let ref = try system.spawn(.anonymous, props: .addingSupervision(strategy: strategy), behavior) + let ref = try system.spawn(.anonymous, props: .supervision(strategy: strategy), behavior) ref.tell(.createAdapter(replyTo: receiveRefProbe.ref)) let adaptedRef = try receiveRefProbe.expectMessage() @@ -192,7 +192,7 @@ class ActorRefAdapterTests: XCTestCase { } } - let ref = try system.spawn(.anonymous, props: .addingSupervision(strategy: strategy), behavior) + let ref = try system.spawn(.anonymous, props: .supervision(strategy: strategy), behavior) ref.tell(.createAdapter(replyTo: receiveRefProbe.ref)) let adaptedRef = try receiveRefProbe.expectMessage() diff --git a/Tests/DistributedActorsTests/ActorSubReceiveTests.swift b/Tests/DistributedActorsTests/ActorSubReceiveTests.swift index 0e9d9f30b..ff2fb31be 100644 --- a/Tests/DistributedActorsTests/ActorSubReceiveTests.swift +++ b/Tests/DistributedActorsTests/ActorSubReceiveTests.swift @@ -157,7 +157,7 @@ class ActorSubReceiveTests: XCTestCase { return .unhandled } - _ = try system.spawn("test", props: .addingSupervision(strategy: .restart(atMost: 5, within: .seconds(5))), behavior) + _ = try system.spawn("test", props: .supervision(strategy: .restart(atMost: 5, within: .seconds(5))), behavior) let subRef = try refProbe.expectMessage() diff --git a/Tests/DistributedActorsTests/BehaviorCanonicalizeTests.swift b/Tests/DistributedActorsTests/BehaviorCanonicalizeTests.swift index ca2bb011e..7cbf5cdaf 100644 --- a/Tests/DistributedActorsTests/BehaviorCanonicalizeTests.swift +++ b/Tests/DistributedActorsTests/BehaviorCanonicalizeTests.swift @@ -31,7 +31,7 @@ class BehaviorCanonicalizeTests: XCTestCase { } func test_canonicalize_nestedSetupBehaviors() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "canonicalizeProbe1") + let p: ActorTestProbe = self.testKit.spawnTestProbe("canonicalizeProbe1") let b: Behavior = .setup { _ in p.tell("outer-1") @@ 
-58,7 +58,7 @@ class BehaviorCanonicalizeTests: XCTestCase { } func test_canonicalize_doesSurviveDeeplyNestedSetups() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "canonicalizeProbe2") + let p: ActorTestProbe = self.testKit.spawnTestProbe("canonicalizeProbe2") func deepSetupRabbitHole(currentDepth depth: Int, stopAt limit: Int) -> Behavior { return .setup { _ in @@ -84,7 +84,7 @@ class BehaviorCanonicalizeTests: XCTestCase { } func test_canonicalize_unwrapInterceptBehaviors() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "canonicalizeProbe3") + let p: ActorTestProbe = self.testKit.spawnTestProbe("canonicalizeProbe3") let b: Behavior = .intercept(behavior: .setup { _ in p.tell("outer-1") @@ -170,7 +170,7 @@ class BehaviorCanonicalizeTests: XCTestCase { } func test_startBehavior_shouldThrowOnTooDeeplyNestedBehaviorSetups() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "startBehaviorProbe") + let p: ActorTestProbe = self.testKit.spawnTestProbe("startBehaviorProbe") /// Creates an infinitely nested setup behavior -- it is used to see that we detect this and abort executing eagerly func setupDaDoRunRunRunDaDoRunRun(depth: Int = 0) -> Behavior { diff --git a/Tests/DistributedActorsTests/BehaviorTests.swift b/Tests/DistributedActorsTests/BehaviorTests.swift index 6acdb4f6c..a58226229 100644 --- a/Tests/DistributedActorsTests/BehaviorTests.swift +++ b/Tests/DistributedActorsTests/BehaviorTests.swift @@ -41,7 +41,7 @@ class BehaviorTests: XCTestCase { } func test_setup_executesImmediatelyOnStartOfActor() throws { - let p = self.testKit.spawnTestProbe(name: "testActor-1", expecting: String.self) + let p = self.testKit.spawnTestProbe("testActor-1", expecting: String.self) let message = "EHLO" let _: ActorRef = try system.spawn(.anonymous, .setup { _ in @@ -53,7 +53,7 @@ class BehaviorTests: XCTestCase { } func test_single_actor_should_wakeUp_on_new_message_lockstep() throws { - let p: ActorTestProbe = 
self.testKit.spawnTestProbe(name: "testActor-2") + let p: ActorTestProbe = self.testKit.spawnTestProbe("testActor-2") var counter = 0 @@ -66,7 +66,7 @@ class BehaviorTests: XCTestCase { } func test_two_actors_should_wakeUp_on_new_message_lockstep() throws { - let p = self.testKit.spawnTestProbe(name: "testActor-2", expecting: String.self) + let p = self.testKit.spawnTestProbe("testActor-2", expecting: String.self) var counter = 0 @@ -85,7 +85,7 @@ class BehaviorTests: XCTestCase { } func test_receive_shouldReceiveManyMessagesInExpectedOrder() throws { - let p = self.testKit.spawnTestProbe(name: "testActor-3", expecting: Int.self) + let p = self.testKit.spawnTestProbe("testActor-3", expecting: Int.self) func countTillNThenDieBehavior(n: Int, currentlyAt at: Int = -1) -> Behavior { if at == n { @@ -133,7 +133,7 @@ class BehaviorTests: XCTestCase { // has to be ClassBehavior in test name, otherwise our generate_linux_tests is confused (and thinks this is an inner class) func test_ClassBehavior_receivesMessages() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "testActor-5") + let p: ActorTestProbe = self.testKit.spawnTestProbe("testActor-5") let ref: ActorRef = try system.spawn(.anonymous, .class { MyActorBehavior() }) @@ -175,7 +175,7 @@ class BehaviorTests: XCTestCase { // has to be ClassBehavior in test name, otherwise our generate_linux_tests is confused (and thinks this is an inner class) func test_ClassBehavior_receivesSignals() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "probe-6a") + let p: ActorTestProbe = self.testKit.spawnTestProbe("probe-6a") let ref: ActorRef = try system.spawn(.anonymous, .class { MySignalActorBehavior(probe: p.ref) }) ref.tell("do it") @@ -199,9 +199,9 @@ class BehaviorTests: XCTestCase { } func test_ClassBehavior_executesInitOnStartSignal() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "probe-7a") + let p: ActorTestProbe = self.testKit.spawnTestProbe("probe-7a") let 
ref: ActorRef = try system.spawn(.anonymous, - props: .addingSupervision(strategy: .restart(atMost: 1, within: nil)), + props: .supervision(strategy: .restart(atMost: 1, within: nil)), .class { MyStartingBehavior(probe: p.ref) }) ref.tell("hello") @@ -212,7 +212,7 @@ class BehaviorTests: XCTestCase { } func test_receiveSpecificSignal_shouldReceiveAsExpected() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "probe-specificSignal-1") + let p: ActorTestProbe = self.testKit.spawnTestProbe("probe-specificSignal-1") let _: ActorRef = try system.spawn(.anonymous, .setup { context in let _: ActorRef = try context.spawnWatch(.anonymous, .stop) @@ -227,7 +227,7 @@ class BehaviorTests: XCTestCase { } func test_receiveSpecificSignal_shouldNotReceiveOtherSignals() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "probe-specificSignal-2") + let p: ActorTestProbe = self.testKit.spawnTestProbe("probe-specificSignal-2") let ref: ActorRef = try system.spawn(.anonymous, Behavior.receiveMessage { _ in .stop }.receiveSpecificSignal(Signals.PostStop.self) { _, postStop in diff --git a/Tests/DistributedActorsTests/Cluster/ClusterReceptionistClusteredTests.swift b/Tests/DistributedActorsTests/Cluster/ClusterReceptionistClusteredTests.swift index 308a5252a..0a40b0f93 100644 --- a/Tests/DistributedActorsTests/Cluster/ClusterReceptionistClusteredTests.swift +++ b/Tests/DistributedActorsTests/Cluster/ClusterReceptionistClusteredTests.swift @@ -95,9 +95,9 @@ class ClusterReceptionistTests: ClusteredNodesTestBase { $0.cluster.receptionistSyncInterval = .milliseconds(100) } - let registeredProbe = self.testKit(local).spawnTestProbe(name: "registeredProbe", expecting: Receptionist.Registered.self) - let localLookupProbe = self.testKit(local).spawnTestProbe(name: "localLookupProbe", expecting: Receptionist.Listing.self) - let remoteLookupProbe = self.testKit(remote).spawnTestProbe(name: "remoteLookupProbe", expecting: Receptionist.Listing.self) + let 
registeredProbe = self.testKit(local).spawnTestProbe("registeredProbe", expecting: Receptionist.Registered.self) + let localLookupProbe = self.testKit(local).spawnTestProbe("localLookupProbe", expecting: Receptionist.Listing.self) + let remoteLookupProbe = self.testKit(remote).spawnTestProbe("remoteLookupProbe", expecting: Receptionist.Listing.self) let behavior: Behavior = .receiveMessage { _ in .same diff --git a/Tests/DistributedActorsTests/Cluster/ClusteredNodesTestBase.swift b/Tests/DistributedActorsTests/Cluster/ClusteredNodesTestBase.swift index c9990f8e0..47e9576d8 100644 --- a/Tests/DistributedActorsTests/Cluster/ClusteredNodesTestBase.swift +++ b/Tests/DistributedActorsTests/Cluster/ClusteredNodesTestBase.swift @@ -165,7 +165,7 @@ extension ClusteredNodesTestBase { let testKit = self.testKit(system) - let probe = testKit.spawnTestProbe(name: "probe-assertAssociated", expecting: Set.self, file: file, line: line) + let probe = testKit.spawnTestProbe("probe-assertAssociated", expecting: Set.self, file: file, line: line) defer { probe.stop() } try testKit.eventually(within: timeout ?? .seconds(5), file: file, line: line, column: column) { @@ -205,7 +205,7 @@ extension ClusteredNodesTestBase { verbose: Bool = false) throws { let testKit: ActorTestKit = self.testKit(system) - let probe = testKit.spawnTestProbe(name: .prefixed(with: "assertNotAssociated-probe"), expecting: Set.self) + let probe = testKit.spawnTestProbe(.prefixed(with: "assertNotAssociated-probe"), expecting: Set.self) defer { probe.stop() } try testKit.assertHolds(for: timeout ?? 
.seconds(1)) { system.cluster._shell.tell(.query(.associatedNodes(probe.ref))) diff --git a/Tests/DistributedActorsTests/Cluster/RemoteMessagingClusteredTests.swift b/Tests/DistributedActorsTests/Cluster/RemoteMessagingClusteredTests.swift index c4f46eec3..d72a214f0 100644 --- a/Tests/DistributedActorsTests/Cluster/RemoteMessagingClusteredTests.swift +++ b/Tests/DistributedActorsTests/Cluster/RemoteMessagingClusteredTests.swift @@ -156,7 +156,7 @@ class RemoteMessagingTests: ClusteredNodesTestBase { $0.serialization.registerCodable(for: EchoTestMessage.self, underId: 1001) } - let probe = self.testKit(local).spawnTestProbe(name: "X", expecting: String.self) + let probe = self.testKit(local).spawnTestProbe("X", expecting: String.self) let localRef: ActorRef = try local.spawn("localRef", .receiveMessage { message in probe.tell("response:\(message)") @@ -183,7 +183,7 @@ class RemoteMessagingTests: ClusteredNodesTestBase { $0.serialization.registerCodable(for: EchoTestMessage.self, underId: 1001) } - let probe = self.testKit(local).spawnTestProbe(name: "X", expecting: String.self) + let probe = self.testKit(local).spawnTestProbe("X", expecting: String.self) let refOnRemoteSystem: ActorRef = try remote.spawn("remoteAcquaintance", .receiveMessage { message in message.respondTo.tell("echo:\(message.string)") @@ -215,7 +215,7 @@ class RemoteMessagingTests: ClusteredNodesTestBase { $0.serialization.registerCodable(for: EchoTestMessage.self, underId: 1001) } - let probe = self.testKit(local).spawnTestProbe(name: "X", expecting: String.self) + let probe = self.testKit(local).spawnTestProbe("X", expecting: String.self) let refOnRemoteSystem: ActorRef = try remote.spawn("remoteAcquaintance", .receiveMessage { message in message.respondTo.tell("echo:\(message.string)") diff --git a/Tests/DistributedActorsTests/Cluster/SWIM/SWIMInstanceTests.swift b/Tests/DistributedActorsTests/Cluster/SWIM/SWIMInstanceTests.swift index f08afa387..cb311d706 100644 --- 
a/Tests/DistributedActorsTests/Cluster/SWIM/SWIMInstanceTests.swift +++ b/Tests/DistributedActorsTests/Cluster/SWIM/SWIMInstanceTests.swift @@ -169,9 +169,9 @@ final class SWIMInstanceTests: XCTestCase { func test_onPingRequestResponse_ignoresTooOldRefutations() { let swim = SWIM.Instance(.default) - let p1 = self.testKit.spawnTestProbe(name: "p1", expecting: SWIM.Message.self).ref - let p2 = self.testKit.spawnTestProbe(name: "p2", expecting: SWIM.Message.self).ref - let p3 = self.testKit.spawnTestProbe(name: "p3", expecting: SWIM.Message.self).ref + let p1 = self.testKit.spawnTestProbe("p1", expecting: SWIM.Message.self).ref + let p2 = self.testKit.spawnTestProbe("p2", expecting: SWIM.Message.self).ref + let p3 = self.testKit.spawnTestProbe("p3", expecting: SWIM.Message.self).ref // p3 is suspect already... swim.addMyself(p1) diff --git a/Tests/DistributedActorsTests/Cluster/SWIM/SWIMShellTests.swift b/Tests/DistributedActorsTests/Cluster/SWIM/SWIMShellTests.swift index cfcb9de13..12b1c0dd5 100644 --- a/Tests/DistributedActorsTests/Cluster/SWIM/SWIMShellTests.swift +++ b/Tests/DistributedActorsTests/Cluster/SWIM/SWIMShellTests.swift @@ -252,7 +252,7 @@ final class SWIMShellTests: ClusteredNodesTestBase { let p = self.testKit(remote).spawnTestProbe(expecting: SWIM.Ack.self) let remoteProbeRef = local._resolveKnownRemote(p.ref, onRemoteSystem: remote) - let memberProbe = self.testKit(remote).spawnTestProbe(name: "RemoteSWIM", expecting: SWIM.Message.self) + let memberProbe = self.testKit(remote).spawnTestProbe("RemoteSWIM", expecting: SWIM.Message.self) let remoteMemberRef = local._resolveKnownRemote(memberProbe.ref, onRemoteSystem: remote) let swimRef = try local.spawn("SWIM", self.swimBehavior(members: [remoteMemberRef], clusterRef: self.localClusterProbe.ref)) diff --git a/Tests/DistributedActorsTests/DeathWatchTests+XCTest.swift b/Tests/DistributedActorsTests/DeathWatchTests+XCTest.swift index 46c56b0a5..1a0a55df3 100644 --- 
a/Tests/DistributedActorsTests/DeathWatchTests+XCTest.swift +++ b/Tests/DistributedActorsTests/DeathWatchTests+XCTest.swift @@ -29,7 +29,8 @@ extension DeathWatchTests { ("test_minimized_deathPact_shouldTriggerForWatchedActor", test_minimized_deathPact_shouldTriggerForWatchedActor), ("test_minimized_deathPact_shouldNotTriggerForActorThatWasWatchedButIsNotAnymoreWhenTerminatedArrives", test_minimized_deathPact_shouldNotTriggerForActorThatWasWatchedButIsNotAnymoreWhenTerminatedArrives), ("test_watch_anAlreadyStoppedActorRefShouldReplyWithTerminated", test_watch_anAlreadyStoppedActorRefShouldReplyWithTerminated), - ("test_deathPact_shouldMakeWatcherKillItselfWhenWatcheeDies", test_deathPact_shouldMakeWatcherKillItselfWhenWatcheeDies), + ("test_deathPact_shouldMakeWatcherKillItselfWhenWatcheeStops", test_deathPact_shouldMakeWatcherKillItselfWhenWatcheeStops), + ("test_deathPact_shouldMakeWatcherKillItselfWhenWatcheeThrows", test_deathPact_shouldMakeWatcherKillItselfWhenWatcheeThrows), ("test_sendingToStoppedRef_shouldNotCrash", test_sendingToStoppedRef_shouldNotCrash), ] } diff --git a/Tests/DistributedActorsTests/DeathWatchTests.swift b/Tests/DistributedActorsTests/DeathWatchTests.swift index 8b874e41a..1236024a8 100644 --- a/Tests/DistributedActorsTests/DeathWatchTests.swift +++ b/Tests/DistributedActorsTests/DeathWatchTests.swift @@ -65,9 +65,9 @@ class DeathWatchTests: XCTestCase { } func test_watch_fromMultipleActors_shouldTriggerTerminatedWhenWatchedActorStops() throws { - let p = self.testKit.spawnTestProbe(name: "p", expecting: String.self) - let p1 = self.testKit.spawnTestProbe(name: "p1", expecting: String.self) - let p2 = self.testKit.spawnTestProbe(name: "p2", expecting: String.self) + let p = self.testKit.spawnTestProbe("p", expecting: String.self) + let p1 = self.testKit.spawnTestProbe("p1", expecting: String.self) + let p2 = self.testKit.spawnTestProbe("p2", expecting: String.self) let stoppableRef: ActorRef = try system.spawn("stopMePlz1", 
self.stopOnAnyMessage(probe: p.ref)) @@ -90,12 +90,12 @@ class DeathWatchTests: XCTestCase { } func test_watch_fromMultipleActors_shouldNotifyOfTerminationOnlyCurrentWatchers() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "p") - let p1: ActorTestProbe = self.testKit.spawnTestProbe(name: "p1") - let p2: ActorTestProbe = self.testKit.spawnTestProbe(name: "p2") + let p: ActorTestProbe = self.testKit.spawnTestProbe("p") + let p1: ActorTestProbe = self.testKit.spawnTestProbe("p1") + let p2: ActorTestProbe = self.testKit.spawnTestProbe("p2") // p3 will not watch by itself, but serve as our observer for what our in-line defined watcher observes - let p3_partnerOfNotActuallyWatching: ActorTestProbe = self.testKit.spawnTestProbe(name: "p3-not-really") + let p3_partnerOfNotActuallyWatching: ActorTestProbe = self.testKit.spawnTestProbe("p3-not-really") let stoppableRef: ActorRef = try system.spawn("stopMePlz2", self.stopOnAnyMessage(probe: p.ref)) @@ -134,7 +134,7 @@ class DeathWatchTests: XCTestCase { } func test_minimized_deathPact_shouldTriggerForWatchedActor() throws { - let probe = self.testKit.spawnTestProbe(name: "pp", expecting: String.self) + let probe = self.testKit.spawnTestProbe("pp", expecting: String.self) let juliet = try system.spawn("juliet", Behavior.receiveMessage { _ in .same @@ -171,7 +171,7 @@ class DeathWatchTests: XCTestCase { // The .terminated message should also NOT be delivered to the .receiveSignal handler, it should be as if the watcher // never watched juliet to begin with. 
(This also is important so Swift Distributed Actors semantics are the same as what users would manually be able to do) - let probe = self.testKit.spawnTestProbe(name: "pp", expecting: String.self) + let probe = self.testKit.spawnTestProbe("pp", expecting: String.self) let juliet = try system.spawn("juliet", Behavior.receiveMessage { _ in .same @@ -213,7 +213,7 @@ class DeathWatchTests: XCTestCase { } func test_watch_anAlreadyStoppedActorRefShouldReplyWithTerminated() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "alreadyDeadWatcherProbe") + let p: ActorTestProbe = self.testKit.spawnTestProbe("alreadyDeadWatcherProbe") let alreadyDead: ActorRef = try system.spawn("alreadyDead", .stop) @@ -221,7 +221,7 @@ class DeathWatchTests: XCTestCase { try p.expectTerminated(alreadyDead) // even if a new actor comes in and performs the watch, it also should notice that `alreadyDead` is dead - let p2: ActorTestProbe = self.testKit.spawnTestProbe(name: "alreadyDeadWatcherProbe2") + let p2: ActorTestProbe = self.testKit.spawnTestProbe("alreadyDeadWatcherProbe2") p2.watch(alreadyDead) try p2.expectTerminated(alreadyDead) @@ -231,7 +231,7 @@ class DeathWatchTests: XCTestCase { // MARK: Death pact - func test_deathPact_shouldMakeWatcherKillItselfWhenWatcheeDies() throws { + func test_deathPact_shouldMakeWatcherKillItselfWhenWatcheeStops() throws { let romeo = try system.spawn("romeo", Behavior.receive { context, message in switch message { case .pleaseWatch(let juliet, let probe): @@ -244,11 +244,11 @@ class DeathWatchTests: XCTestCase { let juliet = try system.spawn("juliet", Behavior.receiveMessage { message in switch message { case .takePoison: - return .stop // "kill myself" // TODO: throw + return .stop // "stop myself" } }) - let p = self.testKit.spawnTestProbe(name: "p", expecting: Done.self) + let p = self.testKit.spawnTestProbe("p", expecting: Done.self) p.watch(juliet) p.watch(romeo) @@ -258,10 +258,42 @@ class DeathWatchTests: XCTestCase {
juliet.tell(.takePoison) - try p.expectTerminated(juliet) // TODO: not actually guaranteed in the order here - try p.expectTerminated(romeo) // TODO: not actually guaranteed in the order here + try p.expectTerminatedInAnyOrder([juliet.asAddressable(), romeo.asAddressable()]) } + func test_deathPact_shouldMakeWatcherKillItselfWhenWatcheeThrows() throws { + let romeo = try system.spawn("romeo", Behavior.receive { context, message in + switch message { + case .pleaseWatch(let juliet, let probe): + context.watch(juliet) + probe.tell(.done) + return .same + } + } /* NOT handling signal on purpose, we are in a Death Pact */ ) + + let juliet = try system.spawn("juliet", Behavior.receiveMessage { message in + switch message { + case .takePoison: + throw TakePoisonError() // "kill myself" by throwing + } + }) + + let p = self.testKit.spawnTestProbe("p", expecting: Done.self) + + p.watch(juliet) + p.watch(romeo) + + romeo.tell(.pleaseWatch(juliet: juliet, probe: p.ref)) + try p.expectMessage(.done) + + juliet.tell(.takePoison) + + try p.expectTerminatedInAnyOrder([juliet.asAddressable(), romeo.asAddressable()]) + } + + struct TakePoisonError: Error {} + + // ==== ------------------------------------------------------------------------------------------------------------ // MARK: Watching dead letters ref // // FIXME: Make deadLetters a real thing, currently it is too hacky (i.e.
this will crash): diff --git a/Tests/DistributedActorsTests/ParentChildActorTests.swift b/Tests/DistributedActorsTests/ParentChildActorTests.swift index 9a4ad4848..687fbdb94 100644 --- a/Tests/DistributedActorsTests/ParentChildActorTests.swift +++ b/Tests/DistributedActorsTests/ParentChildActorTests.swift @@ -281,9 +281,9 @@ class ParentChildActorTests: XCTestCase { } func test_spawnStopSpawn_shouldWorkWithSameChildName() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "p") - let p1: ActorTestProbe = self.testKit.spawnTestProbe(name: "p1") - let p2: ActorTestProbe = self.testKit.spawnTestProbe(name: "p2") + let p: ActorTestProbe = self.testKit.spawnTestProbe("p") + let p1: ActorTestProbe = self.testKit.spawnTestProbe("p1") + let p2: ActorTestProbe = self.testKit.spawnTestProbe("p2") let parent: ActorRef = try system.spawn(.anonymous, .receive { context, msg in switch msg { @@ -496,7 +496,7 @@ class ParentChildActorTests: XCTestCase { } func test_spawnStopSpawnManyTimesWithSameName_shouldProperlyTerminateAllChildren() throws { - let p: ActorTestProbe = self.testKit.spawnTestProbe(name: "p") + let p: ActorTestProbe = self.testKit.spawnTestProbe("p") let childCount = 100 let parent: ActorRef = try system.spawn(.anonymous, .receive { context, msg in diff --git a/Tests/DistributedActorsTests/Pattern/WorkerPoolTests.swift b/Tests/DistributedActorsTests/Pattern/WorkerPoolTests.swift index f8a9b2898..22b20511b 100644 --- a/Tests/DistributedActorsTests/Pattern/WorkerPoolTests.swift +++ b/Tests/DistributedActorsTests/Pattern/WorkerPoolTests.swift @@ -34,9 +34,9 @@ final class WorkerPoolTests: XCTestCase { func test_workerPool_registerNewlyStartedActors() throws { let workerKey = Receptionist.RegistrationKey(String.self, id: "request-workers") - let pA: ActorTestProbe = self.testKit.spawnTestProbe(name: "pA") - let pB: ActorTestProbe = self.testKit.spawnTestProbe(name: "pB") - let pC: ActorTestProbe = self.testKit.spawnTestProbe(name: "pC") + let pA: 
ActorTestProbe = self.testKit.spawnTestProbe("pA") + let pB: ActorTestProbe = self.testKit.spawnTestProbe("pB") + let pC: ActorTestProbe = self.testKit.spawnTestProbe("pC") func worker(p: ActorTestProbe) -> Behavior { return .setup { context in @@ -79,9 +79,9 @@ final class WorkerPoolTests: XCTestCase { func test_workerPool_dynamic_removeDeadActors() throws { let workerKey = Receptionist.RegistrationKey(String.self, id: "request-workers") - let pA: ActorTestProbe = self.testKit.spawnTestProbe(name: "pA") - let pB: ActorTestProbe = self.testKit.spawnTestProbe(name: "pB") - let pC: ActorTestProbe = self.testKit.spawnTestProbe(name: "pC") + let pA: ActorTestProbe = self.testKit.spawnTestProbe("pA") + let pB: ActorTestProbe = self.testKit.spawnTestProbe("pB") + let pC: ActorTestProbe = self.testKit.spawnTestProbe("pC") func worker(p: ActorTestProbe) -> Behavior { return .setup { context in @@ -146,9 +146,9 @@ final class WorkerPoolTests: XCTestCase { } func test_workerPool_ask() throws { - let pA: ActorTestProbe = self.testKit.spawnTestProbe(name: "pA") - let pB: ActorTestProbe = self.testKit.spawnTestProbe(name: "pB") - let pW: ActorTestProbe = self.testKit.spawnTestProbe(name: "pW") + let pA: ActorTestProbe = self.testKit.spawnTestProbe("pA") + let pB: ActorTestProbe = self.testKit.spawnTestProbe("pB") + let pW: ActorTestProbe = self.testKit.spawnTestProbe("pW") func worker(p: ActorTestProbe) -> Behavior { return .receive { context, work in @@ -187,10 +187,10 @@ final class WorkerPoolTests: XCTestCase { } func test_workerPool_static_removeDeadActors_terminateItselfWhenNoWorkers() throws { - let pA: ActorTestProbe = self.testKit.spawnTestProbe(name: "pA") - let pB: ActorTestProbe = self.testKit.spawnTestProbe(name: "pB") - let pC: ActorTestProbe = self.testKit.spawnTestProbe(name: "pC") - let pW: ActorTestProbe = self.testKit.spawnTestProbe(name: "pW") + let pA: ActorTestProbe = self.testKit.spawnTestProbe("pA") + let pB: ActorTestProbe = 
self.testKit.spawnTestProbe("pB") + let pC: ActorTestProbe = self.testKit.spawnTestProbe("pC") + let pW: ActorTestProbe = self.testKit.spawnTestProbe("pW") func worker(p: ActorTestProbe) -> Behavior { return .receive { context, work in diff --git a/Tests/DistributedActorsTests/SerializationTests.swift b/Tests/DistributedActorsTests/SerializationTests.swift index da2e87475..d2c0c7e4c 100644 --- a/Tests/DistributedActorsTests/SerializationTests.swift +++ b/Tests/DistributedActorsTests/SerializationTests.swift @@ -273,7 +273,7 @@ class SerializationTests: XCTestCase { } do { - let p = self.testKit.spawnTestProbe(name: "p1", expecting: String.self) + let p = self.testKit.spawnTestProbe("p1", expecting: String.self) let echo: ActorRef = try s2.spawn("echo", .receiveMessage { msg in p.ref.tell("echo:\(msg)") return .same diff --git a/Tests/DistributedActorsTests/SupervisionTests+XCTest.swift b/Tests/DistributedActorsTests/SupervisionTests+XCTest.swift index e26520f8c..a43e3e526 100644 --- a/Tests/DistributedActorsTests/SupervisionTests+XCTest.swift +++ b/Tests/DistributedActorsTests/SupervisionTests+XCTest.swift @@ -27,6 +27,9 @@ extension SupervisionTests { ("test_restartSupervised_throws_shouldRestart", test_restartSupervised_throws_shouldRestart), ("test_restartAtMostWithin_throws_shouldRestartNoMoreThanAllowedWithinPeriod", test_restartAtMostWithin_throws_shouldRestartNoMoreThanAllowedWithinPeriod), ("test_restartSupervised_throws_shouldRestart_andCreateNewInstanceOfClassBehavior", test_restartSupervised_throws_shouldRestart_andCreateNewInstanceOfClassBehavior), + ("test_escalateSupervised_throws_shouldKeepEscalatingThrough_watchingParents", test_escalateSupervised_throws_shouldKeepEscalatingThrough_watchingParents), + ("test_escalateSupervised_throws_shouldKeepEscalatingThrough_nonWatchingParents", test_escalateSupervised_throws_shouldKeepEscalatingThrough_nonWatchingParents), + ("test_escalateSupervised_throws_shouldKeepEscalatingUntilNonEscalatingParent", 
test_escalateSupervised_throws_shouldKeepEscalatingUntilNonEscalatingParent), ("test_restart_throws_shouldHandleFailureWhenInterpretingStart", test_restart_throws_shouldHandleFailureWhenInterpretingStart), ("test_restartSupervised_throws_shouldRestartWithConstantBackoff", test_restartSupervised_throws_shouldRestartWithConstantBackoff), ("test_restartSupervised_throws_shouldRestartWithExponentialBackoff", test_restartSupervised_throws_shouldRestartWithExponentialBackoff), diff --git a/Tests/DistributedActorsTests/SupervisionTests.swift b/Tests/DistributedActorsTests/SupervisionTests.swift index 01a276ed2..69233d467 100644 --- a/Tests/DistributedActorsTests/SupervisionTests.swift +++ b/Tests/DistributedActorsTests/SupervisionTests.swift @@ -92,23 +92,23 @@ class SupervisionTests: XCTestCase { _ = try self.system.spawn("example", behavior) _ = try self.system.spawn("example", props: Props(), behavior) _ = try self.system.spawn("example", props: .dispatcher(.pinnedThread), behavior) - _ = try self.system.spawn("example", props: Props().dispatcher(.pinnedThread).addingSupervision(strategy: .stop), behavior) - _ = try self.system.spawn("example", props: .addingSupervision(strategy: .restart(atMost: 5, within: .seconds(1))), behavior) - _ = try self.system.spawn("example", props: .addingSupervision(strategy: .restart(atMost: 5, within: .effectivelyInfinite)), behavior) + _ = try self.system.spawn("example", props: Props().dispatcher(.pinnedThread).supervision(strategy: .stop), behavior) + _ = try self.system.spawn("example", props: .supervision(strategy: .restart(atMost: 5, within: .seconds(1))), behavior) + _ = try self.system.spawn("example", props: .supervision(strategy: .restart(atMost: 5, within: .effectivelyInfinite)), behavior) // chaining _ = try self.system.spawn("example", props: Props() - .addingSupervision(strategy: .restart(atMost: 5, within: .effectivelyInfinite)) + .supervision(strategy: .restart(atMost: 5, within: .effectivelyInfinite)) 
.dispatcher(.pinnedThread) .mailbox(.default(capacity: 122, onOverflow: .crash)), behavior) _ = try self.system.spawn("example", props: Props() - .addingSupervision(strategy: .restart(atMost: 5, within: .seconds(1)), forErrorType: EasilyCatchable.self) - .addingSupervision(strategy: .restart(atMost: 5, within: .effectivelyInfinite)) - .addingSupervision(strategy: .restart(atMost: 5, within: .effectivelyInfinite)), + .supervision(strategy: .restart(atMost: 5, within: .seconds(1)), forErrorType: EasilyCatchable.self) + .supervision(strategy: .restart(atMost: 5, within: .effectivelyInfinite)) + .supervision(strategy: .restart(atMost: 5, within: .effectivelyInfinite)), behavior) } } @@ -124,7 +124,7 @@ class SupervisionTests: XCTestCase { let strategy: SupervisionStrategy = .stop let behavior = self.faulty(probe: p.ref) let _: ActorRef = try context.spawn("\(runName)-erroring-1", - props: .addingSupervision(strategy: strategy), + props: .supervision(strategy: strategy), behavior) return .same } @@ -153,7 +153,7 @@ class SupervisionTests: XCTestCase { let parentBehavior: Behavior = .setup { context in let _: ActorRef = try context.spawn( "\(runName)-erroring-2", - props: Props().addingSupervision(strategy: .restart(atMost: 2, within: .seconds(1))), + props: Props().supervision(strategy: .restart(atMost: 2, within: .seconds(1))), self.faulty(probe: p.ref) ) @@ -201,7 +201,7 @@ class SupervisionTests: XCTestCase { let parentBehavior: Behavior = .setup { context in let _: ActorRef = try context.spawn("\(runName)-failing-2", - props: Props().addingSupervision(strategy: .restart(atMost: 3, within: .seconds(1), backoff: backoff)), + props: Props().supervision(strategy: .restart(atMost: 3, within: .seconds(1), backoff: backoff)), self.faulty(probe: p.ref)) return .same @@ -263,7 +263,7 @@ class SupervisionTests: XCTestCase { let parentBehavior: Behavior = .setup { context in let _: ActorRef = try context.spawn( "\(runName)-exponentialBackingOff", - props: 
Props().addingSupervision(strategy: .restart(atMost: 10, within: nil, backoff: backoff)), + props: Props().supervision(strategy: .restart(atMost: 10, within: nil, backoff: backoff)), self.faulty(probe: p.ref) ) @@ -317,7 +317,7 @@ class SupervisionTests: XCTestCase { let parentBehavior: Behavior = .setup { context in let _: ActorRef = try context.spawn( "\(runName)-erroring-within-2", - props: .addingSupervision(strategy: .restart(atMost: 2, within: failurePeriod)), + props: .supervision(strategy: .restart(atMost: 2, within: failurePeriod)), self.faulty(probe: p.ref) ) return .same @@ -384,7 +384,7 @@ class SupervisionTests: XCTestCase { } } - let ref: ActorRef = try system.spawn("fail-in-start-1", props: .addingSupervision(strategy: strategy), behavior) + let ref: ActorRef = try system.spawn("fail-in-start-1", props: .supervision(strategy: strategy), behavior) try probe.expectMessage("failing") try probe.expectMessage("starting") @@ -419,7 +419,7 @@ class SupervisionTests: XCTestCase { } } - let ref: ActorRef = try system.spawn("fail-in-start-2", props: .addingSupervision(strategy: strategy), behavior) + let ref: ActorRef = try system.spawn("fail-in-start-2", props: .supervision(strategy: strategy), behavior) try probe.expectMessage("starting") ref.tell("test") @@ -444,7 +444,7 @@ class SupervisionTests: XCTestCase { } } - let ref: ActorRef = try system.spawn("fail-in-start-3", props: .addingSupervision(strategy: strategy), behavior) + let ref: ActorRef = try system.spawn("fail-in-start-3", props: .supervision(strategy: strategy), behavior) probe.watch(ref) for _ in 1 ... 
5 { try probe.expectMessage("starting") @@ -480,7 +480,7 @@ class SupervisionTests: XCTestCase { let p = self.testKit.spawnTestProbe(expecting: String.self) let ref = try system.spawn( "class-behavior", - props: .addingSupervision(strategy: .restart(atMost: 2, within: nil)), + props: .supervision(strategy: .restart(atMost: 2, within: nil)), .class { MyCrashingClassBehavior(p.ref) } ) @@ -514,6 +514,181 @@ class SupervisionTests: XCTestCase { } } + // ==== ---------------------------------------------------------------------------------------------------------------- + // MARK: Escalating supervision + + func test_escalateSupervised_throws_shouldKeepEscalatingThrough_watchingParents() throws { + let pt = self.testKit.spawnTestProbe("pt", expecting: ActorRef.self) + let pm = self.testKit.spawnTestProbe("pm", expecting: ActorRef.self) + let pab = self.testKit.spawnTestProbe("pab", expecting: ActorRef.self) + let pb = self.testKit.spawnTestProbe("pb", expecting: ActorRef.self) + let pp = self.testKit.spawnTestProbe("pp", expecting: String.self) + + _ = try self.system.spawn("top", of: String.self, .setup { c in + pt.tell(c.myself) + + _ = try c.spawn("middle", of: String.self, props: .supervision(strategy: .escalate), .setup { cc in + pm.tell(cc.myself) + + // you can also just watch; that way the parent is notified both in case of a stop and of a crash, and the failure surfaces as a Death Pact rather than as an escalation + _ = try cc.spawnWatch("almostBottom", of: String.self, .setup { ccc in + pab.tell(ccc.myself) + + _ = try ccc.spawnWatch("bottom", of: String.self, props: .supervision(strategy: .escalate), .setup { cccc in + pb.tell(cccc.myself) + return .receiveMessage { message in + throw Boom(message) + } + }) + + return .ignore + }) + + return .ignore + }) + + return Behavior.receiveSpecificSignal(Signals.ChildTerminated.self) { context, terminated in + pp.tell("Prevented escalation to top level in \(context.myself.path), terminated: \(terminated)") + + return .same // stop
the failure from reaching the guardian and terminating the system + } + }) + + let top = try pt.expectMessage() + pt.watch(top) + let middle = try pm.expectMessage() + pm.watch(middle) + let almostBottom = try pab.expectMessage() + pab.watch(almostBottom) + let bottom = try pb.expectMessage() + pb.watch(bottom) + + bottom.tell("Boom!") + + let msg = try pp.expectMessage() + msg.shouldContain("Prevented escalation to top level in /user/top") + + // Death Parade: + try pb.expectTerminated(bottom) // Boom! + try pab.expectTerminated(almostBottom) // Boom! + try pm.expectTerminated(middle) // Boom! + + // top should not terminate since it handled the escalated failure + try pt.expectNoTerminationSignal(for: .milliseconds(200)) + } + + func test_escalateSupervised_throws_shouldKeepEscalatingThrough_nonWatchingParents() throws { + let pt = self.testKit.spawnTestProbe("pt", expecting: ActorRef.self) + let pm = self.testKit.spawnTestProbe("pm", expecting: ActorRef.self) + let pab = self.testKit.spawnTestProbe("pab", expecting: ActorRef.self) + let pb = self.testKit.spawnTestProbe("pb", expecting: ActorRef.self) + let pp = self.testKit.spawnTestProbe("pp", expecting: String.self) + + _ = try self.system.spawn("top", of: String.self, .setup { c in + pt.tell(c.myself) + + _ = try c.spawn("middle", of: String.self, props: .supervision(strategy: .escalate), .setup { cc in + pm.tell(cc.myself) + + _ = try cc.spawn("almostBottom", of: String.self, props: .supervision(strategy: .escalate), .setup { ccc in + pab.tell(ccc.myself) + + _ = try ccc.spawn("bottom", of: String.self, props: .supervision(strategy: .escalate), .setup { cccc in + pb.tell(cccc.myself) + return .receiveMessage { message in + throw Boom(message) + } + }) + + return .ignore + }) + + return .ignore + }) + + return Behavior.receiveSpecificSignal(Signals.ChildTerminated.self) { context, terminated in + pp.tell("Prevented escalation to top level in \(context.myself.path), terminated: \(terminated)") + + return .same // stop the
failure from reaching the guardian + } + }) + + let top = try pt.expectMessage() + pt.watch(top) + let middle = try pm.expectMessage() + pm.watch(middle) + let almostBottom = try pab.expectMessage() + pab.watch(almostBottom) + let bottom = try pb.expectMessage() + pb.watch(bottom) + + bottom.tell("Boom!") + + let msg = try pp.expectMessage() + msg.shouldContain("Prevented escalation to top level in /user/top") + + // Death Parade: + try pb.expectTerminated(bottom) // Boom! + try pab.expectTerminated(almostBottom) // Boom! + try pm.expectTerminated(middle) // Boom! + + // top should not terminate since it handled the escalated failure + try pt.expectNoTerminationSignal(for: .milliseconds(200)) + } + + func test_escalateSupervised_throws_shouldKeepEscalatingUntilNonEscalatingParent() throws { + let pt = self.testKit.spawnTestProbe("pt", expecting: ActorRef.self) + let pm = self.testKit.spawnTestProbe("pm", expecting: ActorRef.self) + let pab = self.testKit.spawnTestProbe("pab", expecting: ActorRef.self) + let pb = self.testKit.spawnTestProbe("pb", expecting: ActorRef.self) + let pp = self.testKit.spawnTestProbe("pp", expecting: String.self) + + _ = try self.system.spawn("top", of: String.self, .setup { c in + pt.tell(c.myself) + + _ = try c.spawn("middle", of: String.self, .setup { cc in + pm.tell(cc.myself) + + // does not watch or escalate child failures, which makes this actor our "failure isolator"; failures will be stopped at this actor (!)
+                _ = try cc.spawn("almostBottom", of: String.self, .setup { ccc in
+                    pab.tell(ccc.myself)
+
+                    _ = try ccc.spawn("bottom", of: String.self, props: .supervision(strategy: .escalate), .setup { cccc in
+                        pb.tell(cccc.myself)
+                        return .receiveMessage { message in
+                            throw Boom(message)
+                        }
+                    })
+
+                    return .ignore
+                })
+
+                return .ignore
+            })
+
+            return .ignore
+        })
+
+        let top = try pt.expectMessage()
+        pt.watch(top)
+        let middle = try pm.expectMessage()
+        pm.watch(middle)
+        let almostBottom = try pab.expectMessage()
+        pab.watch(almostBottom)
+        let bottom = try pb.expectMessage()
+        pb.watch(bottom)
+
+        bottom.tell("Boom!")
+
+        // Death Parade:
+        try pb.expectTerminated(bottom) // Boom!
+        try pab.expectTerminated(almostBottom) // Boom!
+
+        // the almost bottom has isolated the fault; it does not leak more upwards the tree
+        try pm.expectNoTerminationSignal(for: .milliseconds(100))
+        try pt.expectNoTerminationSignal(for: .milliseconds(100))
+    }
+
     // ==== ------------------------------------------------------------------------------------------------------------
     // MARK: Restarting supervision with Backoff
 
@@ -549,8 +724,8 @@ class SupervisionTests: XCTestCase {
         let faultyWorker = try system.spawn("compositeFailures-1",
             props: Props()
-                .addingSupervision(strategy: .restart(atMost: 1, within: nil), forErrorType: CatchMe.self)
-                .addingSupervision(strategy: .restart(atMost: 1, within: nil), forErrorType: EasilyCatchable.self),
+                .supervision(strategy: .restart(atMost: 1, within: nil), forErrorType: CatchMe.self)
+                .supervision(strategy: .restart(atMost: 1, within: nil), forErrorType: EasilyCatchable.self),
             self.faulty(probe: probe.ref))
 
         probe.watch(faultyWorker)
@@ -593,7 +768,7 @@ class SupervisionTests: XCTestCase {
         let parentRef: ActorRef = try system.spawn(
             "parent",
-            props: .addingSupervision(strategy: .restart(atMost: 2, within: nil)),
+            props: .supervision(strategy: .restart(atMost: 2, within: nil)),
             parentBehavior
         )
         parentProbe.watch(parentRef)
@@ -652,7 +827,7 @@ class SupervisionTests: XCTestCase {
         let parentBehavior: Behavior = .setup { context in
             let _: ActorRef = try context.spawn(
                 "bad-decision-erroring-2",
-                props: .addingSupervision(strategy: .restart(atMost: 3, within: .seconds(5))),
+                props: .supervision(strategy: .restart(atMost: 3, within: .seconds(5))),
                 stackOverflowFaulty
             )
             return .same
@@ -702,7 +877,7 @@ class SupervisionTests: XCTestCase {
         let supervisedThrower: ActorRef = try system.spawn(
             "thrower-1",
-            props: .addingSupervision(strategy: .restart(atMost: 10, within: nil), forErrorType: EasilyCatchable.self),
+            props: .supervision(strategy: .restart(atMost: 10, within: nil), forErrorType: EasilyCatchable.self),
             self.throwerBehavior(probe: p)
         )
@@ -724,7 +899,7 @@ class SupervisionTests: XCTestCase {
         let supervisedThrower: ActorRef = try system.spawn(
             "thrower-2",
-            props: .addingSupervision(strategy: .restart(atMost: 100, within: nil), forAll: .errors),
+            props: .supervision(strategy: .restart(atMost: 100, within: nil), forAll: .errors),
             self.throwerBehavior(probe: p)
         )
@@ -754,7 +929,7 @@ class SupervisionTests: XCTestCase {
             return .same
         }
 
-        let ref = try system.spawn(.anonymous, props: .addingSupervision(strategy: .restart(atMost: 1, within: .seconds(5))), behavior)
+        let ref = try system.spawn(.anonymous, props: .supervision(strategy: .restart(atMost: 1, within: .seconds(5))), behavior)
         p.watch(ref)
 
         ref.tell("test")
@@ -789,7 +964,7 @@ class SupervisionTests: XCTestCase {
             return .same
         }
 
-        let ref = try system.spawn("fail-onside-pre-restart", props: .addingSupervision(strategy: .restart(atMost: 3, within: nil, backoff: backoff)), failOnBoom)
+        let ref = try system.spawn("fail-onside-pre-restart", props: .supervision(strategy: .restart(atMost: 3, within: nil, backoff: backoff)), failOnBoom)
         p.watch(ref)
 
         ref.tell("boom")
@@ -821,7 +996,7 @@ class SupervisionTests: XCTestCase {
             }
         }
 
-        let ref = try system.spawn(.anonymous, props: .addingSupervision(strategy: .restart(atMost: 5, within: .seconds(5))), behavior)
+        let ref = try system.spawn(.anonymous, props: .supervision(strategy: .restart(atMost: 5, within: .seconds(5))), behavior)
         p.watch(ref)
 
         ref.tell("test")
@@ -852,7 +1027,7 @@ class SupervisionTests: XCTestCase {
             }
         }
 
-        let ref = try system.spawn(.anonymous, props: .addingSupervision(strategy: .restart(atMost: 5, within: .seconds(5))), behavior)
+        let ref = try system.spawn(.anonymous, props: .supervision(strategy: .restart(atMost: 5, within: .seconds(5))), behavior)
         p.watch(ref)
 
         try p.expectMessage("setup")
@@ -893,7 +1068,7 @@ class SupervisionTests: XCTestCase {
             }
         }
 
-        let ref = try system.spawn(.anonymous, props: Props.addingSupervision(strategy: .restart(atMost: 1, within: .seconds(1))), behavior)
+        let ref = try system.spawn(.anonymous, props: Props.supervision(strategy: .restart(atMost: 1, within: .seconds(1))), behavior)
 
         try p.expectMessage("starting")
         ref.tell("suspend")
@@ -927,7 +1102,7 @@ class SupervisionTests: XCTestCase {
             }
         }
 
-        let ref = try system.spawn(.anonymous, props: Props.addingSupervision(strategy: .restart(atMost: 1, within: .seconds(1))), behavior)
+        let ref = try system.spawn(.anonymous, props: Props.supervision(strategy: .restart(atMost: 1, within: .seconds(1))), behavior)
 
         try p.expectMessage("starting")
        ref.tell("suspend")