blueprint-execution: Break dependency between PUT /omicron-zones success and subsequent cleanup steps #7527

@jgallagher

blueprint-execution has three cleanup steps that depend on the success of the earlier PUT /omicron-zones step:

  • zone cleanup
  • saga reassignment
  • failed support bundle cleanup

All of these steps assume that the zones they are cleaning up after are no longer running, but that's only true today because PUT /omicron-zones is synchronous (i.e., sled-agent only returns success if it has already stopped any zones that shouldn't be running) and because execution is stopped if the PUT /omicron-zones step fails. We definitely want to change the second of those (this is #6999), and longer term we probably want to change the first one too (converting sled-agent into more of an "accept and return the new config, then make it real via a reconciler loop in the background" model).
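To make the current coupling concrete, here is a minimal sketch of the gating described above. It is not the actual executor code: `StepOutcome`, `run_cleanup_steps`, and the `put_zones_ok` flag are hypothetical names used only for illustration.

```rust
/// Hypothetical outcome of a single cleanup step in one execution pass.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StepOutcome {
    Ran,
    Skipped,
}

/// Sketch of today's behavior: every cleanup step is gated on the result of
/// the earlier PUT /omicron-zones step. `put_zones_ok` is an assumed stand-in
/// for "sled-agent returned success for all sleds".
fn run_cleanup_steps(put_zones_ok: bool) -> [(&'static str, StepOutcome); 3] {
    let steps = [
        "zone cleanup",
        "saga reassignment",
        "failed support bundle cleanup",
    ];
    // If the PUT failed, none of the cleanup steps can safely assume the
    // zones are stopped, so all of them are skipped.
    steps.map(|name| {
        let outcome = if put_zones_ok {
            StepOutcome::Ran
        } else {
            StepOutcome::Skipped
        };
        (name, outcome)
    })
}
```

With this shape, a single failed PUT skips all cleanup for that pass, which is the dependency this issue is about.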

#7524 is a small PR that makes these dependencies explicit in code. We should break this dependency somehow:

  • The planner could confirm that a zone is gone and indicate it's ready for cleanup via some property in the blueprint (similar to the treatment disks got in "Expunge and Decommission disks in planner", #7286)
  • Is it possible for the executor to know this on its own? (I don't think so but maybe?)
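A rough sketch of what the first option might look like, assuming the planner records a per-zone flag in the blueprint. All of the names here (`ZoneId`, `ZoneDisposition`, `ready_for_cleanup`, `zones_ready_for_cleanup`) are hypothetical and are not the existing blueprint types:

```rust
use std::collections::BTreeMap;

/// Hypothetical zone identifier; a placeholder for whatever ID type the
/// blueprint actually uses.
type ZoneId = u32;

/// Hypothetical per-zone state recorded in the blueprint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZoneDisposition {
    InService,
    /// The zone has been expunged. In this sketch the planner sets
    /// `ready_for_cleanup` only after it has confirmed that the zone is
    /// no longer running.
    Expunged { ready_for_cleanup: bool },
}

/// Returns the zones the executor may clean up after. The decision is read
/// from the blueprint alone, so it does not depend on whether
/// PUT /omicron-zones succeeded during this execution pass.
fn zones_ready_for_cleanup(
    zones: &BTreeMap<ZoneId, ZoneDisposition>,
) -> Vec<ZoneId> {
    zones
        .iter()
        .filter_map(|(id, disposition)| match disposition {
            ZoneDisposition::Expunged { ready_for_cleanup: true } => Some(*id),
            _ => None,
        })
        .collect()
}
```

The point of the sketch is that the cleanup steps would consult only blueprint state, so a failed or asynchronous PUT /omicron-zones would not need to stop them.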

This is a blocker for fixing #6999.
