Skip to content

Conversation

@jgallagher
Copy link
Contributor

Zone cleanup and saga reassignment both assume that expunged zones in the blueprint are no longer running, which is currently dependent on the earlier deploy_zones execution step completing successfully. Add an empty DeployZonesDone token to make this dependency explicit.

This is a small change with no runtime effect that will prevent us from closing #6999 before we address these steps.

Zone cleanup and saga reassignment both assume that expunged zones in
the blueprint are no longer running, which is currently dependent on the
earlier `deploy_zones` execution step completing successfully. Add an
empty `DeployZonesDone` token to make this dependency explicit.
@jgallagher jgallagher requested a review from smklein February 12, 2025 19:07
@jgallagher
Copy link
Contributor Author

Update: after chatting with @smklein, I moved the support bundle reassignment later and made it also dependent on the deploy zones step.

blueprint,
);

let deploy_zones_done = register_support_bundle_failure_step(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes #7529

Copy link
Contributor

@sunshowers sunshowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

Are we going to need to make this determination on a per-sled basis in the future? I'm wondering if we can spit out a graphviz dot file somehow.

@jgallagher
Copy link
Contributor Author

Are we going to need to make this determination on a per-sled basis in the future? I'm wondering if we can spit out a graphviz dot file somehow.

I don't think so; I think the preference is to break these dependencies entirely. This will require execution being able to tell the difference between "expunged but might still be running" and "expunged and guaranteed to no longer be running", which itself requires the planner to consult inventory and annotate the blueprint when a thing is guaranteed to no longer be running. #7286 does this for physical disk decommissioning by adding an explicit state in the blueprint disk config. In a call earlier this week, we talked about an alternative: we could BlueprintZoneDisposition::Expunged to something like BlueprintZoneDisposition::Expunged { eligible_for_cleanup: bool }. The planner is responsible for filling that in with true once it confirms via inventory that some prior execution has succeeded sufficiently that we know the sled where this zone was running is no longer running it (and will never try to rerun it).

@jgallagher jgallagher merged commit faa5769 into main Feb 14, 2025
16 checks passed
@jgallagher jgallagher deleted the john/make-execution-dependency-explicit branch February 14, 2025 19:39
hawkw pushed a commit that referenced this pull request Feb 21, 2025
Zone cleanup and saga reassignment both assume that expunged zones in
the blueprint are no longer running, which is currently dependent on the
earlier `deploy_zones` execution step completing successfully. Add an
empty `DeployZonesDone` token to make this dependency explicit.

This is a small change with no runtime effect that will prevent us from
closing #6999 before we address these steps.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants