
smklein commented Oct 22, 2025

If a Nexus is expunged after a support bundle's collection has completed, don't mark the bundle failed. Leave it active.

Previously, the upgrade pathway (which involves expunging old Nexuses and adding new ones) would cause all support bundles to be marked "failed". For fully-collected bundles this is unnecessary, and this PR fixes it.

Fixes #9257
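The rule can be sketched as a per-bundle state transition. This is a minimal illustration, not the PR's code: `BundleState` and `state_after_owner_expunged` are hypothetical, simplified names, and the real Omicron model may use different states.

```rust
/// Hypothetical, simplified lifecycle states for a support bundle.
/// (Not the actual Omicron type; used only for illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BundleState {
    Collecting,
    Active,
    Destroying,
    Failed,
}

/// Returns the state a bundle should have after the Nexus that
/// created it is expunged.
fn state_after_owner_expunged(state: BundleState) -> BundleState {
    match state {
        // Collection cannot finish without its owning Nexus.
        BundleState::Collecting => BundleState::Failed,
        // Bundles past collection (Active, Destroying, Failed) no longer
        // depend on the Nexus that created them, so leave them unchanged.
        other => other,
    }
}

fn main() {
    assert_eq!(
        state_after_owner_expunged(BundleState::Collecting),
        BundleState::Failed
    );
    // Previously, a fully-collected bundle would also have been failed.
    assert_eq!(
        state_after_owner_expunged(BundleState::Active),
        BundleState::Active
    );
    assert_eq!(
        state_after_owner_expunged(BundleState::Destroying),
        BundleState::Destroying
    );
    println!("ok");
}
```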

smklein force-pushed the bundle-expunge-better branch from df95650 to 82d8b14 on October 22, 2025 00:39
 .await?;

-// Find all bundles on nexuses that no longer exist.
+// Find all collecting bundles on nexuses that no longer exist.
smklein (author) commented:

"In-progress" (still collecting) bundles are marked failed if their owning Nexus dies.

Otherwise, there is nothing to mark failed when a Nexus is expunged.
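A sketch of that selection, over in-memory data rather than the database. `Bundle`, `BundleState`, `NexusId`, and `bundles_to_fail` are hypothetical stand-ins, and "no longer exists" is modeled as "not in a set of in-service Nexus IDs", which is an assumption about how liveness is determined.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-ins for Omicron's types.
type NexusId = u32;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BundleState {
    Collecting,
    Active,
}

#[derive(Debug)]
struct Bundle {
    id: u32,
    owner: NexusId,
    state: BundleState,
}

/// Selects bundles to mark failed: only those still being collected
/// whose owning Nexus is no longer in service.
fn bundles_to_fail(bundles: &[Bundle], in_service: &HashSet<NexusId>) -> Vec<u32> {
    bundles
        .iter()
        .filter(|b| b.state == BundleState::Collecting)
        .filter(|b| !in_service.contains(&b.owner))
        .map(|b| b.id)
        .collect()
}

fn main() {
    // Nexus 1 has been expunged; only Nexus 2 is in service.
    let in_service: HashSet<NexusId> = [2].into_iter().collect();
    let bundles = vec![
        // Owner expunged, still collecting: should fail.
        Bundle { id: 1, owner: 1, state: BundleState::Collecting },
        // Owner expunged, but collection finished: stays active.
        Bundle { id: 2, owner: 1, state: BundleState::Active },
        // Owner still in service: untouched.
        Bundle { id: 3, owner: 2, state: BundleState::Collecting },
    ];
    assert_eq!(bundles_to_fail(&bundles, &in_service), vec![1]);
    println!("ok");
}
```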

opctx,
&pagparams,
self.nexus_id,
None,
smklein (author) commented:

Old: Each Nexus queries its own set of bundles to see which should be destroyed.
New: All Nexuses query for any bundles in these states and try to clean them up concurrently.

This does mean that bundle deletion may happen concurrently from multiple distinct Nexuses, since "ownership" has no real meaning once collection has completed.
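The difference between the two queries can be sketched over an in-memory list. This is an assumption-laden illustration: the hypothetical `Destroying` state stands in for "these states", and `old_cleanup_candidates` / `new_cleanup_candidates` are not functions from the PR.

```rust
/// Hypothetical, simplified bundle states. `Destroying` stands in for
/// whichever states mark a bundle as needing cleanup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BundleState {
    Active,
    Destroying,
}

struct Bundle {
    id: u32,
    owner: u32, // ID of the Nexus that collected the bundle
    state: BundleState,
}

/// Old approach: each Nexus only examines bundles it owns.
fn old_cleanup_candidates(bundles: &[Bundle], my_nexus: u32) -> Vec<u32> {
    bundles
        .iter()
        .filter(|b| b.owner == my_nexus && b.state == BundleState::Destroying)
        .map(|b| b.id)
        .collect()
}

/// New approach: every Nexus examines all bundles in the cleanup state,
/// regardless of which Nexus created them.
fn new_cleanup_candidates(bundles: &[Bundle]) -> Vec<u32> {
    bundles
        .iter()
        .filter(|b| b.state == BundleState::Destroying)
        .map(|b| b.id)
        .collect()
}

fn main() {
    // Nexus 1 has been expunged; Nexus 2 is still running.
    let bundles = vec![
        Bundle { id: 10, owner: 1, state: BundleState::Destroying },
        Bundle { id: 11, owner: 2, state: BundleState::Active },
    ];
    // Under the per-owner query, Nexus 2 never sees bundle 10.
    assert!(old_cleanup_candidates(&bundles, 2).is_empty());
    // Under the by-state query, any surviving Nexus can pick it up.
    assert_eq!(new_cleanup_candidates(&bundles), vec![10]);
    println!("ok");
}
```

The trade-off is the one noted above: dropping the owner filter means several Nexuses may select the same bundle at once.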

I think this should be safe. Deleting a bundle involves:

  1. Delete from the sled where it's stored
  2. Delete from the database, updating the record

Each of these steps should be independently safe when run concurrently by multiple Nexuses.
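One way both steps can be safe under concurrency is for each to be idempotent or conditional. The sketch below assumes exactly that: sled deletion treats an already-missing bundle as success, and the database step only succeeds for the caller that still sees the bundle in its destroying state. `Sled`, `Db`, and their methods are hypothetical in-memory stand-ins, not Omicron APIs.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for bundle storage on a sled.
struct Sled {
    files: Mutex<HashMap<u32, Vec<u8>>>,
}

/// Hypothetical stand-in for the support_bundle table; it tracks only
/// the IDs of bundles currently in the "destroying" state.
struct Db {
    destroying: Mutex<HashSet<u32>>,
}

impl Sled {
    /// Step 1: delete from the sled. Deleting a bundle that is already
    /// gone is a no-op, so a second caller is harmless.
    fn delete(&self, id: u32) {
        self.files.lock().unwrap().remove(&id);
    }
}

impl Db {
    /// Step 2: a conditional update. Returns true only for the caller
    /// that observed the bundle still in the destroying state.
    fn finish_delete(&self, id: u32) -> bool {
        self.destroying.lock().unwrap().remove(&id)
    }
}

fn main() {
    let sled = Sled { files: Mutex::new(HashMap::from([(7, vec![0u8; 16])])) };
    let db = Db { destroying: Mutex::new(HashSet::from([7])) };
    let winners = AtomicUsize::new(0);

    // Two "Nexuses" try to delete bundle 7 at the same time.
    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                sled.delete(7);
                if db.finish_delete(7) {
                    winners.fetch_add(1, Ordering::SeqCst);
                }
            });
        }
    });

    // The bundle is gone from both places, and exactly one caller
    // performed the database transition.
    assert!(sled.files.lock().unwrap().is_empty());
    assert!(db.destroying.lock().unwrap().is_empty());
    assert_eq!(winners.load(Ordering::SeqCst), 1);
    println!("ok");
}
```

Under these assumptions, the Nexus that loses the race simply observes that the work is already done.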

@@ -0,0 +1,4 @@
CREATE INDEX IF NOT EXISTS lookup_bundle_by_state ON omicron.public.support_bundle (
smklein (author) commented:

With this PR, we index by state rather than by Nexus ID.

smklein marked this pull request as ready for review on October 22, 2025 02:16
smklein marked this pull request as draft on October 24, 2025 00:08
smklein (author) commented Oct 24, 2025

Putting this in draft: I've identified a problem where this change could delete a bundle that is still actively being collected. I'll rework it before marking it "Ready for Review".



Development

Successfully merging this pull request may close these issues.

Support bundles creating prior to online update not accessible
