-
Notifications
You must be signed in to change notification settings - Fork 59
[support bundles] Don't fail already-collected bundles #9267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
df95650 to
82d8b14
Compare
| .await?; | ||
|
|
||
| // Find all bundles on nexuses that no longer exist. | ||
| // Find all collecting bundles on nexuses that no longer exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"In-progress" bundles get marked failed if their "owning Nexus" dies.
Otherwise: there is nothing to mark failed, when a Nexus gets expunged.
| opctx, | ||
| &pagparams, | ||
| self.nexus_id, | ||
| None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old: Each nexus queries their own set of bundles to see what should be destroyed.
New: All nexuses query any bundles in these states, and try to clean them up concurrently.
This does mean bundle deletion may happen concurrently, from multiple distinct Nexuses, since there is no real meaning of "ownership" after collection has completed.
I think this should be safe - the process of deleting a bundle involves:
- Delete from the sled where it's stored
- Delete from the database, updating the record
Both of which should be independently safe from concurrent Nexuses
| @@ -0,0 +1,4 @@ | |||
| CREATE INDEX IF NOT EXISTS lookup_bundle_by_state ON omicron.public.support_bundle ( | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're indexing by state, not by nexus ID, with this new PR
|
Putting this in draft - I've identified a problem where this could delete a bundle that's still actively being collected. Will re-work before marking "Ready for Review". |
If a Nexus is expunged after collection is complete, don't fail the bundle. Leave it active.
Previously, the upgrade pathway (involving expunging and adding new Nexuses) would cause all support bundles to be marked "failed". For fully-collected bundles, this is not necessary, and fixed by this PR.
Fixes #9257