[nexus] Add part to service bundle ereport paths #8767
Merged
In #8739, I added code for collecting ereports into support bundles, which stores the ereport JSON in directories for each sled/switch/PSC serial number from which an ereport was received. Unfortunately, I failed to consider that the version 1 Oxide serial numbers are only unique within the namespace of a particular part, and not globally; so (for example) a switch and a compute sled may have colliding serials. This means that the current code could incorrectly group ereports reported by two totally different devices. While the ereport JSON files do contain additional information that disambiguates this (they include the part number, as well as MGS metadata with the SP type for SP ereports), and restart IDs are additionally capable of distinguishing between reporters, putting ereports from two different systems within the same directory still has the potential to be quite misleading.
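To make the collision concrete, here is a minimal sketch of the grouping problem. It is not the code from this branch: the `Reporter` type, both grouping functions, and any part or serial values used with them are hypothetical stand-ins, and it deliberately says nothing about the actual on-disk path layout.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified identity of a device that reported an ereport.
/// The real records carry more metadata than this.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Reporter {
    pub part_number: String,
    pub serial_number: String,
}

/// Groups reporters by serial number alone. Two different devices whose
/// serials collide across part namespaces end up in the same group.
pub fn group_by_serial(reporters: &[Reporter]) -> BTreeMap<String, Vec<Reporter>> {
    let mut groups: BTreeMap<String, Vec<Reporter>> = BTreeMap::new();
    for r in reporters {
        groups
            .entry(r.serial_number.clone())
            .or_default()
            .push(r.clone());
    }
    groups
}

/// Groups reporters by the (part number, serial number) pair, so devices
/// with colliding serials but different part numbers stay separate.
pub fn group_by_part_and_serial(
    reporters: &[Reporter],
) -> BTreeMap<(String, String), Vec<Reporter>> {
    let mut groups: BTreeMap<(String, String), Vec<Reporter>> = BTreeMap::new();
    for r in reporters {
        groups
            .entry((r.part_number.clone(), r.serial_number.clone()))
            .or_default()
            .push(r.clone());
    }
    groups
}
```

Given two reporters that share a serial number but have different part numbers, `group_by_serial` yields a single group containing both, while `group_by_part_and_serial` yields two groups of one reporter each.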
Thus, this branch changes the paths for ereports to include the part number as well as the serial number, in the format:
In order to include part numbers for host OS ereports, I decided to add a part number column to the `host_ereport` table as well. Initially, I had opted not to do this, as I was thinking that, since `host_ereport` includes a sled UUID, we could just join with the `sled` table to get the part number. However, it occurred to me that ereports may be received from a sled that's later expunged from the rack, and the `sled` record for the sled may eventually be deleted, so such a join would fail. We might retain such ereports past the lifetime of the sled in the rack. So, I thought it was better to always include the part number in the ereport record.

I've added a migration that attempts to backfill the `host_ereport.part_number` column from the `sled` table for existing host OS ereport records. In practice, this won't do anything, since we're not collecting them yet, but it seemed nice to have. Sadly, the column had to be left nullable, since we may theoretically encounter an ereport with a sled UUID that points to an already-deleted sled record, but...whatever. Since there aren't currently any host OS ereport records anyway, this shouldn't happen, and we'll just handle the nullability; this isn't terrible as we must already do so for SP ereport records.

Fixes #8765