[sled agent] API to manage datasets explicitly #6144
Conversation
```rust
    InternalDns,
}

impl DatasetKind {
```
A chunk of this implementation was moved from `sled-storage/src/dataset.rs`.
History lesson: We used to have types defined that would be usable by Nexus (`common/src/api/internal`), and types that were internal to the sled agent (`sled-storage/src/dataset.rs`) for defining dataset types/kinds.
This PR merges both concepts, but tweaks some names to avoid changing any schemas that are saved on M.2s.
bnaecker
left a comment
Overall this is great! I've a few questions and suggestions, but the structure looks quite straightforward to me. Thanks!
common/src/api/internal/shared.rs (Outdated)

```diff
     Hash,
 )]
-#[serde(rename_all = "snake_case")]
+#[serde(tag = "type", rename_all = "snake_case")]
```
Will adding this tag break any serialization? I'm thinking of code which attempts to deserialize these values, but which is not necessarily running against the same version of the code here.
I'm wondering if we should just always require tag in serde impls to ensure forward compatibility, in case any particular variant starts having associated data in the future. (The requirement to use a tag, if any variant has associated data, is imposed by the openapi tooling.)
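For illustration, here's roughly what the tagged representation looks like on the wire. This is a standalone sketch; the `Zone { name }` variant carrying data is hypothetical, included only to show why the tag helps forward compatibility:

```rust
use serde::{Deserialize, Serialize};

// Standalone sketch; the variant with associated data is hypothetical.
#[derive(Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
enum Kind {
    Crucible,
    Zone { name: String },
}

fn main() {
    // Unit variants still serialize compactly, just wrapped in the tag...
    let j = serde_json::to_string(&Kind::Crucible).unwrap();
    assert_eq!(j, r#"{"type":"crucible"}"#);

    // ...and a variant that later grows data stays self-describing, so
    // old payloads remain distinguishable from new ones.
    let j = serde_json::to_string(&Kind::Zone { name: "oxz_example".into() }).unwrap();
    assert_eq!(j, r#"{"type":"zone","name":"oxz_example"}"#);
}
```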
I went down the route of "serialize/deserialize via string", as @sunshowers recommended later.
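As a sketch of that approach, with an abbreviated variant list (this shows the pattern, not the exact omicron code): `Serialize` and `Deserialize` delegate to `Display` and `FromStr`, so the wire format and the display format cannot drift apart.

```rust
use std::fmt;
use std::str::FromStr;

use serde::{de, Deserialize, Deserializer, Serialize, Serializer};

#[derive(Clone, Copy, Debug)]
enum DatasetKind {
    Crucible,
    Cockroach,
}

impl fmt::Display for DatasetKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            DatasetKind::Crucible => "crucible",
            DatasetKind::Cockroach => "cockroachdb",
        };
        write!(f, "{s}")
    }
}

impl FromStr for DatasetKind {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "crucible" => Ok(DatasetKind::Crucible),
            "cockroachdb" => Ok(DatasetKind::Cockroach),
            other => Err(format!("unknown dataset kind: {other}")),
        }
    }
}

// Serialize/Deserialize route through the string forms above, so there
// is exactly one string representation to maintain.
impl Serialize for DatasetKind {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_str(&self.to_string())
    }
}

impl<'de> Deserialize<'de> for DatasetKind {
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        let s = String::deserialize(deserializer)?;
        s.parse().map_err(de::Error::custom)
    }
}
```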
```diff
 let s = match self {
     Crucible => "crucible",
-    Cockroach => "cockroach",
+    Cockroach => "cockroachdb",
```
When is the Display implementation used and when is a value serialized? I'm wondering why this variant is "cockroachdb" here but "cockroach_db" when serialized.
+1, I would have a slight preference for the Display and Serialize strings being identical...
+1 as well. For `ZoneKind`, which is related, we already have 4 different string representations X_X:

```rust
impl ZoneKind {
```
So my mishaps with naming here were intended to keep backwards compatibility between:

- Names for cockroach we have in deployed systems (e.g., the `zfs list`-ed name is `cockroachdb`)
- Names for cockroach we have stored on-disk in configuration files (e.g., grepping for `cockroach_db` in the `schema` directory of Omicron shows that it's used in e.g. `all-zones-requests.json`, which is used for bootstrapping "what zones get auto-launched when we reboot")
To be clear, I think I misinterpreted this in my original PR.
I was attempting to merge `DatasetKind` (from this file) and `DatasetType` (from `sled-storage/src/dataset.rs`) without breaking backwards compatibility. It was in the `all-zones-requests.json` configuration file where I saw the underscored `cockroach_db` name and tried to keep compatibility.
HOWEVER, upon closer inspection, the `cockroach_db` name actually comes from `OmicronZoneType` in `nexus-sled-agent-shared/src/inventory.rs`, so I should be able to just stick with the `cockroachdb` name (no underscores) in this `DatasetKind` variant.
Updated in 93134c2 to just use `cockroachdb` in both Serialize and Display.
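A small regression test in the spirit of that fix, written against the sketch above (hypothetical, not part of the PR), pins the two representations together:

```rust
#[cfg(test)]
mod tests {
    use super::*;

    // Guards against the Display and serde forms drifting apart again.
    #[test]
    fn display_matches_serialized_form() {
        for kind in [DatasetKind::Crucible, DatasetKind::Cockroach] {
            let displayed = kind.to_string();
            let serialized = serde_json::to_string(&kind).unwrap();
            assert_eq!(format!("\"{displayed}\""), serialized);
        }
    }
}
```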
```rust
// The "crypt" dataset needs these details, but should already exist
// by the time we're creating datasets inside.
let encryption_details = None;
```
So `None` here does not mean "remove encryption"? Is there a way or a need to do that?
The `Zfs::ensure_filesystem` function probably needs some re-work; I agree that this is confusing...
No, this isn't necessary, at least not for this PR. We only supply a value here for "encryption roots" (e.g., the `crypt` root), which is still automatically created by sled agents.
All our encrypted filesystems use dataset names within the `crypt/` dataset, and are implicitly encrypted.
See: `DatasetName::full_name`, which puts the dataset under `crypt/` if it should be encrypted.
I'll add some extra docs to `ensure_filesystem` in the meantime.
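For illustration, the naming scheme being described might look roughly like this. This is a minimal hypothetical sketch, not the actual `DatasetName::full_name` implementation:

```rust
/// Hypothetical simplification: encrypted datasets live under the
/// `crypt/` encryption root, so encryption is implied by placement
/// rather than configured per-dataset.
fn full_name(pool_name: &str, dataset: &str, encrypted: bool) -> String {
    if encrypted {
        // e.g. "oxp_<pool UUID>/crypt/debug"
        format!("{pool_name}/crypt/{dataset}")
    } else {
        // e.g. "oxp_<pool UUID>/crucible"
        format!("{pool_name}/{dataset}")
    }
}
```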
```rust
// Ensure the dataset has a usable UUID.
if let Ok(id_str) = Zfs::get_oxide_value(&fs_name, "uuid") {
    if let Ok(id) = id_str.parse::<DatasetUuid>() {
        if id != config.id {
```
How would this happen? We've asked to set a UUID, which succeeded previously, but somehow it didn't "take"? Related, is there a reason we don't just call `Zfs::set_oxide_value()` unconditionally?
I view it as more of a defensive guard against mismatched configuration, or against "an old device/dataset was imported, and we want to be careful not to auto-import + overwrite it without some explicit intervention".
I don't think this is likely, but the name for this dataset is basically something like:
`oxp_<pool UUID>/crucible`
This name is not particularly unique, and doesn't provide an identifier signifying that this control plane is the one managing the disk.
By adding (and checking!) the UUID here, we ensure not only "this is a crucible dataset", but also, "this is our crucible dataset, and not one configured by someone else".
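Spelled out, the guard behaves roughly like this self-contained sketch. The real code uses `Zfs::get_oxide_value`/`Zfs::set_oxide_value` and omicron's `DatasetUuid`; the property map and error type here are stand-ins:

```rust
use std::collections::HashMap;

// Stand-in for the real DatasetUuid type.
type DatasetUuid = u128;

// Hypothetical error type, for the sketch only.
#[derive(Debug)]
enum Error {
    UuidMismatch { expected: DatasetUuid, observed: DatasetUuid },
}

// `props` models the dataset's "oxide:uuid" user property.
fn ensure_dataset_uuid(
    props: &mut HashMap<String, String>,
    config_id: DatasetUuid,
) -> Result<(), Error> {
    if let Some(id_str) = props.get("uuid") {
        if let Ok(observed) = id_str.parse::<DatasetUuid>() {
            if observed != config_id {
                // Some other control plane (or an old, imported device)
                // already claimed this dataset; refuse to take it over
                // without explicit intervention.
                return Err(Error::UuidMismatch { expected: config_id, observed });
            }
            return Ok(());
        }
    }
    // No valid UUID yet: claim the dataset for this configuration.
    props.insert("uuid".to_string(), config_id.to_string());
    Ok(())
}
```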
sunshowers
left a comment
Had a look at the db and reconfigurator bits, they generally look quite good. Thanks for doing this!
```rust
    self.generation > other.generation
}

// No need to do this, the generation number is provided externally.
```
Hmm, as someone unfamiliar with this code I'm not quite sure what it means here -- I see that there's a generation field. Do you mean it gets updated by whatever owns this config?
Yeah, the generation field exists in the config, so we don't actually need (nor want) to change anything in the implementation of `Ledgerable::generation_bump`.
The `Ledgerable` trait exists to help us write a file to many disks, and ensure that we'll read back the latest one, if it exists. Part of this is deciphering: if we failed halfway through a write, and we see two distinct ledgers, which should we use?
There are some configurations (basically, only in the bootstore now) that want to bump the generation number on every single write, so this just tracks "what's the latest thing that was written".
However, in this case -- and many others, for Nexus-supplied data -- we already have a generation from Nexus, and we don't need a duplicate generation number for writing things to disk. As long as the data is unmodified from Nexus -> Sled Agent -> Disk, we can safely re-use this field.
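Putting that together, a sketch of the shape being described (hypothetical trait and struct definitions matching this explanation, not the exact omicron code):

```rust
// Hypothetical shape of the trait, per the description above.
trait Ledgerable {
    fn is_newer_than(&self, other: &Self) -> bool;
    fn generation_bump(&mut self);
}

struct DatasetsConfig {
    // Generation number supplied by Nexus; it flows unmodified from
    // Nexus -> Sled Agent -> disk.
    generation: u64,
    // ... dataset entries elided ...
}

impl Ledgerable for DatasetsConfig {
    fn is_newer_than(&self, other: &Self) -> bool {
        self.generation > other.generation
    }

    // Deliberately a no-op: the generation is provided externally by
    // Nexus, so bumping it locally would make the on-disk copy
    // disagree with what Nexus sent.
    fn generation_bump(&mut self) {}
}
```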
```rust
    Debug,
}

impl Serialize for DatasetKind {
```
Oh hmm, I think you'll also want to update the `JsonSchema` impl to match this.
Just to confirm -- what about the `JsonSchema` implementation should we update? It's currently being derived, and is used as part of internal APIs, but I don't think it needs to match the serialization + to/from string implementations.
Updated the PR with a fixed JSON schema.
Basically, the issue was that JSON schemas are meant to be a description of the serialization format, and they were out of sync since we'd manually implemented Serialize and Deserialize. It was moot for us at the moment because we were using `replace`, but it's a lurking bug that could lead to issues down the road.
I've changed the schema to be just `string`, which is overly generic but is at least correct. A more sophisticated impl would be something like `string` with a validation regex that lists all the possibilities, but that seems like overkill given that we're using `replace` anyway.
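The shape of that fix, sketched against schemars 0.8 (a stand-in enum is included so the snippet compiles on its own):

```rust
use schemars::gen::SchemaGenerator;
use schemars::schema::{InstanceType, Schema, SchemaObject};
use schemars::JsonSchema;

// Minimal stand-in enum, so the sketch is self-contained.
enum DatasetKind {
    Crucible,
    Cockroach,
}

impl JsonSchema for DatasetKind {
    fn schema_name() -> String {
        "DatasetKind".to_string()
    }

    // The hand-written Serialize/Deserialize impls emit a plain string,
    // so the schema must describe a plain string too: overly generic,
    // but at least in sync with the actual wire format.
    fn json_schema(_: &mut SchemaGenerator) -> Schema {
        SchemaObject {
            instance_type: Some(InstanceType::String.into()),
            ..Default::default()
        }
        .into()
    }
}
```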
Thank you everyone for the feedback! I think I've addressed all comments so far.

Hey all, I think I've addressed all feedback here, and rebased. If y'all have a moment, I'd appreciate another look -- I have a few PRs downstream of this, and want to make sure this is a foundation we approve of.
papertigers
left a comment
The changes here make sense to me. Only one minor nit that I could go either way on.
Provides visibility into deployed datasets, building atop #6144.

Similar to Physical Disks (and Zpools), Datasets are observable in two ways:

- The Sled Agent provides an API to manage their configuration ("this is what is intended")
- The Sled Agent exposes information about datasets via inventory ("this is what exists")

This PR implements the inventory aspect of datasets, to provide visibility into the state of sled storage.

Additionally, this PR provides some omdb commands for inspection:

- `omdb sled-agent datasets list` has been added to show dataset configuration.
- `omdb db inventory collections show` has been updated to emit disk, zpool, and dataset info from inventory.

Co-authored-by: Rain <[email protected]>
This PR exposes an API from the Sled Agent which allows Nexus to configure datasets independently from Zones.
Here's an example subset of `zfs list -o name` on a deployed system, with some annotations in-line.

History
Prior to this PR, the sled agent exposed no interfaces to explicitly manage datasets on their own. Datasets could be created one of two ways:

- Implicitly, by the sled agent itself during initialization (e.g., the `debug` dataset).
- Implicitly, as a side effect of the zone-management APIs (e.g., the `crucible`, `cockroachdb`, and the `zone` filesystems above).

These APIs did not provide a significant amount of control over dataset usage, and provided no mechanism for setting quotas and reservations.
This PR:

- Adds new dataset kinds: `zone_root`, for the `crypt/zone` dataset; `zone`, for any dataset within `crypt/zone` (e.g., `crypt/zone/oxz_cockroachdb_8bbea076-ff60-4330-8302-383e18140ef3`); and `debug`, for the `crypt/debug` dataset.
- Adds Sled Agent APIs, `datasets_put` and `datasets_get`, for setting a configuration of expected datasets. At the moment, `datasets_put` is purely additive, and does not remove any missing datasets (see the sketch after this list).

This PR is related to #6167, which provides additional tooling through the inventory for inspecting dataset state on deployed sleds.
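As a sketch of those additive semantics, with hypothetical, simplified types (the real request shapes live in the sled agent's HTTP interface):

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified request shapes, for illustration only.
#[derive(Debug, Clone)]
struct DatasetConfig {
    id: u128,                 // stand-in for DatasetUuid
    full_name: String,        // e.g. "oxp_<pool UUID>/crypt/debug"
    quota: Option<u64>,       // bytes
    reservation: Option<u64>, // bytes
}

#[derive(Debug, Clone)]
struct DatasetsConfig {
    generation: u64, // supplied by Nexus
    datasets: BTreeMap<u128, DatasetConfig>,
}

// Purely additive reconcile, mirroring the datasets_put semantics
// described above: every dataset in the config is created or updated,
// but datasets missing from the config are left alone, not removed.
fn datasets_put(existing: &mut BTreeMap<u128, DatasetConfig>, config: &DatasetsConfig) {
    for (id, dataset) in &config.datasets {
        existing.insert(*id, dataset.clone());
    }
}
```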
Fixes #6042, #6107