
Conversation

@smklein (Collaborator) commented Jul 22, 2024

This PR exposes an API from the Sled Agent which allows Nexus to configure datasets independently from Zones.

Here's an example subset of `zfs list -o name` on a deployed system, with some annotations inline:

```
# This is the pool of an arbitrary U.2
oxp_e12f29b8-1ab8-431e-bc96-1c1298947980

# Crucible has a dataset that isn't encrypted at the ZFS layer, because it's encrypted internally...
oxp_e12f29b8-1ab8-431e-bc96-1c1298947980/crucible
# ... and it contains a lot of region datasets.
oxp_e12f29b8-1ab8-431e-bc96-1c1298947980/crucible/regions/...

# We have a dataset which uses a trust-quorum-derived encryption key.
oxp_e12f29b8-1ab8-431e-bc96-1c1298947980/crypt
# Durable datasets (e.g., Cockroach's) can be stored in here.
oxp_e12f29b8-1ab8-431e-bc96-1c1298947980/crypt/cockroachdb
# The "debug" dataset has historically been created and managed by the Sled Agent.
oxp_e12f29b8-1ab8-431e-bc96-1c1298947980/crypt/debug
# Transient zone filesystems also exist here, and are encrypted.
oxp_e12f29b8-1ab8-431e-bc96-1c1298947980/crypt/zone
oxp_e12f29b8-1ab8-431e-bc96-1c1298947980/crypt/zone/oxz_cockroachdb_8bbea076-ff60-4330-8302-383e18140ef3
oxp_e12f29b8-1ab8-431e-bc96-1c1298947980/crypt/zone/oxz_crucible_a232eba2-e94f-4592-a5a6-ec23f9be3296
```

History

Prior to this PR, the sled agent exposed no interfaces to explicitly manage datasets on their own. Datasets could be created in one of two ways:

  1. Created and managed by the sled agent, without telling Nexus. See: the `debug` dataset.
  2. Created in response to requests from Nexus to create zones. See: `crucible`, `cockroachdb`, and the zone filesystems above.

These APIs offered little control over dataset usage, and provided no mechanism for setting quotas and reservations.

This PR

  • Expands Nexus' notion of "dataset kind" to include the following variants:
    • `zone_root`, for the `crypt/zone` dataset,
    • `zone`, for any dataset within `crypt/zone` (e.g., `crypt/zone/oxz_cockroachdb_8bbea076-ff60-4330-8302-383e18140ef3`),
    • `debug`, for the `crypt/debug` dataset.
  • Adds two endpoints to the Sled Agent, `datasets_put` and `datasets_get`, for setting a configuration of expected datasets. At the moment, `datasets_put` is purely additive, and does not remove any missing datasets.
    • This API provides a mechanism for Nexus to manage quotas and reservations, which it will do in the future.

This PR is related to #6167, which provides additional tooling through the inventory for inspecting dataset state on deployed sleds.

Fixes #6042, #6107
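Roughly speaking, the configuration handled by these endpoints pairs a Nexus-supplied generation number with the set of expected datasets and their properties. A hypothetical illustration follows; the field names below are guesses for exposition, not the actual `datasets_put` schema:

```json
{
  "generation": 2,
  "datasets": [
    {
      "id": "00000000-0000-0000-0000-000000000001",
      "name": {
        "pool_name": "oxp_e12f29b8-1ab8-431e-bc96-1c1298947980",
        "kind": "debug"
      },
      "quota": 1073741824,
      "reservation": null,
      "compression": "off"
    }
  ]
}
```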

```rust
    InternalDns,
}

impl DatasetKind {
```
@smklein (author):

A chunk of this implementation was moved from sled-storage/src/dataset.rs.

History lesson: We used to have types defined that would be usable by Nexus (common/src/api/internal), and types that were internal to the sled agent (sled-storage/src/dataset.rs) for defining dataset types/kinds.

This PR merges both concepts, but tweaks some names to avoid changing any schemas that are saved on M.2s.

@smklein smklein marked this pull request as ready for review July 30, 2024 18:12
@bnaecker (Collaborator) left a comment:

Overall this is great! I have a few questions and suggestions, but the structure looks quite straightforward to me. Thanks!

```diff
     Hash,
 )]
-#[serde(rename_all = "snake_case")]
+#[serde(tag = "type", rename_all = "snake_case")]
```
Collaborator:

Will adding this tag break any serialization? I'm thinking of code which attempts to deserialize these values, but which is not necessarily running against the same version of the code here.

Contributor:

I'm wondering if we should just always require tag in serde impls to ensure forward compatibility, in case any particular variant starts having associated data in the future. (The requirement to use a tag, if any variant has associated data, is imposed by the openapi tooling.)

@smklein (author):

I went down the route of "serialize/deserialize via string", as @sunshowers recommended later.

```diff
 let s = match self {
     Crucible => "crucible",
-    Cockroach => "cockroach",
+    Cockroach => "cockroachdb",
```
Collaborator:

When is the `Display` implementation used and when is a value serialized? I'm wondering why this variant is `"cockroachdb"` here but `"cockroach_db"` when serialized.

Member:

+1, I would have a slight preference for the `Display` and `Serialize` strings being identical...

Contributor:

+1 as well. For `ZoneKind`, which is related, we already have 4 different string representations X_X.

@smklein (author) commented Aug 7, 2024:

So my mishaps with naming here were intended to keep backwards compatibility between:

  • Names for cockroach we have in deployed systems (e.g., the `zfs list`-ed name is `cockroachdb`)
  • Names for cockroach we have stored on-disk in configuration files (e.g., grepping for `cockroach_db` in the schema directory of Omicron shows that it's used in, e.g., `all-zones-requests.json`, which is used for bootstrapping "what zones get auto-launched when we reboot")

To be clear, I think I misinterpreted this in my original PR.

I was attempting to merge `DatasetKind` (from this file) and `DatasetType` (from `sled-storage/src/dataset.rs`) without breaking backwards compatibility. It was in the `all-zones-requests.json` configuration file where I saw the underscored `cockroach_db` name and tried to keep compatibility.

HOWEVER, upon closer inspection, the `cockroach_db` name actually comes from `OmicronZoneType` in `nexus-sled-agent-shared/src/inventory.rs`, so I should be able to just stick with the `cockroachdb` name (no underscores) in this `DatasetKind` variant.

Updated in 93134c2 to just use `cockroachdb` in both Serialize and Display.
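The fix described above can be sketched as follows. This is a hypothetical, trimmed-down stand-in for the real `DatasetKind` (variant set and error type simplified), showing how routing `Display` and string serialization through a single mapping keeps them from drifting apart:

```rust
use std::fmt;
use std::str::FromStr;

// Hypothetical, simplified stand-in for the real DatasetKind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DatasetKind {
    Crucible,
    Cockroach,
    Debug,
}

impl fmt::Display for DatasetKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Single source of truth for the string form; a string-based
        // Serialize impl can simply call to_string().
        let s = match self {
            DatasetKind::Crucible => "crucible",
            DatasetKind::Cockroach => "cockroachdb",
            DatasetKind::Debug => "debug",
        };
        write!(f, "{s}")
    }
}

impl FromStr for DatasetKind {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "crucible" => Ok(DatasetKind::Crucible),
            "cockroachdb" => Ok(DatasetKind::Cockroach),
            "debug" => Ok(DatasetKind::Debug),
            other => Err(format!("unknown dataset kind: {other}")),
        }
    }
}

fn main() {
    // Round-trip: Display and FromStr agree on every variant.
    for kind in [DatasetKind::Crucible, DatasetKind::Cockroach, DatasetKind::Debug] {
        assert_eq!(kind.to_string().parse::<DatasetKind>().unwrap(), kind);
    }
}
```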


```rust
// The "crypt" dataset needs these details, but should already exist
// by the time we're creating datasets inside.
let encryption_details = None;
```
Collaborator:

So None here does not mean "remove encryption"? Is there a way or a need to do that?

@smklein (author):

The `Zfs::ensure_filesystem` function probably needs some rework; I agree that this is confusing...

No, this isn't necessary, at least not for this PR. We only supply a value here for "encryption roots" (e.g., the crypt root) which is still automatically created by sled agents.

All our encrypted filesystems use dataset names within the `crypt/` dataset, and are implicitly encrypted.

See `DatasetName::full_name`, which places the dataset under `crypt/` if it should be encrypted.

I'll add some extra docs to `ensure_filesystem` in the meanwhile.
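The naming convention described above can be sketched as a small helper. Note this is a hypothetical illustration, not the actual `DatasetName::full_name` implementation or signature:

```rust
// Hypothetical sketch: an "encrypted" dataset gets a name under crypt/,
// where it inherits ZFS encryption from the crypt encryption root, so no
// per-dataset encryption details are needed at creation time.
fn full_dataset_name(pool: &str, kind: &str, encrypted: bool) -> String {
    if encrypted {
        format!("{pool}/crypt/{kind}")
    } else {
        // e.g. Crucible, which encrypts its data internally.
        format!("{pool}/{kind}")
    }
}

fn main() {
    let pool = "oxp_e12f29b8-1ab8-431e-bc96-1c1298947980";
    assert_eq!(
        full_dataset_name(pool, "cockroachdb", true),
        format!("{pool}/crypt/cockroachdb")
    );
    assert_eq!(full_dataset_name(pool, "crucible", false), format!("{pool}/crucible"));
}
```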

Comment on lines +1029 to +1032
```rust
// Ensure the dataset has a usable UUID.
if let Ok(id_str) = Zfs::get_oxide_value(&fs_name, "uuid") {
    if let Ok(id) = id_str.parse::<DatasetUuid>() {
        if id != config.id {
```
Collaborator:

How would this happen? We've asked to set a UUID, which succeeded previously, but somehow it didn't "take"? Related, is there a reason we don't just call `Zfs::set_oxide_value()` unconditionally?

@smklein (author):

I view it as more of a defensive guard against mismatched configuration, or against "an old device/dataset was imported, and we want to be careful not to auto-import and overwrite it without some explicit intervention".

I don't think this is likely, but the name for this dataset is basically something like:

```
oxp_<pool UUID>/crucible
```

This name is not that unique, and doesn't really give a unique identifier that signifies this control plane is managing the disk.

By adding (and checking!) the UUID here, we ensure not only "this is a crucible dataset", but also, "this is our crucible dataset, and not one configured by someone else".
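The defensive check described above can be sketched as a small decision function. This is a hypothetical helper for illustration, not the actual sled-agent code (which reads the UUID via ZFS properties, as in the snippet earlier):

```rust
// Hypothetical sketch of the defensive UUID guard: only adopt a dataset
// if it has no UUID yet, or its UUID matches what Nexus asked for.
#[derive(Debug, PartialEq)]
enum DatasetAction {
    AlreadyOurs,
    Adopt,
    RefuseForeign,
}

fn check_dataset(existing_uuid: Option<&str>, expected_uuid: &str) -> DatasetAction {
    match existing_uuid {
        // UUID matches: this really is "our" dataset.
        Some(id) if id == expected_uuid => DatasetAction::AlreadyOurs,
        // A UUID is present but differs: the dataset may have been
        // imported from elsewhere; don't silently overwrite it.
        Some(_) => DatasetAction::RefuseForeign,
        // No UUID yet: stamp it with ours.
        None => DatasetAction::Adopt,
    }
}

fn main() {
    assert_eq!(check_dataset(None, "abc"), DatasetAction::Adopt);
    assert_eq!(check_dataset(Some("abc"), "abc"), DatasetAction::AlreadyOurs);
    assert_eq!(check_dataset(Some("xyz"), "abc"), DatasetAction::RefuseForeign);
}
```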


@sunshowers (Contributor) left a comment:

Had a look at the db and reconfigurator bits, they generally look quite good. Thanks for doing this!

```rust
        self.generation > other.generation
    }

    // No need to do this, the generation number is provided externally.
```
Contributor:

Hmm, as someone unfamiliar with this code I'm not quite sure what it means here -- I see that there's a generation field. Do you mean it gets updated by whatever owns this config?

@smklein (author):

Yeah, the generation field exists in the config, so we don't actually need (nor want) to change anything in the implementation of `Ledgerable::generation_bump`.

The `Ledgerable` trait exists to help us write a file to many disks, and ensure that we'll read back the latest one, if it exists. Part of this is deciphering: if we failed halfway through a write, and we see two distinct ledgers, which should we use?

There are some configurations (basically, in the bootstore only now) that want to bump the generation number on every single write, so this just tracks "what's the latest thing that was written".

However, in this case -- and many others, for Nexus-supplied data -- we already have a generation from Nexus, and we don't need to have a duplicate generation number for writing things to disk. As long as the data is unmodified from Nexus -> Sled Agent -> Disk, we can safely re-use this field.
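The idea above can be sketched as follows. This is a hypothetical, simplified version of the trait and config for illustration (the real `Ledgerable` trait and `DatasetsConfig` type live in Omicron and differ in detail):

```rust
// Hypothetical, simplified Ledgerable-style trait.
trait Ledgerable {
    fn is_newer_than(&self, other: &Self) -> bool;
    fn generation_bump(&mut self);
}

struct DatasetsConfig {
    // Generation supplied by Nexus; the sled agent never modifies it.
    generation: u64,
}

impl Ledgerable for DatasetsConfig {
    fn is_newer_than(&self, other: &Self) -> bool {
        self.generation > other.generation
    }
    // No need to bump: the generation number is provided externally,
    // so writing the ledger must not invent a new one.
    fn generation_bump(&mut self) {}
}

fn main() {
    let mut newer = DatasetsConfig { generation: 2 };
    let older = DatasetsConfig { generation: 1 };
    assert!(newer.is_newer_than(&older));

    newer.generation_bump();
    // Nexus remains the source of truth: the generation is untouched.
    assert_eq!(newer.generation, 2);
}
```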

```rust
    Debug,
}

impl Serialize for DatasetKind {
```
Contributor:

Oh hmm, I think you'll also want to update the `JsonSchema` impl to match this.

@smklein (author):

Just to confirm -- what about the `JsonSchema` implementation should we update? It's currently being derived, and is used as part of internal APIs, but I don't think it needs to match the serialization and to/from-string implementations.

Contributor:

Updated the PR with a fixed JSON schema.

Basically, the issue was that JSON schemas are meant to be a description of the serialization format, and they were out of sync since we'd manually implemented `Serialize` and `Deserialize`. It was moot for us because we were using `replace`, but it's a lurking bug that could lead to issues down the road.

I've changed the schema to be just `string`, which is overly generic but at least correct. A more sophisticated impl would be something like `string` with a validation regex listing all the possibilities, but that seems like overkill given that we're using `replace` anyway.


smklein commented Aug 8, 2024

Thank you everyone for the feedback! I think I've addressed all comments so far.


smklein commented Aug 26, 2024

Hey all, I think I've addressed all feedback here, and rebased onto main as of 8/26.

If y'all have a moment, I'd appreciate another look -- I have a few PRs downstream of this, and want to make sure this is a foundation we approve of.

@papertigers (Contributor) left a comment:

The changes here make sense to me. Only one minor nit that I could go either way on.

@smklein smklein enabled auto-merge (squash) August 28, 2024 18:02
@smklein smklein merged commit 648507d into main Aug 28, 2024
@smklein smklein deleted the explicit-datasets branch August 28, 2024 18:27
smklein added a commit that referenced this pull request Sep 6, 2024
Provides visibility into deployed datasets, building atop
#6144.

Similar to Physical Disks (and Zpools), Datasets are observable in two
ways:

- The Sled Agent provides an API to manage their configuration ("this is
what is intended")
- The Sled Agent exposes information about datasets via inventory ("this
is what exists")

This PR implements the inventory aspect of datasets, to provide
visibility into the state of sled storage.

Additionally, this PR provides some omdb commands for inspection:

- `omdb sled-agent datasets list` has been added to show dataset
configuration.
- `omdb db inventory collections show` has been updated to emit disk,
zpool, and dataset info from inventory.

---------

Co-authored-by: Rain <[email protected]>


Development

Successfully merging this pull request may close these issues.

Sled Agent: API to manage datasets independently of zones

6 participants