
Conversation

@jgallagher (Contributor) commented Dec 4, 2024

This is a step on the road to #7078. The current BlueprintBuilder internals would not be at all amenable to squishing the disparate maps in Blueprint together, so this PR reworks those internals. BlueprintBuilder now does its own map squishing (inside new_based_on()), combining state+zones+disks+datasets into one helper (SledEditor). All modifications are performed via those editors, and BlueprintBuilder::build() then breaks the combined editors' results back out into four maps for Blueprint.
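As a rough sketch of the shape of this change (all names below are illustrative placeholders, not the actual types in blueprint_editor/):

```rust
// Minimal sketch of the new shape (illustrative names, not the real
// blueprint_editor types): one editor per sled bundles the four pieces of
// per-sled state that Blueprint still stores in separate maps.
#[derive(Debug, Clone, Copy)]
enum SledState {
    Active,
    Decommissioned,
}

#[derive(Debug, Default)]
struct ZonesEditor; // zone configs + pending edits would live here
#[derive(Debug, Default)]
struct DisksEditor; // disk configs + pending edits
#[derive(Debug, Default)]
struct DatasetsEditor; // dataset configs + pending edits

#[derive(Debug)]
struct SledEditor {
    state: SledState,
    zones: ZonesEditor,
    disks: DisksEditor,
    datasets: DatasetsEditor,
}

impl SledEditor {
    // new_based_on() builds one of these per sled by combining the parent
    // blueprint's state/zones/disks/datasets entries for that sled.
    fn new(state: SledState) -> Self {
        Self {
            state,
            zones: ZonesEditor::default(),
            disks: DisksEditor::default(),
            datasets: DatasetsEditor::default(),
        }
    }

    // build() breaks the edited result back out into the four per-sled
    // entries that Blueprint still stores separately.
    fn finalize(self) -> (SledState, ZonesEditor, DisksEditor, DatasetsEditor) {
        (self.state, self.zones, self.disks, self.datasets)
    }
}
```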

There should be only one functional change in this PR (marked with a TODO-john): previously, expunging a zone checked that the zone had not already been modified in the current planning cycle. I'm not sure that check is really necessary; it felt like defensiveness born out of the complexity of the builder itself. I think we could put it back (inside ZonesEditor, presumably) if folks think it's worth keeping.

My intention was to change the tests as little as possible to ensure I didn't break anything as I was moving functionality around, so new_based_on() and build() have some arguably-pretty-gross concessions to behave exactly the way the builder did before these changes. I'd like to remove those concessions, but those will be nontrivial behavioral changes that I don't want to try to roll in with all of this cleanup. I think I landed this pretty well; there are only a few expectorate changes (due to the slightly reworked ZonesEditor producing a different ordering for a few tests), and none in omdb (which is where I've often seen incidental planner changes bubble out).

Apologies for the size of the PR. I'd recommend looking at the new blueprint_editor/ module and its subcomponents first; they should be pretty quickly reviewable, are supposed to be simple and obvious, and are largely ported over from the prior storage / disks / datasets / zones editors. I did rework how we're handling #6645 backwards compat to try to reduce how much we need to pass SledResources around, which complicates things in DatasetsEditor somewhat, but not too badly IMO. (I also added a test for this, since I changed it and I don't think we had one before.) The BlueprintBuilder changes should then be more or less the natural way one would use a collection of SledEditors without changing its API or behavior (yet!).

@andrewjstone (Contributor) left a comment

This is quite an improvement to the current state of affairs. I believe it will help me clean up my branch for making disk expungement and decommissioning explicit in blueprints.

You can take or leave my suggestions. I just want to merge this in and use it immediately.

self.state = new_state;
}

pub fn disks(
Contributor

This is lovely. Thank you.

self.datasets.ensure(zone_root);
}

pub fn expunge_disk(
Contributor

This is so nice to read and be exposed at a high level. Thank you.

.build(datasets, rng)
});
let durable_dataset = zone.zone_type.durable_dataset().map(|dataset| {
let address = match &zone.zone_type {
Contributor

I take it crucible is the only durable dataset with an IP address. Maybe add a comment about why this is abnormal. That information should really be held outside the Dataset IMO, but no changes needed right now.

Contributor Author

Agreed and certainly; added in c6e2dcb
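For readers skimming the diff, the pattern under discussion is roughly the following (type and field names are illustrative, not the exact omicron definitions):

```rust
use std::net::SocketAddrV6;

// Illustrative zone-type enum: among zones with durable datasets, only
// Crucible records an IP address, so the match has one interesting arm.
enum ZoneType {
    Crucible { address: SocketAddrV6 },
    InternalDns,
    // ... other zone kinds, none of which carry a durable-dataset address
}

fn durable_dataset_address(zone_type: &ZoneType) -> Option<SocketAddrV6> {
    match zone_type {
        ZoneType::Crucible { address } => Some(*address),
        _ => None,
    }
}
```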

/// 2. Is the dataset in `preexisting_database_ids`? If so, use that ID.
/// 3. Generate a new random ID.
#[derive(Debug)]
pub(crate) struct PreexistingDatasetIds(
Contributor

Nit: Can we make reference in the name that these dataset IDs are coming from the database?

Maybe something like DatasetIdsFromDatabase or DatasetIdsFromDbButNotInBlueprint. I know the latter is wordy, but it's also quite descriptive :)

Contributor Author

Renamed in 73045a7

}

#[derive(Debug)]
pub(crate) struct PartialDatasetConfig {
Contributor

Can you please put a comment describing the purpose of this type?

Contributor Author

Added along with the rework in bd66165

sled_state.entry(sled_id).or_insert(details.state);
// Helper to build a `PreexistingDatasetIds` for a given sled. This will
// go away with https://github.com/oxidecomputer/omicron/issues/6645.
let build_preexisting_dataset_ids = |sled_id| -> anyhow::Result<
Contributor

This is a great change. Thanks.

{
Some(state) => state,
None => {
// If we have zones but no state for a sled, we assume
Contributor

This is something it would really be nice to encode in the type system somehow. A sled can only be moved to decommissioned if all its zones, disks, and datasets are expunged. I'm not sure how feasible this is (certainly not in this PR), but it would be nice to get rid of checks like this and many others throughout the planner.

Contributor Author

I tried putting a check in SledEditor::set_state(); i.e., "if you're trying to set the state to Decommissioned, the editor will fail if ... insert whatever checks we want", and it caused some test failures. Almost certainly those tests are taking invalid shortcuts; I'd like to take this as a followup.
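For concreteness, the kind of guard being described would look something like this (a hedged sketch; the real editor's checks, fields, and error type differ):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SledState {
    Active,
    Decommissioned,
}

#[derive(Debug)]
enum EditError {
    NotDecommissionable { non_expunged_zones: usize },
}

#[derive(Debug)]
struct SledEditor {
    state: SledState,
    // Placeholder for "how many zones are still in service on this sled".
    non_expunged_zones: usize,
}

impl SledEditor {
    fn set_state(&mut self, new_state: SledState) -> Result<(), EditError> {
        // Refuse to decommission a sled that still has in-service zones;
        // analogous checks for disks and datasets could slot in here too.
        if new_state == SledState::Decommissioned && self.non_expunged_zones > 0 {
            return Err(EditError::NotDecommissionable {
                non_expunged_zones: self.non_expunged_zones,
            });
        }
        self.state = new_state;
        Ok(())
    }
}
```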

);
}
}
// Preserving backwards compatibility, for now: prune sled_state of any
Contributor

I'm pretty sure this backwards compatibility is going to have to change when we start decommissioning disks based on whether a sled is decommissioned, which is presumably why you wrote "for now".

Contributor Author

Yeah, definitely. I'd like to remove all of these "preserving backwards compatibility" blocks, as they all result in the four maps (potentially) having different sets of keys, which is confusing at best.

"for sled {sled_id}, error computing zones to expunge"
)))
})?;
// TODO-john we lost the check for "are we expunging a zone we
Contributor

I"m not sure I really understand why that was there. If we mark a zone expunged in this iteration, why wouldn't we set the disposition to expunged in the editor?

Contributor Author

This check was something else; it was saying "if we're about to mark a zone as expunged, first check that we didn't already edit this zone in some way in this same planning pass". I think it was to guard against the planner adding a zone and then immediately expunging it, or changing the parameters of a zone and then expunging it, or other nonsense operations.
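Roughly, the removed guard amounted to something like this (names invented for illustration; not the old builder's exact code):

```rust
use std::collections::BTreeSet;

// Placeholder for the real zone UUID type.
type ZoneId = u64;

#[derive(Debug, Default)]
struct ZonesEditor {
    // Zones touched (added or modified) earlier in this same planning pass.
    edited_this_pass: BTreeSet<ZoneId>,
}

impl ZonesEditor {
    fn expunge(&mut self, zone_id: ZoneId) -> Result<(), String> {
        if self.edited_this_pass.contains(&zone_id) {
            // The old builder bailed out here to catch nonsense sequences
            // like "add a zone, then immediately expunge it".
            return Err(format!(
                "zone {zone_id} was already modified in this planning pass"
            ));
        }
        self.edited_this_pass.insert(zone_id);
        // ... actually mark the zone's disposition as expunged ...
        Ok(())
    }
}
```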

Contributor

Ah, that makes sense. I think it's probably fine to remove the check. We know when we want to expunge something, and that should trump everything else.

// https://github.com/oxidecomputer/omicron/issues/6645; for now, it
// confirms we maintain the compatibility layer it needs.
#[test]
fn test_backcompat_reuse_existing_database_dataset_ids() {
Contributor

Great test!

@andrewjstone (Contributor)

Thanks for all the cleanup @jgallagher. Looking good.

@jgallagher enabled auto-merge (squash) December 5, 2024 22:57
@jgallagher merged commit e73a30e into main Dec 5, 2024
17 checks passed
@jgallagher deleted the john/combined-sled-editor branch December 5, 2024 23:12
jgallagher added a commit that referenced this pull request Dec 16, 2024 (#7234)

This is a followup from
#7204 (comment)
and makes two changes, neither of which should affect behavior:

* `SledEditor` will now fail if a caller attempts to make changes to a
decommissioned sled (internally, this is statically enforced by a type
state enum - a sled in the decommissioned state does not have any
methods that support editing, so we're forced to return an error)
* `SledEditor::set_state()` is now `SledEditor::decommission()`, and it
performs some checks that the sled looks decommissionable

The second bullet is more questionable than I expected it to be:

1. There are some arguments that `SledEditor` shouldn't do any checks
here; in particular, it doesn't have the full context (e.g., any checks
on "should we decommission this sled" that depend on the `PlanningInput`
can't be performed here, because `SledEditor` intentionally doesn't have
access to `PlanningInput`).
2. I wanted to check zones + disks + datasets, but in practice it can
only check zones today; I left a comment (and the commented-out disks +
datasets checks we should do) about why. I think we will eventually be
able to turn these on; the current behavior of removing disks/datasets
from the blueprint for expunged sleds will have to change to fix #7078,
at which point these checks should be valid.

I don't feel super strongly about the checks in `decommission()` or even
this PR as a whole; if this doesn't look like a useful direction, I'd be
fine with discarding it. Please review with a pretty critical eye.
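As a rough illustration of the type-state idea described in the first bullet (simplified names, not the actual code):

```rust
// A decommissioned sled carries no inner editor, so there is simply nothing
// to call editing methods on; callers must go through as_active_mut() and
// handle the error. Names are illustrative.
#[derive(Debug)]
enum SledEditor {
    Active(ActiveSledEditor),
    Decommissioned,
}

#[derive(Debug, Default)]
struct ActiveSledEditor {
    // zones/disks/datasets editors would live here
}

#[derive(Debug)]
struct SledDecommissioned;

impl SledEditor {
    fn as_active_mut(&mut self) -> Result<&mut ActiveSledEditor, SledDecommissioned> {
        match self {
            SledEditor::Active(editor) => Ok(editor),
            SledEditor::Decommissioned => Err(SledDecommissioned),
        }
    }

    fn decommission(&mut self) -> Result<(), SledDecommissioned> {
        // Checks that the sled "looks decommissionable" (e.g., no remaining
        // in-service zones) would run here before switching variants.
        *self = SledEditor::Decommissioned;
        Ok(())
    }
}
```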