- 
                Notifications
    You must be signed in to change notification settings 
- Fork 59
Local storage [1/4]: add local_storage dataset #9264
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
To implement RFD 590, Propolis will use a delegated ZVOL as a file backed NVMe device. This PR creates a new "local_storage" dataset that can be used for this purpose on every zpool in the rack. This is created under the `crypt` parent so that the delegated storage is encrypted at rest. This is done with the planner instead of adding to U2_EXPECTED_DATASETS, and this required changing a bunch of tests that were assuming a specific number of datasets per pool, along with a whole whack of tests that use expectorate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't gone over all the expectorate output yet, but just a few comments from the code. (I assume all of the expectorate output is pretty noisy and not useful - there's now an extra dataset for every disk, which shifts all the output and IDs around. But I'd still like to skim through it.)
| // LocalStorage isn't in the U2_EXPECTED_DATASETS list, add it | ||
| // here. XXX does RSS blueprint have to be blippy clean? Isn't | ||
| // the plan to RSS only enough to run Nexus and have it use | ||
| // blueprints to make the rest true? | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XXX does RSS blueprint have to be blippy clean?
Yes - blippy should only complain about blueprints that should not exist and may lead to some kind of internal inconsistencies, I think?
Isn't the plan to RSS only enough to run Nexus and have it use blueprints to make the rest true?
That is one possible plan for a future where we've rewritten RSS. 😅 Today, we expect the Nexus takeover after RSS to find that it does not need to make any changes to the current target blueprint (and I think we even have a test for that somewhere - @davepacheco wrote it IIRC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 rewrote this comment in e8b0d00
        
          
                schema/crdb/dbinit.sql
              
                Outdated
          
        
      | ) WHERE | ||
| time_deleted IS NULL; | ||
|  | ||
| CREATE TABLE IF NOT EXISTS omicron.public.local_storage_dataset ( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should we prefix this table name with rendezvous_? The existing ones dataset-related rendezvous tables are rendezvous_debug_dataset and crucible_dataset; IIRC the latter doesn't have a rendezvous prefix because it really predates the idea, and it seemed silly to rename it just for that.
Less of a nit: I would naively expect the structure of this table to look slightly more like rendezvous_debug_dataset instead of crucible_dataset; the former has more goop for handling deletion, for example, and doesn't have an rcgen (which I don't think makes sense for reconfigurator-managed resources like this?). I could also imagine a version of this PR that doesn't add this table at all, I think? Add it to the blueprint -> sled-agent path, but not do any bookkeeping in Nexus, and add that in separately later? (Not suggesting you actually do this necessarily, unless it would make things easier.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, done in 70ccf19
| use nexus_db_schema::schema::crucible_dataset::dsl; | ||
| let zpool_id = dataset.pool_id(); | ||
| Zpool::insert_resource( | ||
| let _: CrucibleDataset = Zpool::insert_resource( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we skipping inserting the local dataset records here because we expect blueprint execution to come along and do it for us? (I'm 100% on board with that, if it's right - just making sure I understand the intent.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in keeping with the idea that rendezvous tables are only managed through blueprint execution. I think CrucibleDataset might be an exception to this? I can't imagine removing this insert would result in them not being populated later automatically but didn't want to remove it with everything else going on in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I left them here when we did the initial rendezvous table rework for similar reasoning - it's probably not necessary to insert them during RSS handoff, but also is harmless, and testing that it's really okay to remove them would be tedious.
they're both non-legacy rendezvous datasets, so they should be managed the same
To implement RFD 590, Propolis will use a delegated ZVOL as a file backed NVMe device. This PR creates a new "local_storage" dataset that can be used for this purpose on every zpool in the rack. This is created under the
cryptparent so that the delegated storage is encrypted at rest.This is done with the planner instead of adding to U2_EXPECTED_DATASETS, and this required changing a bunch of tests that were assuming a specific number of datasets per pool, along with a whole whack of tests that use expectorate.