-
Notifications
You must be signed in to change notification settings - Fork 62
Config property for target system software release #7518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| polar_snippet = FleetChild, | ||
| } | ||
|
|
||
| authz_resource! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm defining this authz resource, but am not convinced that I'm using it right (or at all). Do these only apply to resources that use the lookup* macros? Any guidance here would be appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking at the target blueprint as an analog here because it's similarly global.
It looks like we don't use authz_resource! for this. Instead, we have a synthetic resource called BlueprintConfig. It has its own type here and a singleton instance on which we manually implement PolarClass and AuthorizedResource. Then there's a snippet in omicron.polar about it.
That kind of seems like the way to go here. TargetRelease isn't a type of "resource" in the same way that I think the authz_resource! macro means it (i.e., something you could CRUD, that has an id, for which instances might be missing, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, replaced with a synthetic resource in 5fe4bb3.
|
|
||
| /// Use the specified release of the rack's system software. | ||
| SystemVersion(SemverVersion), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this into nexus/types/src/external_api/shared.rs since you're using it in both views and params. Silly, but that's the scheme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, moved in fe23073.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and moved back again in 87b2000, since we no longer take a view as a param.
nexus/external-api/src/lib.rs
Outdated
| method = GET, | ||
| path = "/v1/system/update/target-release", | ||
| tags = ["system/update"], | ||
| unpublished = true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind writing a comment as to why these two are unpublished? If there's a related GH issue, would you include it in the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no good reason, really; I just copied the API parameters from the other /v1/system/update endpoints (TUF repo depot). I can certainly mark these as published if we're happy with them; @iliana should we also publish the repository endpoints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a reason to publish the updates endpoints right now, they don't work unless you have configured Nexus in a special way. I don't know what our general policy is on endpoints that customers shouldn't/can't poke at yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a hidden tag. I use it to skip those endpoints when generating the Go SDK (not sure about the rust or typescript one), but it would still be available in the API. Perhaps we should leave these unpublished and leave a comment explaining why.
Thoughts? @david-crespo @ahl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to merge these PRs but these endpoints definitely won't do anything for customers in the next release, I would lean toward unpublished. If we want our own clients (most likely the CLI) to be able to use them, we need them in the OpenAPI schema but can use hidden to keep them out of the docs. Sounds like the Go SDK leaves out hidden endpoints but I don't think that's true of all clients (e.g., the TS client generator does produce methods for hidden endpoints).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the docs for hidden:
"hidden" = {
description = "TODO operations that will not ship to customers",
external_docs = {
url = "http://docs.oxide.computer/api"
}
},
Seems like this initial definition has shifted significantly.
This area could probably use some thought / love beyond the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As is (being unpublished), we still have the problems that:
- the coverage test doesn't test this (I guess not a blocker exactly but not great -- maybe we should file a blocker issue to fix it)
- the CLI won't be generated for it, which means it will be hard to use this even in development, right?
From what @david-crespo said, using hidden here will fix both of these. I can't tell if this is consistent with the description @ahl pasted or not. This is sort of a TODO operation, it's not currently intended to be shipped to customers, but it eventually will be.
I guess there could be some fear that a customer using the CLI will actually set this? (A customer using curl could accidentally set it as-is, though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I tried:
- tags = ["system/update"],
- unpublished = true,
+ tags = ["hidden"], // "system/update"on all the system/update endpoints, but got a failure I don't understand in check::tests::check_apis_up_to_date:
Failure [ 9/14] nexus.json (Oxide Region API v20250212.0.0) OpenAPI document validation failed
-> Rust documentation found in external interface: An identifier for an artifact.
The kind is [`ArtifactKind`], indicating that it might represent an artifact whose kind is unknown.
For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation
Problem with type TargetRelease: Mismatched types between subschemas; this is often due to enums with different data payloads and can be resolved using serde adjacent tagging.
this schema's type
String(
StringType {
format: Empty,
pattern: None,
enumeration: [
Some(
"unspecified",
),
],
min_length: None,
max_length: None,
},
)
differs from this
Object(
ObjectType {
properties: {
"system_version": Item(
Schema {
schema_data: SchemaData {
nullable: false,
read_only: false,
write_only: false,
deprecated: false,
external_docs: None,
example: None,
title: None,
description: None,
discriminator: None,
default: None,
extensions: {},
},
schema_kind: Type(
String(
StringType {
format: Empty,
pattern: Some(
"^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$",
),
enumeration: [],
min_length: None,
max_length: None,
},
),
),
},
),
},
required: [
"system_version",
],
additional_properties: Some(
Any(
false,
),
),
min_properties: None,
max_properties: None,
},
)
For more info, see https://github.com/oxidecomputer/openapi-lint#type-mismatch
Problem with type TargetReleaseSource: Mismatched types between subschemas; this is often due to enums with different data payloads and can be resolved using serde adjacent tagging.
this schema's type
String(
StringType {
format: Empty,
pattern: None,
enumeration: [
Some(
"unspecified",
),
],
min_length: None,
max_length: None,
},
)
differs from this
Object(
ObjectType {
properties: {
"system_version": Item(
Schema {
schema_data: SchemaData {
nullable: false,
read_only: false,
write_only: false,
deprecated: false,
external_docs: None,
example: None,
title: None,
description: None,
discriminator: None,
default: None,
extensions: {},
},
schema_kind: Type(
String(
StringType {
format: Empty,
pattern: Some(
"^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$",
),
enumeration: [],
min_length: None,
max_length: None,
},
),
),
},
),
},
required: [
"system_version",
],
additional_properties: Some(
Any(
false,
),
),
min_properties: None,
max_properties: None,
},
)
For more info, see https://github.com/oxidecomputer/openapi-lint#type-mismatch
Rust documentation found in external interface: Metadata about an individual TUF artifact.
Found within a [`TufRepoDescription`].
For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation
Rust documentation found in external interface: Status of a TUF repo import.
Part of [`TufRepoInsertResponse`].
For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation
Rust documentation found in external interface: Metadata about a TUF repository.
Found within a [`TufRepoDescription`].
For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I tried:
- tags = ["system/update"], - unpublished = true, + tags = ["hidden"], // "system/update"on all the
system/updateendpoints, but got a failure I don't understand incheck::tests::check_apis_up_to_date:
These are errors from our openapi-lint -- self-imposed conventions on the API.
Failure [ 9/14] nexus.json (Oxide Region API v20250212.0.0) OpenAPI document validation failed -> Rust documentation found in external interface: An identifier for an artifact. The kind is [`ArtifactKind`], indicating that it might represent an artifact whose kind is unknown. For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation
This one is saying that the Rustdoc for some type (that's apparently newly part of the external API with this change) appears to look like Rust-specific documentation. But we use this documentation for external users, so that'd be confusing for them. The link says:
As such, it's easy to accidentally allow internally-relevant documentation leak out as externally-visible in the OpenAPI document. It's not possible to simply infer this from text alone, but we do look for shibboleths such as a Rust path delimeter (::) and bracketed expressions with no subsequent parentheses (title being reasonable).
So I'm guessing it's the link to ArtifactKind that doesn't have a URL in it. Rustdoc does the right thing, but the API doc generator wouldn't.
Problem with type TargetRelease: Mismatched types between subschemas; this is often due to enums with different data payloads and can be resolved using serde adjacent tagging. this schema's type String( StringType { format: Empty, pattern: None, enumeration: [ Some( "unspecified", ), ], min_length: None, max_length: None, }, ) differs from this Object( ObjectType { properties: { "system_version": Item( Schema { schema_data: SchemaData { nullable: false, read_only: false, write_only: false, deprecated: false, external_docs: None, example: None, title: None, description: None, discriminator: None, default: None, extensions: {}, }, schema_kind: Type( String( StringType { format: Empty, pattern: Some( "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$", ), enumeration: [], min_length: None, max_length: None, }, ), ), }, ), }, required: [ "system_version", ], additional_properties: Some( Any( false, ), ), min_properties: None, max_properties: None, }, ) For more info, see https://github.com/oxidecomputer/openapi-lint#type-mismatch
This one is basically saying that our enums are supposed to be adjacently-tagged. There's more detail in that link. I think you need to use #[serde(tag...)] on TargetReleaseSource.
Problem with type TargetReleaseSource: Mismatched types between subschemas; this is often due to enums with different data payloads and can be resolved using serde adjacent tagging. this schema's type String( StringType { format: Empty, pattern: None, enumeration: [ Some( "unspecified", ), ], min_length: None, max_length: None, }, ) differs from this Object( ObjectType { properties: { "system_version": Item( Schema { schema_data: SchemaData { nullable: false, read_only: false, write_only: false, deprecated: false, external_docs: None, example: None, title: None, description: None, discriminator: None, default: None, extensions: {}, }, schema_kind: Type( String( StringType { format: Empty, pattern: Some( "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$", ), enumeration: [], min_length: None, max_length: None, }, ), ), }, ), }, required: [ "system_version", ], additional_properties: Some( Any( false, ), ), min_properties: None, max_properties: None, }, ) For more info, see https://github.com/oxidecomputer/openapi-lint#type-mismatch
I think this is the same issue being reported directly on TargetReleaseSource. Before, it was being reported on TargetRelease because it contains a TargetReleaseSource.
Rust documentation found in external interface: Metadata about an individual TUF artifact. Found within a [`TufRepoDescription`]. For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation
This is the same as the first one: I infer that you can't use the [Identifier] syntax in Rustdoc that's being used as API documentation.
Rust documentation found in external interface: Status of a TUF repo import. Part of [`TufRepoInsertResponse`]. For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation
Same.
Rust documentation found in external interface: Metadata about a TUF repository. Found within a [`TufRepoDescription`]. For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation
Same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for decoding that for me. Published with the hidden tag and required doc/representation changes in 25a88b0.
davepacheco
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This is looking good and it's going to be key structure for next steps.
My biggest questions here are the two in nexus/types/src/external_api/shared.rs.
schema/crdb/dbinit.sql
Outdated
| 'system_version' | ||
| ); | ||
|
|
||
| -- The software release that should be deployed to the rack. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this structured like bp_target, where it's a list of all previous configurations, and only the one with the latest generation number matters? If so I think it'd be useful to document that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, documented in 80f690.
schema/crdb/dbinit.sql
Outdated
| generation INT8 NOT NULL PRIMARY KEY, | ||
| time_requested TIMESTAMPTZ NOT NULL, | ||
| release_source omicron.public.target_release_source NOT NULL, | ||
| system_version STRING(64), -- "foreign key" into the `tuf_repo` table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary key of tuf_repo is a uuid. Should this be that? Or is it supposed to match the sha256 field of tuf_repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, as discussed 80f69029c immediately resolves the version to a TUF repo UUID.
schema/crdb/dbinit.sql
Outdated
| release_source, | ||
| system_version | ||
| ) VALUES ( | ||
| 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nitty but I think I'd use 1 here just for consistency with other places we've used similar numbers (like OmicronZonesConfig generation and bp_target version).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, wasn't sure what the right starting generation was. Fixed in 80f6902.
| InstallDataset, | ||
|
|
||
| /// Use the specified release of the rack's system software. | ||
| SystemVersion(SemverVersion), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the external API, I'd suggest we use either a TUF repo id or its SHA256 rather than a semver. I get that the human wants to know something more like the semver (although even then, I think the descriptor for them is an opaque token -- it could as well be "2025 Q1"). But I think it will be simpler and clearer to just pick one of the TUF repo's identifiers here so that it's very obvious what's going on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed 80f6902, immediately resolves the version to a TUF repo UUID. The user specifies the version, which must have a corresponding TUF repo already loaded.
| } | ||
| } | ||
|
|
||
| /// The source of the target release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the target release for the whole system, right? From an operator's perspective, I think the InstallDataset variant would be better called "LastMupdate".
I wasn't thinking about providing this as an option. I can see we need a value to represent "it hasn't been set yet" and we might want to let people reset it to that. But what are the actual semantics of setting it to this value? Let's say the value was set to SystemVersion and Reconfigurator has updated a few zones and now somebody comes and sets this to InstallDataset/LastMupdate. Does Reconfigurator actually undo the changes it made? I'd be inclined to say no, if you want that, you need to go mupdate again. In that case though this isn't really setting the target release to "last mupdate" or "a specific version", it's setting or unsetting a target release. Maybe this should just be an Option<SystemVersion> ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
87b2000 makes the version mandatory, and no longer allows setting the (renamed) Unspecified source variant.
nexus/external-api/src/lib.rs
Outdated
| method = GET, | ||
| path = "/v1/system/update/target-release", | ||
| tags = ["system/update"], | ||
| unpublished = true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this being unpublished is also preventing the coverage test from finding it, which seems bad.
| /// rack should eventually correspond to the release described here. | ||
| #[endpoint { | ||
| method = GET, | ||
| path = "/v1/system/update/target-release", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on making this /v1/system/target-release and calling the operation target_release_get (or view)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about dropping the update in the URL path? I'm not sure why that feels not quite right to me. I don't feel strongly about it.
| self.update_tuf_repo_get(opctx, system_version) | ||
| .await | ||
| .map_err(|e| err.bail(e))? | ||
| .repo | ||
| .system_version, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to panic here, right?
If we were doing this in raw SQL I might do:
INSERT INTO target_release (id, ...) VALUES (SELECT id FROM tuf_repo WHERE system_version = ...), ...)This would avoid an interactive transaction and reduce contention a lot but I guess it's fairly painful to do in Diesel. We could also fetch the id outside of the transaction but I suppose we still have to check that it's still there so it doesn't help much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved this resolution to the app layer in 80f6902.
common/src/api/external/mod.rs
Outdated
| Oximeter, | ||
| MetricProducer, | ||
| RoleBuiltin, | ||
| TargetRelease, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does this get used? Is it that the authz_resource! macro expected it?
I wouldn't really expect this to be needed because I think it's mainly used for generating 404s and you can't get a 404 on a target release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, fixed and replaced with a synthetic resource in 5fe4bb3.
| polar_snippet = FleetChild, | ||
| } | ||
|
|
||
| authz_resource! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking at the target blueprint as an analog here because it's similarly global.
It looks like we don't use authz_resource! for this. Instead, we have a synthetic resource called BlueprintConfig. It has its own type here and a singleton instance on which we manually implement PolarClass and AuthorizedResource. Then there's a snippet in omicron.polar about it.
That kind of seems like the way to go here. TargetRelease isn't a type of "resource" in the same way that I think the authz_resource! macro means it (i.e., something you could CRUD, that has an id, for which instances might be missing, etc.)
17fd077 to
fd908da
Compare
davepacheco
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
nexus/db-model/src/target_release.rs
Outdated
| release_source: TargetReleaseSource, | ||
| tuf_repo_id: Option<DbTypedUuid<TufRepoKind>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to me that the db model type has separate fields for the enum (unspecified or system_version) and the TUF repo id. Invalid combinations are possible but that's also possible in the database representation (but shouldn't happen because of the constraint).
But I would expect that above this (at the app layer) we'd have a more type-safe representation, like an enum with two variants where the TUF repo id is data associated with the SystemVersion variant. Then we'd accept that here.
As an example, I'm looking at InvOmicronZone in nexus/db-model/src/inventory.rs. That type also has a bunch of different Option values. There are lots of constraints on exactly how they can be set based on what kind of zone this represents. But the constructor InvOmicronZone::new() accepts the more type-safe thing (OmicronZoneConfig). There's also a fallible into_omicron_zone_config() to go back to the more type-safe version that we use elsewhere in the system. We could have something like that here, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The app layer only sets a system version now, so 5898757 replaces these with type-safe constructors.
| // Fetch the row in the `target_release` table with the largest | ||
| // generation number. The subquery accesses the same table, so we | ||
| // have to make an alias to not confuse diesel. | ||
| let target_release_2 = | ||
| diesel::alias!(target_release as target_release_2); | ||
| dsl::target_release | ||
| .filter(dsl::generation.nullable().eq_any(target_release_2.select( | ||
| diesel::dsl::max(target_release_2.field(dsl::generation)), | ||
| ))) | ||
| .select(TargetRelease::as_select()) | ||
| .first_async(&*conn) | ||
| .await | ||
| .map_err(|e| public_error_from_diesel(e, ErrorHandler::Server)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how much this matters but I would think you could do something like we do in blueprint_current_target_only():
dsl::target_release
.order_by(dsl::generation.desc())
.first_async()This would avoid an extra subquery (and the need to use an alias).
Separately: what does this do when the table is empty? (What do we want it to do? I see that blueprint_current_target_only() explicitly allows this to be optional but then converts None to a specific error. I'm not sure if we want to make this a similar sort of 500 or return a (fake?) initial target release of "unspecified".)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's much simpler without the subquery! Both issues should be addressed in e7181db.
| self.transaction_retry_wrapper("set_target_release") | ||
| .transaction(&conn, |conn| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a single statement, I think it's better here to skip the explicit transaction.
If you skip self.transaction_retry_wrapper(...).transaction() altogether, then we'll just send this one statement to the database. That's still treated as a transaction. It's pretty unlikely to conflict with anything, but if it does, the database can retry it automatically. This is generally better than a client-side retry because the latency is much smaller and the window for another conflict is much shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, also done in e7181db.
| opctx: &OpContext, | ||
| target_release: &TargetRelease, | ||
| ) -> LookupResult<views::TargetRelease> { | ||
| opctx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this makes any difference but I figured I'd mention it: I think in other places (like InvOmicronZone I mentioned elsewhere), we've used a pattern where:
- you'd have a
TargetRelease::into_external(self, system_version: SystemVersion) -> Result<views::TargetRelease, ?> - this function would fetch the system version like it does, but then use
target_release.into_external(system_version)
Like I said, I'm not sure it makes any difference. I kind of like that the conversion to/from the external type could both live with the db type rather than in this logic here but 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, refactored in 27b7e8d.
| return Err(Error::invalid_request( | ||
| "missing TUF repo ID for specified system version", | ||
| )); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, is this likely a case where the system version is currently set to a TUF repo that we don't actually have? If that's the case, I feel like this should be a 500 (server error), not a 400 -- we shouldn't allow this to happen. (It's not this PR's job to prevent it, to be clear.)
But if this could be called with system version's that aren't the current one, where we might reasonably have deleted the old TUF repo, then maybe it needs to return an error type that would allow the caller to decide whether it's a 500 or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed in 8e33014.
| /// rack should eventually correspond to the release described here. | ||
| #[endpoint { | ||
| method = GET, | ||
| path = "/v1/system/update/target-release", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about dropping the update in the URL path? I'm not sure why that feels not quite right to me. I don't feel strongly about it.
nexus/external-api/src/lib.rs
Outdated
| /// a goal state for the rack's software, and attempt to asynchronously | ||
| /// update to that release. | ||
| #[endpoint { | ||
| method = POST, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not PUT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only my lack of understanding the difference, fixed in a709786.
| release_source omicron.public.target_release_source NOT NULL, | ||
| tuf_repo_id UUID, -- "foreign key" into the `tuf_repo` table | ||
| CONSTRAINT tuf_repo_for_system_version CHECK ( | ||
| (release_source != 'system_version') OR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want another side of this constraint for saying that tuf_repo_id IS NULL if release_source != system_version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further constrained in 1a83b85.
nexus/external-api/src/lib.rs
Outdated
| method = GET, | ||
| path = "/v1/system/update/target-release", | ||
| tags = ["system/update"], | ||
| unpublished = true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As is (being unpublished), we still have the problems that:
- the coverage test doesn't test this (I guess not a blocker exactly but not great -- maybe we should file a blocker issue to fix it)
- the CLI won't be generated for it, which means it will be hard to use this even in development, right?
From what @david-crespo said, using hidden here will fix both of these. I can't tell if this is consistent with the description @ahl pasted or not. This is sort of a TODO operation, it's not currently intended to be shipped to customers, but it eventually will be.
I guess there could be some fear that a customer using the CLI will actually set this? (A customer using curl could accidentally set it as-is, though.)
| // with some (very small) fuzz allowed in the timestamp reported | ||
| // by the database. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Sorry I just noticed this.) As I think you found, the problem here is that the database only stores microsecond precision but our in-memory timestamps have nanosecond precision and these get silently truncated by a round-trip through the database. Could you update this comment to be more specific that we're dealing with a database truncation issue? I'm afraid a reader might think the "fuzz" here has to do with timing and in that case this test would likely be flaky.
(In some other places, we deal with this by creating the timestamp without nanoseconds part:
https://github.com/oxidecomputer/omicron/blob/config-target-release/nexus/inventory/src/builder.rs#L571-L582
but what you've got here works too!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Np, clarified in 09fd2f3.
| TargetReleaseSource::SystemVersion | ||
| ); | ||
| assert_eq!(target_release.system_version, Some(version)); | ||
| assert_eq!(target_release.tuf_repo_id, Some(tuf_repo_id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind adding a test that if you try to insert a row with the same generation as an existing one, it fails? (Sorry if I missed that here.)
I'm thinking maybe take one that's in the database, and do target_release.new_system_version() or target_release.new_unspecified() twice (from the same initial target_release) and try to insert both and make sure one fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had that test but removed it when I switched to the new constructors; I didn't think of making two! Restored in 09fd2f3.
| use crate::db::schema::tuf_repo; | ||
| if let Some(tuf_repo_id) = target_release.tuf_repo_id { | ||
| views::TargetReleaseSource::SystemVersion { | ||
| version: tuf_repo::table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm realizing as I read this that it's conceivable that a target release doesn't have a corresponding TUF repo any more because it's been deleted. If we don't allow that, then we'd never be able to delete a TUF repo that was ever made the target release.
One option would be to denormalize this so that the version is also directly in the target_release table.
Maybe a better answer is that when we do go implement TUF repo deletion, we leave the tuf_repo record around? That might be nice because it would allow us to keep other metadata about these old versions. Plus you wouldn't have to do anything to deal with this now 😀
If we do decide to do that, let's make a note in #7135 so that we don't forget.
Edit: maybe none of this matters because a user's never going to be able to ask for a row from this table that isn't the current one? Maybe if the older ones are only for debugging, it's not a big deal if this produces a 500?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be reasonable to leave tuf_repo records around, maybe with a deleted flag (so noted on #7135). But I don't think denormalization is a good solution here; why even store the repo ID if you have the version?
I have thought about a list endpoint for fetching previous targets, but I figured we could take it as a follow-up.
| networking_switch_port_lldp_neighbors (get "/v1/system/hardware/rack-switch-port/{rack_id}/{switch_location}/{port}/lldp/neighbors") | ||
| networking_switch_port_lldp_config_view (get "/v1/system/hardware/switch-port/{port}/lldp/config") | ||
| networking_switch_port_status (get "/v1/system/hardware/switch-port/{port}/status") | ||
| target_release_get (get "/v1/system/update/target-release") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not cover these? I'd think it'd be relatively easy since they don't rely on any other resources existing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, covered in ee5b389.
davepacheco
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hooray!
| path = "/v1/system/update/target-release", | ||
| tags = ["hidden"], // "system/update" | ||
| }] | ||
| async fn target_release_set( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is the only use of the verb "set" in the API; we typically use "update" and that seems applicable here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed in #7760.
| path = "/v1/system/update/target-release", | ||
| tags = ["hidden"], // "system/update" | ||
| }] | ||
| async fn target_release_get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise "view" seems to be the more commonly used verb here.
Fixes #7280. Provides external API endpoints and corresponding datastore methods to get/set the current
target_releaseconfig property. Releases are designated by their semantic version, and must be uploaded to the TUF repo depot prior to being set as the target release.This PR does not attempt to actually upgrade the rack to the current target release. The reconfigurator changes to plan & execute such an upgrade will be handled as a follow-up.