Skip to content

Conversation

@plotnick
Copy link
Contributor

Fixes #7280. Provides external API endpoints and corresponding datastore methods to get/set the current target_release config property. Releases are designated by their semantic version, and must be uploaded to the TUF repo depot prior to being set as the target release.

This PR does not attempt to actually upgrade the rack to the current target release. The reconfigurator changes to plan & execute such an upgrade will be handled as a follow-up.

polar_snippet = FleetChild,
}

authz_resource! {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm defining this authz resource, but am not convinced that I'm using it right (or at all). Do these only apply to resources that use the lookup* macros? Any guidance here would be appreciated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at the target blueprint as an analog here because it's similarly global.

It looks like we don't use authz_resource! for this. Instead, we have a synthetic resource called BlueprintConfig. It has its own type here and a singleton instance on which we manually implement PolarClass and AuthorizedResource. Then there's a snippet in omicron.polar about it.

That kind of seems like the way to go here. TargetRelease isn't a type of "resource" in the same way that I think the authz_resource! macro means it (i.e., something you could CRUD, that has an id, for which instances might be missing, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, replaced with a synthetic resource in 5fe4bb3.


/// Use the specified release of the rack's system software.
SystemVersion(SemverVersion),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this into nexus/types/src/external_api/shared.rs since you're using it in both views and params. Silly, but that's the scheme.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, moved in fe23073.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and moved back again in 87b2000, since we no longer take a view as a param.

method = GET,
path = "/v1/system/update/target-release",
tags = ["system/update"],
unpublished = true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind writing a comment as to why these two are unpublished? If there's a related GH issue, would you include it in the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no good reason, really; I just copied the API parameters from the other /v1/system/update endpoints (TUF repo depot). I can certainly mark these as published if we're happy with them; @iliana should we also publish the repository endpoints?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a reason to publish the updates endpoints right now, they don't work unless you have configured Nexus in a special way. I don't know what our general policy is on endpoints that customers shouldn't/can't poke at yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a hidden tag. I use it to skip those endpoints when generating the Go SDK (not sure about the rust or typescript one), but it would still be available in the API. Perhaps we should leave these unpublished and leave a comment explaining why.

Thoughts? @david-crespo @ahl

Copy link
Contributor

@david-crespo david-crespo Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to merge these PRs but these endpoints definitely won't do anything for customers in the next release, I would lean toward unpublished. If we want our own clients (most likely the CLI) to be able to use them, we need them in the OpenAPI schema but can use hidden to keep them out of the docs. Sounds like the Go SDK leaves out hidden endpoints but I don't think that's true of all clients (e.g., the TS client generator does produce methods for hidden endpoints).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the docs for hidden:

"hidden" = {
                description = "TODO operations that will not ship to customers",
                external_docs = {
                    url = "http://docs.oxide.computer/api"
                }
            },

Seems like this initial definition has shifted significantly.

This area could probably use some thought / love beyond the scope of this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is (being unpublished), we still have the problems that:

  • the coverage test doesn't test this (I guess not a blocker exactly but not great -- maybe we should file a blocker issue to fix it)
  • the CLI won't be generated for it, which means it will be hard to use this even in development, right?

From what @david-crespo said, using hidden here will fix both of these. I can't tell if this is consistent with the description @ahl pasted or not. This is sort of a TODO operation, it's not currently intended to be shipped to customers, but it eventually will be.

I guess there could be some fear that a customer using the CLI will actually set this? (A customer using curl could accidentally set it as-is, though.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I tried:

-        tags = ["system/update"],
-        unpublished = true,
+        tags = ["hidden"], // "system/update"

on all the system/update endpoints, but got a failure I don't understand in check::tests::check_apis_up_to_date:

     Failure [ 9/14] nexus.json (Oxide Region API v20250212.0.0)                     OpenAPI document validation failed
                     -> Rust documentation found in external interface: An identifier for an artifact.

                        The kind is [`ArtifactKind`], indicating that it might represent an artifact whose kind is unknown.
                        For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation

                        Problem with type TargetRelease: Mismatched types between subschemas; this is often due to enums with different data payloads and can be resolved using serde adjacent tagging.
                        this schema's type
                        String(
                            StringType {
                                format: Empty,
                                pattern: None,
                                enumeration: [
                                    Some(
                                        "unspecified",
                                    ),
                                ],
                                min_length: None,
                                max_length: None,
                            },
                        )
                        differs from this
                        Object(
                            ObjectType {
                                properties: {
                                    "system_version": Item(
                                        Schema {
                                            schema_data: SchemaData {
                                                nullable: false,
                                                read_only: false,
                                                write_only: false,
                                                deprecated: false,
                                                external_docs: None,
                                                example: None,
                                                title: None,
                                                description: None,
                                                discriminator: None,
                                                default: None,
                                                extensions: {},
                                            },
                                            schema_kind: Type(
                                                String(
                                                    StringType {
                                                        format: Empty,
                                                        pattern: Some(
                                                            "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$",
                                                        ),
                                                        enumeration: [],
                                                        min_length: None,
                                                        max_length: None,
                                                    },
                                                ),
                                            ),
                                        },
                                    ),
                                },
                                required: [
                                    "system_version",
                                ],
                                additional_properties: Some(
                                    Any(
                                        false,
                                    ),
                                ),
                                min_properties: None,
                                max_properties: None,
                            },
                        )

                        For more info, see https://github.com/oxidecomputer/openapi-lint#type-mismatch

                        Problem with type TargetReleaseSource: Mismatched types between subschemas; this is often due to enums with different data payloads and can be resolved using serde adjacent tagging.
                        this schema's type
                        String(
                            StringType {
                                format: Empty,
                                pattern: None,
                                enumeration: [
                                    Some(
                                        "unspecified",
                                    ),
                                ],
                                min_length: None,
                                max_length: None,
                            },
                        )
                        differs from this
                        Object(
                            ObjectType {
                                properties: {
                                    "system_version": Item(
                                        Schema {
                                            schema_data: SchemaData {
                                                nullable: false,
                                                read_only: false,
                                                write_only: false,
                                                deprecated: false,
                                                external_docs: None,
                                                example: None,
                                                title: None,
                                                description: None,
                                                discriminator: None,
                                                default: None,
                                                extensions: {},
                                            },
                                            schema_kind: Type(
                                                String(
                                                    StringType {
                                                        format: Empty,
                                                        pattern: Some(
                                                            "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$",
                                                        ),
                                                        enumeration: [],
                                                        min_length: None,
                                                        max_length: None,
                                                    },
                                                ),
                                            ),
                                        },
                                    ),
                                },
                                required: [
                                    "system_version",
                                ],
                                additional_properties: Some(
                                    Any(
                                        false,
                                    ),
                                ),
                                min_properties: None,
                                max_properties: None,
                            },
                        )

                        For more info, see https://github.com/oxidecomputer/openapi-lint#type-mismatch

                        Rust documentation found in external interface: Metadata about an individual TUF artifact.

                        Found within a [`TufRepoDescription`].
                        For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation

                        Rust documentation found in external interface: Status of a TUF repo import.

                        Part of [`TufRepoInsertResponse`].
                        For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation

                        Rust documentation found in external interface: Metadata about a TUF repository.

                        Found within a [`TufRepoDescription`].
                        For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I tried:

-        tags = ["system/update"],
-        unpublished = true,
+        tags = ["hidden"], // "system/update"

on all the system/update endpoints, but got a failure I don't understand in check::tests::check_apis_up_to_date:

These are errors from our openapi-lint -- self-imposed conventions on the API.

     Failure [ 9/14] nexus.json (Oxide Region API v20250212.0.0)                     OpenAPI document validation failed
                     -> Rust documentation found in external interface: An identifier for an artifact.

                        The kind is [`ArtifactKind`], indicating that it might represent an artifact whose kind is unknown.
                        For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation

This one is saying that the Rustdoc for some type (that's apparently newly part of the external API with this change) appears to look like Rust-specific documentation. But we use this documentation for external users, so that'd be confusing for them. The link says:

As such, it's easy to accidentally allow internally-relevant documentation leak out as externally-visible in the OpenAPI document. It's not possible to simply infer this from text alone, but we do look for shibboleths such as a Rust path delimeter (::) and bracketed expressions with no subsequent parentheses (title being reasonable).

So I'm guessing it's the link to ArtifactKind that doesn't have a URL in it. Rustdoc does the right thing, but the API doc generator wouldn't.

                    Problem with type TargetRelease: Mismatched types between subschemas; this is often due to enums with different data payloads and can be resolved using serde adjacent tagging.
                    this schema's type
                    String(
                        StringType {
                            format: Empty,
                            pattern: None,
                            enumeration: [
                                Some(
                                    "unspecified",
                                ),
                            ],
                            min_length: None,
                            max_length: None,
                        },
                    )
                    differs from this
                    Object(
                        ObjectType {
                            properties: {
                                "system_version": Item(
                                    Schema {
                                        schema_data: SchemaData {
                                            nullable: false,
                                            read_only: false,
                                            write_only: false,
                                            deprecated: false,
                                            external_docs: None,
                                            example: None,
                                            title: None,
                                            description: None,
                                            discriminator: None,
                                            default: None,
                                            extensions: {},
                                        },
                                        schema_kind: Type(
                                            String(
                                                StringType {
                                                    format: Empty,
                                                    pattern: Some(
                                                        "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$",
                                                    ),
                                                    enumeration: [],
                                                    min_length: None,
                                                    max_length: None,
                                                },
                                            ),
                                        ),
                                    },
                                ),
                            },
                            required: [
                                "system_version",
                            ],
                            additional_properties: Some(
                                Any(
                                    false,
                                ),
                            ),
                            min_properties: None,
                            max_properties: None,
                        },
                    )

                    For more info, see https://github.com/oxidecomputer/openapi-lint#type-mismatch

This one is basically saying that our enums are supposed to be adjacently-tagged. There's more detail in that link. I think you need to use #[serde(tag...)] on TargetReleaseSource.

                    Problem with type TargetReleaseSource: Mismatched types between subschemas; this is often due to enums with different data payloads and can be resolved using serde adjacent tagging.
                    this schema's type
                    String(
                        StringType {
                            format: Empty,
                            pattern: None,
                            enumeration: [
                                Some(
                                    "unspecified",
                                ),
                            ],
                            min_length: None,
                            max_length: None,
                        },
                    )
                    differs from this
                    Object(
                        ObjectType {
                            properties: {
                                "system_version": Item(
                                    Schema {
                                        schema_data: SchemaData {
                                            nullable: false,
                                            read_only: false,
                                            write_only: false,
                                            deprecated: false,
                                            external_docs: None,
                                            example: None,
                                            title: None,
                                            description: None,
                                            discriminator: None,
                                            default: None,
                                            extensions: {},
                                        },
                                        schema_kind: Type(
                                            String(
                                                StringType {
                                                    format: Empty,
                                                    pattern: Some(
                                                        "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:-((?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+([0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$",
                                                    ),
                                                    enumeration: [],
                                                    min_length: None,
                                                    max_length: None,
                                                },
                                            ),
                                        ),
                                    },
                                ),
                            },
                            required: [
                                "system_version",
                            ],
                            additional_properties: Some(
                                Any(
                                    false,
                                ),
                            ),
                            min_properties: None,
                            max_properties: None,
                        },
                    )

                    For more info, see https://github.com/oxidecomputer/openapi-lint#type-mismatch

I think this is the same issue being reported directly on TargetReleaseSource. Before, it was being reported on TargetRelease because it contains a TargetReleaseSource.

                    Rust documentation found in external interface: Metadata about an individual TUF artifact.

                    Found within a [`TufRepoDescription`].
                    For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation

This is the same as the first one: I infer that you can't use the [Identifier] syntax in Rustdoc that's being used as API documentation.

                    Rust documentation found in external interface: Status of a TUF repo import.

                    Part of [`TufRepoInsertResponse`].
                    For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation

Same.

                    Rust documentation found in external interface: Metadata about a TUF repository.

                    Found within a [`TufRepoDescription`].
                    For more info, see https://github.com/oxidecomputer/openapi-lint#rust-documentation

Same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for decoding that for me. Published with the hidden tag and required doc/representation changes in 25a88b0.

Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is looking good and it's going to be key structure for next steps.

My biggest questions here are the two in nexus/types/src/external_api/shared.rs.

'system_version'
);

-- The software release that should be deployed to the rack.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this structured like bp_target, where it's a list of all previous configurations, and only the one with the latest generation number matters? If so I think it'd be useful to document that here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, documented in 80f690.

generation INT8 NOT NULL PRIMARY KEY,
time_requested TIMESTAMPTZ NOT NULL,
release_source omicron.public.target_release_source NOT NULL,
system_version STRING(64), -- "foreign key" into the `tuf_repo` table
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary key of tuf_repo is a uuid. Should this be that? Or is it supposed to match the sha256 field of tuf_repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, as discussed 80f69029c immediately resolves the version to a TUF repo UUID.

release_source,
system_version
) VALUES (
0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nitty but I think I'd use 1 here just for consistency with other places we've used similar numbers (like OmicronZonesConfig generation and bp_target version).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, wasn't sure what the right starting generation was. Fixed in 80f6902.

InstallDataset,

/// Use the specified release of the rack's system software.
SystemVersion(SemverVersion),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the external API, I'd suggest we use either a TUF repo id or its SHA256 rather than a semver. I get that the human wants to know something more like the semver (although even then, I think the descriptor for them is an opaque token -- it could as well be "2025 Q1"). But I think it will be simpler and clearer to just pick one of the TUF repo's identifiers here so that it's very obvious what's going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed 80f6902, immediately resolves the version to a TUF repo UUID. The user specifies the version, which must have a corresponding TUF repo already loaded.

}
}

/// The source of the target release.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the target release for the whole system, right? From an operator's perspective, I think the InstallDataset variant would be better called "LastMupdate".

I wasn't thinking about providing this as an option. I can see we need a value to represent "it hasn't been set yet" and we might want to let people reset it to that. But what are the actual semantics of setting it to this value? Let's say the value was set to SystemVersion and Reconfigurator has updated a few zones and now somebody comes and sets this to InstallDataset/LastMupdate. Does Reconfigurator actually undo the changes it made? I'd be inclined to say no, if you want that, you need to go mupdate again. In that case though this isn't really setting the target release to "last mupdate" or "a specific version", it's setting or unsetting a target release. Maybe this should just be an Option<SystemVersion> ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

87b2000 makes the version mandatory, and no longer allows setting the (renamed) Unspecified source variant.

method = GET,
path = "/v1/system/update/target-release",
tags = ["system/update"],
unpublished = true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this being unpublished is also preventing the coverage test from finding it, which seems bad.

/// rack should eventually correspond to the release described here.
#[endpoint {
method = GET,
path = "/v1/system/update/target-release",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on making this /v1/system/target-release and calling the operation target_release_get (or view)?

Copy link
Contributor Author

@plotnick plotnick Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, 80f6902 renames it to Unspecified, and 87b2000 makes the version mandatory, i.e., you can no longer set the target release to Unspecified.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about dropping the update in the URL path? I'm not sure why that feels not quite right to me. I don't feel strongly about it.

Comment on lines 70 to 74
self.update_tuf_repo_get(opctx, system_version)
.await
.map_err(|e| err.bail(e))?
.repo
.system_version,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to panic here, right?

If we were doing this in raw SQL I might do:

INSERT INTO target_release (id, ...) VALUES (SELECT id FROM tuf_repo WHERE system_version = ...), ...)

This would avoid an interactive transaction and reduce contention a lot but I guess it's fairly painful to do in Diesel. We could also fetch the id outside of the transaction but I suppose we still have to check that it's still there so it doesn't help much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this resolution to the app layer in 80f6902.

Oximeter,
MetricProducer,
RoleBuiltin,
TargetRelease,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this get used? Is it that the authz_resource! macro expected it?

I wouldn't really expect this to be needed because I think it's mainly used for generating 404s and you can't get a 404 on a target release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, fixed and replaced with a synthetic resource in 5fe4bb3.

polar_snippet = FleetChild,
}

authz_resource! {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at the target blueprint as an analog here because it's similarly global.

It looks like we don't use authz_resource! for this. Instead, we have a synthetic resource called BlueprintConfig. It has its own type here and a singleton instance on which we manually implement PolarClass and AuthorizedResource. Then there's a snippet in omicron.polar about it.

That kind of seems like the way to go here. TargetRelease isn't a type of "resource" in the same way that I think the authz_resource! macro means it (i.e., something you could CRUD, that has an id, for which instances might be missing, etc.)

@plotnick plotnick force-pushed the config-target-release branch from 17fd077 to fd908da Compare February 26, 2025 20:21
Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines 47 to 48
release_source: TargetReleaseSource,
tuf_repo_id: Option<DbTypedUuid<TufRepoKind>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me that the db model type has separate fields for the enum (unspecified or system_version) and the TUF repo id. Invalid combinations are possible but that's also possible in the database representation (but shouldn't happen because of the constraint).

But I would expect that above this (at the app layer) we'd have a more type-safe representation, like an enum with two variants where the TUF repo id is data associated with the SystemVersion variant. Then we'd accept that here.

As an example, I'm looking at InvOmicronZone in nexus/db-model/src/inventory.rs. That type also has a bunch of different Option values. There are lots of constraints on exactly how they can be set based on what kind of zone this represents. But the constructor InvOmicronZone::new() accepts the more type-safe thing (OmicronZoneConfig). There's also a fallible into_omicron_zone_config() to go back to the more type-safe version that we use elsewhere in the system. We could have something like that here, too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The app layer only sets a system version now, so 5898757 replaces these with type-safe constructors.

Comment on lines 32 to 44
// Fetch the row in the `target_release` table with the largest
// generation number. The subquery accesses the same table, so we
// have to make an alias to not confuse diesel.
let target_release_2 =
diesel::alias!(target_release as target_release_2);
dsl::target_release
.filter(dsl::generation.nullable().eq_any(target_release_2.select(
diesel::dsl::max(target_release_2.field(dsl::generation)),
)))
.select(TargetRelease::as_select())
.first_async(&*conn)
.await
.map_err(|e| public_error_from_diesel(e, ErrorHandler::Server))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how much this matters but I would think you could do something like we do in blueprint_current_target_only():

dsl::target_release
    .order_by(dsl::generation.desc())
    .first_async()

This would avoid an extra subquery (and the need to use an alias).


Separately: what does this do when the table is empty? (What do we want it to do? I see that blueprint_current_target_only() explicitly allows this to be optional but then converts None to a specific error. I'm not sure if we want to make this a similar sort of 500 or return a (fake?) initial target release of "unspecified".)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's much simpler without the subquery! Both issues should be addressed in e7181db.

Comment on lines 59 to 60
self.transaction_retry_wrapper("set_target_release")
.transaction(&conn, |conn| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a single statement, I think it's better here to skip the explicit transaction.

If you skip self.transaction_retry_wrapper(...).transaction() altogether, then we'll just send this one statement to the database. That's still treated as a transaction. It's pretty unlikely to conflict with anything, but if it does, the database can retry it automatically. This is generally better than a client-side retry because the latency is much smaller and the window for another conflict is much shorter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, also done in e7181db.

opctx: &OpContext,
target_release: &TargetRelease,
) -> LookupResult<views::TargetRelease> {
opctx
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this makes any difference but I figured I'd mention it: I think in other places (like InvOmicronZone I mentioned elsewhere), we've used a pattern where:

  • you'd have a TargetRelease::into_external(self, system_version: SystemVersion) -> Result<views::TargetRelease, ?>
  • this function would fetch the system version like it does, but then use target_release.into_external(system_version)

Like I said, I'm not sure it makes any difference. I kind of like that the conversion to/from the external type could both live with the db type rather than in this logic here but 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, refactored in 27b7e8d.

Comment on lines 111 to 113
return Err(Error::invalid_request(
"missing TUF repo ID for specified system version",
));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, is this likely a case where the system version is currently set to a TUF repo that we don't actually have? If that's the case, I feel like this should be a 500 (server error), not a 400 -- we shouldn't allow this to happen. (It's not this PR's job to prevent it, to be clear.)

But if this could be called with system version's that aren't the current one, where we might reasonably have deleted the old TUF repo, then maybe it needs to return an error type that would allow the caller to decide whether it's a 500 or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed in 8e33014.

/// rack should eventually correspond to the release described here.
#[endpoint {
method = GET,
path = "/v1/system/update/target-release",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about dropping the update in the URL path? I'm not sure why that feels not quite right to me. I don't feel strongly about it.

/// a goal state for the rack's software, and attempt to asynchronously
/// update to that release.
#[endpoint {
method = POST,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not PUT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only my lack of understanding the difference, fixed in a709786.

release_source omicron.public.target_release_source NOT NULL,
tuf_repo_id UUID, -- "foreign key" into the `tuf_repo` table
CONSTRAINT tuf_repo_for_system_version CHECK (
(release_source != 'system_version') OR
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want another side of this constraint for saying that tuf_repo_id IS NULL if release_source != system_version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further constrained in 1a83b85.

method = GET,
path = "/v1/system/update/target-release",
tags = ["system/update"],
unpublished = true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As is (being unpublished), we still have the problems that:

  • the coverage test doesn't test this (I guess not a blocker exactly but not great -- maybe we should file a blocker issue to fix it)
  • the CLI won't be generated for it, which means it will be hard to use this even in development, right?

From what @david-crespo said, using hidden here will fix both of these. I can't tell if this is consistent with the description @ahl pasted or not. This is sort of a TODO operation, it's not currently intended to be shipped to customers, but it eventually will be.

I guess there could be some fear that a customer using the CLI will actually set this? (A customer using curl could accidentally set it as-is, though.)

Comment on lines 145 to 146
// with some (very small) fuzz allowed in the timestamp reported
// by the database.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Sorry I just noticed this.) As I think you found, the problem here is that the database only stores microsecond precision but our in-memory timestamps have nanosecond precision and these get silently truncated by a round-trip through the database. Could you update this comment to be more specific that we're dealing with a database truncation issue? I'm afraid a reader might think the "fuzz" here has to do with timing and in that case this test would likely be flaky.

(In some other places, we deal with this by creating the timestamp without nanoseconds part:
https://github.com/oxidecomputer/omicron/blob/config-target-release/nexus/inventory/src/builder.rs#L571-L582
but what you've got here works too!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Np, clarified in 09fd2f3.

TargetReleaseSource::SystemVersion
);
assert_eq!(target_release.system_version, Some(version));
assert_eq!(target_release.tuf_repo_id, Some(tuf_repo_id));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a test that if you try to insert a row with the same generation as an existing one, it fails? (Sorry if I missed that here.)

I'm thinking maybe take one that's in the database, and do target_release.new_system_version() or target_release.new_unspecified() twice (from the same initial target_release) and try to insert both and make sure one fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that test but removed it when I switched to the new constructors; I didn't think of making two! Restored in 09fd2f3.

use crate::db::schema::tuf_repo;
if let Some(tuf_repo_id) = target_release.tuf_repo_id {
views::TargetReleaseSource::SystemVersion {
version: tuf_repo::table
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm realizing as I read this that it's conceivable that a target release doesn't have a corresponding TUF repo any more because it's been deleted. If we don't allow that, then we'd never be able to delete a TUF repo that was ever made the target release.

One option would be to denormalize this so that the version is also directly in the target_release table.

Maybe a better answer is that when we do go implement TUF repo deletion, we leave the tuf_repo record around? That might be nice because it would allow us to keep other metadata about these old versions. Plus you wouldn't have to do anything to deal with this now 😀

If we do decide to do that, let's make a note in #7135 so that we don't forget.

Edit: maybe none of this matters because a user's never going to be able to ask for a row from this table that isn't the current one? Maybe if the older ones are only for debugging, it's not a big deal if this produces a 500?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be reasonable to leave tuf_repo records around, maybe with a deleted flag (so noted on #7135). But I don't think denormalization is a good solution here; why even store the repo ID if you have the version?

I have thought about a list endpoint for fetching previous targets, but I figured we could take it as a follow-up.

networking_switch_port_lldp_neighbors (get "/v1/system/hardware/rack-switch-port/{rack_id}/{switch_location}/{port}/lldp/neighbors")
networking_switch_port_lldp_config_view (get "/v1/system/hardware/switch-port/{port}/lldp/config")
networking_switch_port_status (get "/v1/system/hardware/switch-port/{port}/status")
target_release_get (get "/v1/system/update/target-release")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not cover these? I'd think it'd be relatively easy since they don't rely on any other resources existing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, covered in ee5b389.

@plotnick plotnick mentioned this pull request Mar 6, 2025
Copy link
Collaborator

@davepacheco davepacheco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hooray!

@plotnick plotnick merged commit a5109e3 into main Mar 6, 2025
18 checks passed
@plotnick plotnick deleted the config-target-release branch March 6, 2025 22:10
path = "/v1/system/update/target-release",
tags = ["hidden"], // "system/update"
}]
async fn target_release_set(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the only use of the verb "set" in the API; we typically use "update" and that seems applicable here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed in #7760.

path = "/v1/system/update/target-release",
tags = ["hidden"], // "system/update"
}]
async fn target_release_get(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise "view" seems to be the more commonly used verb here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

config property for target system version

9 participants