[WIP] AGENT-1330: machineconfiguration/v1alpha1: add InternalReleaseImage #2510
@andfasano: This pull request references AGENT-1330 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Hello @andfasano! Some important instructions when contributing to openshift/api:
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
9ea943a to
9eca225
Compare
0d80867 to
7ea433e
Compare
/retest-required
I'm currently experimenting with AI for API review; hopefully some of the content it generates will help you improve your API. The following code blocks were generated by Claude.
I think the linting issues are actually not the response I'd like. Instead, let's try to make the zero values not valid (required fields or
For the comments, it has highlighted that you haven't explained what happens when the optional fields are omitted, though I'm not sure its suggestions are super helpful; please review and think about what you'd actually like to put in. If it has identified something where you can't think of a reason why it wouldn't be present, then maybe the field should actually be required.
Quizzing specifically about whether all validations were documented, some further output:
7b60a33 to
ab6545c
Compare
/api-review
cf83224 to
c2f669c
Compare
/test verify
f330e2a to
6b6941e
Compare
a9c080e to
5bb2da8
Compare
/retest
5bb2da8 to
ee91dd4
Compare
/retest
Overall the structure makes more sense to me. A few questions/comments on the new MCN fields (and the previous question on the conditions)
// +listType=map
// +listMapKey=name
// +kubebuilder:validation:MaxItems=5
// +optional
Just thinking out loud: maybe this should be required (if the IRI status exists) so that it always reflects what the daemon is currently detecting? And if we don't detect anything, we just have an empty list?
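One detail that bears on the "required, possibly empty list" idea: with Go's encoding/json, an `omitempty` tag drops an empty slice as well as a nil one, so an empty list would serialize as an absent field unless the tag changes. A minimal sketch follows; `releaseRef`, `withOmitEmpty` and `withoutOmitEmpty` are hypothetical stand-ins, not types from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// releaseRef is a hypothetical stand-in for the release reference type.
type releaseRef struct {
	Name string `json:"name"`
}

// withOmitEmpty mirrors the current tag: the key is dropped for nil and empty slices.
type withOmitEmpty struct {
	AvailableReleases []releaseRef `json:"availableReleases,omitempty"`
}

// withoutOmitEmpty always emits the key, so an empty list shows up as [].
type withoutOmitEmpty struct {
	AvailableReleases []releaseRef `json:"availableReleases"`
}

// marshal returns the JSON encoding of v as a string.
func marshal(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	empty := []releaseRef{}
	fmt.Println(marshal(withOmitEmpty{AvailableReleases: empty}))    // {}
	fmt.Println(marshal(withoutOmitEmpty{AvailableReleases: empty})) // {"availableReleases":[]}
	fmt.Println(marshal(withoutOmitEmpty{}))                         // {"availableReleases":null}
}
```

Note the last case: without `omitempty`, a nil slice encodes as `null`, so a writer that wants to report "nothing detected" as `[]` would have to set a non-nil empty slice.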
// +listMapKey=name
// +kubebuilder:validation:MaxItems=5
// +optional
AvailableReleases []InternalReleaseImageRef `json:"availableReleases,omitempty"`
Now that we have the split, what gets reflected here? For example, if the control plane nodes don't all currently have the same available (mounted) releases, would this report a release as available if any control plane node is hosting that image? Or only when all control plane nodes have it mounted and reporting?
As mentioned before, the idea was to keep the IRI as the main (centralized) user interface for both editing and monitoring (for the sake of simplicity). As soon as a new release is detected on one of the nodes and reported in its MCN status, it will also be added to the IRI availableReleases field.
So the IRI reports the information from a cluster-wide point of view, whereas the MCN remains scoped to the node.
iiuc a release will be added here if and only if all nodes have it properly stored. So, iiuc, the only reasons for a release not to be on all nodes are:
- The image is still being copied.
- There's a failure in the copy process. I guess we will add details in the MCN and here, won't we? What would each CR report in its conditions?
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=5
// +optional
InstalledReleases []InternalReleaseImageDetailedRef `json:"installedReleases,omitempty"`
Same question as above
Yep, as above: the IRI will report that a release is installed (at the cluster level) only when all the nodes have successfully completed the installation.
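The replies above describe two different aggregation rules: a release is listed as available as soon as one node reports it, while it counts as installed only once every node has it. The sketch below shows both rules side by side under that reading; the `aggregate` function and the node and release names are hypothetical, and the actual controller logic is not part of this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// aggregate takes the release names reported by each node and returns the
// names seen on at least one node (anyNode) and on every node (allNodes).
func aggregate(perNode map[string][]string) (anyNode, allNodes []string) {
	counts := map[string]int{}
	for _, releases := range perNode {
		seen := map[string]bool{} // count each release once per node
		for _, r := range releases {
			if !seen[r] {
				seen[r] = true
				counts[r]++
			}
		}
	}
	for name, n := range counts {
		anyNode = append(anyNode, name)
		if n == len(perNode) {
			allNodes = append(allNodes, name)
		}
	}
	// Sort for a stable result, since map iteration order is random.
	sort.Strings(anyNode)
	sort.Strings(allNodes)
	return anyNode, allNodes
}

func main() {
	perNode := map[string][]string{
		"node-a": {"release-1", "release-2"},
		"node-b": {"release-1"},
		"node-c": {"release-1"},
	}
	anyNode, allNodes := aggregate(perNode)
	fmt.Println("reported by any node:", anyNode)
	fmt.Println("reported by all nodes:", allNodes)
}
```

Which of the two lists maps to which IRI status field is exactly the question raised in this thread, so the sketch deliberately leaves that mapping open.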
eb17df7 to
b91c71a
Compare
// be used to amend the `spec.Releases` field to add a new release bundle to the cluster.
// An empty value indicates that no ISOs are currently being detected on any control plane
// node.
// Must not exceed 5 entries.
...sts/internalreleaseimages.machineconfiguration.openshift.io/NoRegistryClusterOperations.yaml
69a28e6 to
67cbb0e
Compare
67cbb0e to
7727158
Compare
@andfasano: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
The conditions look good, some more comments inline:
And also, running claude again via #2548 to see what it suggests:
Summary
✅ Linting checks PASSED - No issues found with make lint
Detailed Analysis
I've reviewed the PR against both Kubernetes and OpenShift API conventions. The changes introduce a new v1alpha1 API
(InternalReleaseImage) and add corresponding status fields to the existing v1 MachineConfigNode API.
Critical Issues Found
1. Missing Optional Field Behavior Documentation
Line 161-166 in machineconfiguration/v1/types_machineconfignode.go:
Current (problematic) code:
// internalReleaseImage describes the status of the release payloads stored in the node.
// When specified, an internalReleaseImage custom resource exists on the cluster, and the specified images will be made available on the control plane nodes.
// This field will reflect the actual on-disk state of those release images.
// +openshift:enable:FeatureGate=NoRegistryClusterOperations
// +optional
InternalReleaseImage MachineConfigNodeStatusInternalReleaseImage `json:"internalReleaseImage,omitzero,omitempty"`
Suggested change:
- // internalReleaseImage describes the status of the release payloads stored in the node.
- // When specified, an internalReleaseImage custom resource exists on the cluster, and the specified images will be made available on the control plane nodes.
- // This field will reflect the actual on-disk state of those release images.
+ // internalReleaseImage describes the status of the release payloads stored in the node.
+ // When specified, an internalReleaseImage custom resource exists on the cluster, and the specified images will be made available on the control plane nodes.
+ // This field will reflect the actual on-disk state of those release images.
+ // When omitted, no internalReleaseImage custom resource exists on the cluster, or the NoRegistryClusterOperations feature gate is disabled.
Explanation: According to OpenShift API conventions, optional fields must explicitly document what happens when they are
omitted. The current documentation only explains when the field is specified, but doesn't clarify the omitted behavior.
---
2. Missing Validation Marker Documentation (MinItems/MaxItems)
Line 42-52 in machineconfiguration/v1alpha1/types_internalreleaseimage.go:
Current (problematic) code:
// releases is a list of release bundle identifiers that the user wants to
// add/remove to/from the control plane nodes.
// Entries must be unique, keyed on the name field.
// This field can contain between 1 and 5 entries.
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=5
// +listType=map
// +listMapKey=name
// +required
Releases []InternalReleaseImageRef `json:"releases,omitempty"`
Suggested change:
// releases is a list of release bundle identifiers that the user wants to
// add/remove to/from the control plane nodes.
// Entries must be unique, keyed on the name field.
- // This field can contain between 1 and 5 entries.
+ // This field is required and must contain between 1 and 5 entries.
Explanation: While the documentation mentions the range constraint, it should also clarify that the field is required
(not just via the marker). However, this is borderline acceptable as the constraint is documented. The main improvement
is making it clearer that this is a required field in the prose.
---
3. Inconsistent Required Field Marking
Line 172-183 in machineconfiguration/v1/types_machineconfignode.go:
Current (problematic) code:
// MachineConfigNodeStatusInternalReleaseImage holds information about the current and discovered release bundles for the observed machine
// config node.
type MachineConfigNodeStatusInternalReleaseImage struct {
// releases is a list of the release bundles currently owned and managed by the
// cluster, indicating that their images can be safely pulled by any cluster entity
// requiring them.
// Entries must be unique, keyed on the name field.
// This field can contain between 1 and 5 entries.
// +listType=map
// +listMapKey=name
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=5
// +required
Releases []MachineConfigNodeStatusInternalReleaseImageRef `json:"releases,omitempty"`
}
Suggested change:
// releases is a list of the release bundles currently owned and managed by the
// cluster, indicating that their images can be safely pulled by any cluster entity
// requiring them.
// Entries must be unique, keyed on the name field.
- // This field can contain between 1 and 5 entries.
+ // This field is required and must contain between 1 and 5 entries.
Explanation: The field is marked as +required but the documentation says "can contain" which is ambiguous. It should
explicitly state "is required and must contain" to match the validation marker.
---
4. Optional Field in Status Missing Omitted Behavior
Line 66-77 in machineconfiguration/v1alpha1/types_internalreleaseimage.go:
Current (problematic) code:
// releases is a list of the release bundles currently owned and managed by the
// cluster, indicating that their images can be safely pulled by any cluster entity
// requiring them.
// Entries must be unique, keyed on the name field.
// This field can contain between 1 and 5 entries.
// +listType=map
// +listMapKey=name
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=5
// +optional
Releases []InternalReleaseImageStatusRef `json:"releases,omitempty"`
Suggested change:
// releases is a list of the release bundles currently owned and managed by the
// cluster, indicating that their images can be safely pulled by any cluster entity
// requiring them.
// Entries must be unique, keyed on the name field.
- // This field can contain between 1 and 5 entries.
+ // When specified, this field must contain between 1 and 5 entries.
+ // When omitted, no release bundles have been observed or managed yet.
Explanation: This optional field doesn't explain what it means when it's omitted. According to the API review
requirements, optional fields must document their omitted behavior.
---
5. Spelling Error in Documentation
Line 116 and Line 224 in both files:
Current (problematic) code:
// InternalReleaseImageConditionTypeInstalling describes a new release that is getting installed on the current node. Due the size of the data
// transfered, the operation could take several minutes
Suggested change:
- // InternalReleaseImageConditionTypeInstalling describes a new release that is getting installed on the current node. Due the size of the data
- // transfered, the operation could take several minutes
+ // InternalReleaseImageConditionTypeInstalling describes a new release that is getting installed on the current node. Due to the size of the data
+ // transferred, the operation could take several minutes
Explanation: Two errors: "Due the" should be "Due to the" and "transfered" should be "transferred" (two r's).
---
Positive Observations
✅ Strong validation coverage - The API includes comprehensive CEL validation rules for the image field format
✅ Good documentation structure - Field purposes are generally well-documented
✅ Proper feature gating - Both APIs are properly gated behind the NoRegistryClusterOperations feature gate
✅ Consistent naming - Field names follow Kubernetes/OpenShift conventions (camelCase in JSON, PascalCase in Go)
✅ List type annotations - Proper use of +listType=map with +listMapKey for uniqueness constraints
✅ Validation markers documented - MinLength/MaxLength constraints are documented in prose
---
Recommendations
MUST FIX:
1. Document omitted behavior for all optional fields (Issues #1 and #4)
2. Fix spelling errors in condition type documentation (Issue #5)
SHOULD FIX:
3. Clarify required field documentation to explicitly state "required" in prose (Issues #2 and #3)
---
Conclusion
The PR demonstrates good API design practices but has critical documentation gaps around optional field behavior that
must be addressed before merging. These gaps violate the OpenShift API convention requirement that optional fields must
explain what happens when omitted.
Once the optional field documentation is updated, this API will be compliant with both Kubernetes and OpenShift API
conventions.
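The list constraints quoted in the review above (between 1 and 5 entries, unique on the name field) are enforced by the CRD schema generated from the markers, not by Go code. Purely to illustrate what those markers mean, a hand-written equivalent might look like the sketch below; `releaseRef` and `validateReleases` are hypothetical names that do not appear in the PR.

```go
package main

import "fmt"

// releaseRef is a hypothetical stand-in for a release list entry.
type releaseRef struct {
	Name string
}

const (
	minReleases = 1 // mirrors +kubebuilder:validation:MinItems=1
	maxReleases = 5 // mirrors +kubebuilder:validation:MaxItems=5
)

// validateReleases applies the size bounds and the uniqueness-by-name rule
// that +listType=map with +listMapKey=name expresses.
func validateReleases(releases []releaseRef) error {
	if n := len(releases); n < minReleases || n > maxReleases {
		return fmt.Errorf("releases must contain between %d and %d entries, got %d",
			minReleases, maxReleases, n)
	}
	seen := make(map[string]struct{}, len(releases))
	for _, r := range releases {
		if _, dup := seen[r.Name]; dup {
			return fmt.Errorf("duplicate release name %q", r.Name)
		}
		seen[r.Name] = struct{}{}
	}
	return nil
}

func main() {
	fmt.Println(validateReleases(nil))
	fmt.Println(validateReleases([]releaseRef{{Name: "a"}, {Name: "a"}}))
	fmt.Println(validateReleases([]releaseRef{{Name: "a"}, {Name: "b"}}))
}
```

Seen this way, "can contain between 1 and 5 entries" on a required field and "must contain between 1 and 5 entries" describe the same check; the wording change is about making the godoc match what the schema already rejects.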
Also ran the same review command from the master branch, it had the same suggestions, but a couple more things:
API Review Report for PR #2510
...
Issue 7: Duplicate InternalReleaseImageConditionType in wrong package
Line 217: The InternalReleaseImageConditionType enum is defined in the v1 package but the comment references
MachineConfigNodeStatusInternalReleaseImageRef, suggesting this might be used across both v1 and v1alpha1.
Current (problematic) code:
// InternalReleaseImageConditionType is each possible state for each possible MachineConfigNodeStatusInternalReleaseImageRef
// conditions type.
// +enum
type InternalReleaseImageConditionType string
Status: ⚠️ DESIGN CONCERN - This same enum type is defined in both machineconfiguration/v1/types_machineconfignode.go
and machineconfiguration/v1alpha1/types_internalreleaseimage.go. While not strictly a documentation issue, this creates
type duplication between API versions. Consider if one should reference the other or if they should be in a shared
package.
---
● 📊 SUMMARY
Critical Issues (Must Fix): 2
1. machineconfiguration/v1alpha1/types_internalreleaseimage.go:47 - Releases field: Missing clear statement that field
is required (should say "must contain" not "can contain")
2. machineconfiguration/v1alpha1/types_internalreleaseimage.go:87 - Conditions field: Missing omitted behavior
documentation and MinItems/MaxItems documentation
Medium Issues (Should Fix): 2
3. machineconfiguration/v1/types_machineconfignode.go:166 - InternalReleaseImage field: Missing omitted behavior
documentation
4. machineconfiguration/v1/types_machineconfignode.go:182 - Releases field: Inconsistent wording ("can contain" vs "must
contain")
Design Concerns: 1
5. Type Duplication: InternalReleaseImageConditionType is defined identically in both v1 and v1alpha1 packages
---
✅ What's Done Well
1. Comprehensive validation rules: The XValidation rules for image references are well-designed and thoroughly enforce
the OCI image format
2. Length constraints documented: MinLength and MaxLength constraints are clearly documented for most fields
3. Field purpose clear: Most fields have clear descriptions of their purpose and use
4. Feature gate usage: Proper use of feature gates for tech preview functionality
---
🔧 Recommended Actions
Before Merge:
1. Fix the two critical documentation issues for optional fields and required fields
2. Address the medium issues for consistency
3. Consider refactoring the duplicated InternalReleaseImageConditionType to a shared location
Post-Review:
- Run make update-codegen-crds API_GROUP_VERSIONS=machineconfiguration.openshift.io/v1,machineconfiguration.openshift.io/v1alpha1 after making documentation changes
- Re-run make lint to ensure no new issues are introduced
---
So it somewhat caught that the CRD isn't being deployed; I've also commented on that inline.
api-approved.openshift.io: https://github.com/openshift/api/pull/2255
api.openshift.io/merged-by-featuregates: "true"
include.release.openshift.io/self-managed-high-availability: "true"
release.openshift.io/feature-set: TechPreviewNoUpgrade
Hmm, I think all of these got generated since we add the FG'ed field, so that's fine. I think we're missing the IRI CRD though. You probably should update:
https://github.com/openshift/api/blob/master/hack/update-payload-crds.sh#L26
And add in the machineconfiguration/v1alpha1/zz_generated.crd-manifests/0000_80_machine-config_01_internalreleaseimages-*.crd.yaml so they show up in the deployed manifests after you do a make.
// +kubebuilder:validation:MaxItems=5
// +listType=map
// +listMapKey=name
// +required
Since this is required, and the spec is also required, this implies that when the user creates an IRI object, they must also provide a non-empty list of the releases they'd like in the spec. That feels counter-intuitive to me, since if I remember correctly we expect the status to be populated first and then the user to specify which image they want?
Taking a further step back, do we expect the cluster to always have an IRI object? Should the controller create it if it detects any MCN with an IRI mounted or available? If the controller is responsible for creating one, should we make this optional so the user can add it after the fact, but also force the IRI object to be a cluster singleton so we don't have multiple IRI objects on the cluster (which wouldn't make sense), similar to https://github.com/openshift/api/blob/master/operator/v1/types_olm.go#L22 ?
The assumption is that the OVE installer will take care of creating the IRI resource with the right spec.releases entry value (as an extra manifest); the presence of the IRI resource will also be used as a trigger during the MCO bootstrap command. The user is not expected to create it, but is expected to add an entry in the future to manage an upgrade.
In the longer term, this means that at the end of the installation phase, when the IRI controller/MCD become active, they will simply update their status with a new installed release (already found). So having an IRI resource with an empty spec.releases didn't sound quite right.
The bootstrap workflow sounds fine, but in regular cluster operation, is there a need to keep it around post-install or post-upgrade? i.e. would there be a case where the user finishes the installation then wants to remove the IRI entry such that it's not always actively serving, or the user wants to remove the current one before adding the new one for upgrade?
If we want to explicitly prevent this, then I think it's ok to have it as a required spec, just wanted to think through the options.
Also on the second point, since we only expect the user to update the existing object and not create new ones, maybe we should make it a singleton so the user doesn't try to set multiple objects.
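For reference, a singleton is usually expressed as a CEL rule on the object name. The sketch below shows roughly what that could look like here; the type `InternalReleaseImageSketch`, the required name `cluster`, and the exact marker text are assumptions modeled on the linked OLM type rather than anything defined in this PR, and `validateSingleton` only mirrors the rule in plain Go for illustration.

```go
package main

import "fmt"

// singletonName is the assumed fixed name for the only allowed object.
const singletonName = "cluster"

// InternalReleaseImageSketch is a hypothetical, trimmed-down stand-in for the
// real type; only the object name matters for this example.
//
// +kubebuilder:validation:XValidation:rule="self.metadata.name == 'cluster'",message="internalreleaseimage is a singleton, .metadata.name must be 'cluster'"
type InternalReleaseImageSketch struct {
	Name string
}

// validateSingleton mirrors the CEL rule above: any name other than
// singletonName is rejected.
func validateSingleton(obj InternalReleaseImageSketch) error {
	if obj.Name != singletonName {
		return fmt.Errorf("internalreleaseimage is a singleton, .metadata.name must be %q, got %q",
			singletonName, obj.Name)
	}
	return nil
}

func main() {
	fmt.Println(validateSingleton(InternalReleaseImageSketch{Name: "cluster"}))
	fmt.Println(validateSingleton(InternalReleaseImageSketch{Name: "second-iri"}))
}
```

With a rule like this, a second object under another name would be rejected at admission time, which would address the "multiple IRI objects" concern without extra controller logic.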
const (
// InternalReleaseImageConditionTypeMounted describes a new release, not yet installed, that has been discovered when an ISO has been attached to
// the current node
nit: this is not on the current node, but rather an aggregated status right?
Also applies to the below statuses
Yes, it's the one the user is expected to monitor to understand how a given task is progressing globally. For example, at the IRI level a release becomes Available only when all the control plane nodes have it as Available in their MCN IRI status conditions field.
Right, could you update the godoc here to represent that?
// MachineConfigNodeStatusInternalReleaseImageRef is used to provide a more detailed reference for
// a release bundle.
// +openshift:enable:FeatureGate=NoRegistryClusterOperations
Based on Bryce's earlier comment, I think this is not needed since we already have it on the MachineConfigNodeStatus field.
(Also, MachineConfigNodeStatusInternalReleaseImage doesn't have this, so it would be inconsistent to have it here.)
// a release bundle.
// +openshift:enable:FeatureGate=NoRegistryClusterOperations
type MachineConfigNodeStatusInternalReleaseImageRef struct {
// conditions represent the observations of an internal release image current state. See InternalReleaseImageConditionType for the possible
I don't know if it's easy to find that via godocs, maybe we should be documenting the exact types like https://github.com/openshift/api/blob/master/machineconfiguration/v1/types_machineconfignode.go#L113-L116 ?
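As a rough idea of what spelling the types out in the field godoc could look like, here is a sketch. It lists only the three states named in this thread (Mounted, Installing, Available); the string values, the `conditionType` and `releaseStatusSketch` names, and the helper are all hypothetical, and the real type may define more states.

```go
package main

import "fmt"

// conditionType is a hypothetical stand-in for InternalReleaseImageConditionType.
type conditionType string

// Only the states mentioned in this thread are listed; the values are assumed.
const (
	conditionMounted    conditionType = "Mounted"
	conditionInstalling conditionType = "Installing"
	conditionAvailable  conditionType = "Available"
)

// releaseStatusSketch is a hypothetical, trimmed-down status entry.
type releaseStatusSketch struct {
	// conditions represent the observations of an internal release image's current state.
	// Valid types are: Mounted, Installing, and Available.
	// Mounted means a new release, not yet installed, has been discovered on an attached ISO.
	// Installing means the release is currently being installed.
	// Available means the release is reported as Available.
	Conditions []conditionType
}

// isKnownConditionType reports whether t is one of the types listed in the godoc.
func isKnownConditionType(t conditionType) bool {
	switch t {
	case conditionMounted, conditionInstalling, conditionAvailable:
		return true
	}
	return false
}

func main() {
	fmt.Println(isKnownConditionType(conditionMounted))
	fmt.Println(isKnownConditionType("SomethingElse"))
}
```

Listing the types inline keeps the generated CRD description self-contained, since the reference to another Go type is not visible to someone reading the schema.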
…re adopted for the MCN status field
7727158 to
e4169d2
Compare
This patch adds the new InternalReleaseImage CRD. See openshift/enhancements#1821 for additional details.