MCO-1807: Add CPMS support in the MCO's boot image controller #5332
|
Skipping CI for Draft Pull Request. |
|
@djoshy: This pull request references MCO-1807 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test all |
502f475 to
a52885f
Compare
|
/test verify |
4aa7aca to
42d5df2
Compare
|
Opening this up for initial review; I've integrated the API from openshift/api#2396. |
pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go
|
/retest-required |
0caa774 to
09b1528
Compare
|
Rebased to fix conflicts |
|
Pre-merge tested:
AWS: GCP: Azure
AWS GCP Azure
$ oc logs machine-config-controller-68b8f55d59-mk4b8 | tail -n 10
I1018 14:58:52.748797 1 machine_set_boot_image_controller.go:272] ControlPlaneMachineSet cluster updated, reconciling enrolled machineset resources
I1018 14:58:52.783675 1 cpms_helpers.go:282] Reconciling controlplanemachineset cluster on Azure, with arch x86_64
I1018 14:58:52.784546 1 platform_helpers.go:324] Current boot image version: 419.6.20250523
I1018 14:58:52.784565 1 platform_helpers.go:325] New target boot image version: 419.6.20250523
I1018 14:58:52.789604 1 cpms_helpers.go:197] Patching ControlPlaneMachineSet cluster
I1018 14:58:52.859691 1 cpms_helpers.go:250] Successfully patched ControlPlaneMachineSet cluster
I1018 14:58:52.860160 1 machine_set_boot_image_controller.go:272] ControlPlaneMachineSet cluster updated, reconciling enrolled machineset resources
I1018 14:58:52.923635 1 cpms_helpers.go:282] Reconciling controlplanemachineset cluster on Azure, with arch x86_64
I1018 14:58:52.924500 1 cpms_helpers.go:200] No patching required for ControlPlaneMachineSet cluster

$ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master
NAME                      PHASE     TYPE            REGION        ZONE            AGE
ppt-18-a-s576k-master-0   Running   n2-standard-4   us-central1   us-central1-a   7h23m
ppt-18-a-s576k-master-1   Running   n2-standard-4   us-central1   us-central1-b   7h23m
ppt-18-a-s576k-master-2   Running   n2-standard-4   us-central1   us-central1-c   7h23m
$ oc delete machine ppt-18-a-s576k-master-0 -n openshift-machine-api
machine.machine.openshift.io "ppt-18-a-s576k-master-0" deleted

$ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master
NAME                            PHASE         TYPE            REGION        ZONE            AGE
ppt-18-a-s576k-master-0         Deleting      n2-standard-4   us-central1   us-central1-a   8h
ppt-18-a-s576k-master-1         Running       n2-standard-4   us-central1   us-central1-b   8h
ppt-18-a-s576k-master-2         Running       n2-standard-4   us-central1   us-central1-c   8h
ppt-18-a-s576k-master-9wrzh-0   Provisioned   n2-standard-4   us-central1   us-central1-a   25s
$ oc get nodes -l node-role.kubernetes.io/master
NAME                                                            STATUS   ROLES                  AGE     VERSION
ppt-18-a-s576k-master-1.us-central1-b.c.openshift-qe.internal   Ready    control-plane,master   11h     v1.33.5
ppt-18-a-s576k-master-2.us-central1-c.c.openshift-qe.internal   Ready    control-plane,master   11h     v1.33.5
ppt-18-a-s576k-master-9wrzh-0                                   Ready    control-plane,master   3h36m   v1.33.5
$ oc get machineconfigurations -o yaml
...
spec:
  logLevel: Normal
  managedBootImages:
    machineManagers:
    - apiGroup: machine.openshift.io
      resource: controlplanemachinesets
      selection:
        mode: None
status:
  managedBootImagesStatus:
    machineManagers:
    - apiGroup: machine.openshift.io
      resource: controlplanemachinesets
      selection:
        mode: None
AWS: GCP: Azure I tried to patch the |
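The controller log in the pre-merge test shows two reconcile passes: the first patches the ControlPlaneMachineSet, and the second reports that no patching is required. A minimal Go sketch of that idempotent pattern follows. It assumes the comparison can be reduced to a single image string; the `bootImageState` type and the `needsPatch` and `reconcile` functions are hypothetical names for illustration, and the real controller's comparison is platform specific and not reproduced here.

```go
package main

import "fmt"

// bootImageState is a hypothetical, simplified stand-in for the boot image
// fields stored under the spec of a ControlPlaneMachineSet.
type bootImageState struct {
	Name  string // object name, e.g. "cluster"
	Image string // boot image reference currently in the spec
}

// needsPatch reports whether the spec's boot image differs from the target.
// Assumption: a plain string comparison is enough for this illustration.
func needsPatch(s bootImageState, target string) bool {
	return s.Image != target
}

// reconcile returns the (possibly updated) state and a log-style message.
// Calling it a second time with the same target changes nothing.
func reconcile(s bootImageState, target string) (bootImageState, string) {
	if !needsPatch(s, target) {
		return s, fmt.Sprintf("No patching required for ControlPlaneMachineSet %s", s.Name)
	}
	s.Image = target
	return s, fmt.Sprintf("Patching ControlPlaneMachineSet %s", s.Name)
}

func main() {
	// "image-old" and "image-new" are placeholder values, not real images.
	state := bootImageState{Name: "cluster", Image: "image-old"}

	state, msg := reconcile(state, "image-new")
	fmt.Println(msg)

	// The patch itself triggers another reconcile, which should be a no-op.
	_, msg = reconcile(state, "image-new")
	fmt.Println(msg)
}
```

Because the patch triggers a second update event, the no-op branch is what stops the controller from looping.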
|
/verified by @ptalgulk01 |
|
@djoshy: This PR has been marked as verified by In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
isabella-janssen
left a comment
/lgtm
All review comments were addressed & QE verified, so this looks great to me. Thanks @djoshy 🎉
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy, isabella-janssen The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test all |
|
/retest-required |
|
@djoshy: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest-required |
06e7b70
into
openshift:main
- What I did
This PR adds support for boot image updates to ControlPlaneMachineSet for the AWS, Azure and GCP platforms. A couple of key points to know about CPMS:
[...] cluster. The boot images are stored under spec, in a field similar to MachineSets. For example, in AWS (abbreviated to only important fields):
[...] RollingUpdate, Recreate or OnDelete. In RollingUpdate mode, this meant that any deviation in the spec of the CPMS from the nodes would cause a complete control plane replacement, which is undesirable if the only deviation is in the boot images. This is because the nodes pivot to the latest RHCOS image described by the OCP release image, so the replacement would effectively be a no-op that only adds to upgrade time. To avoid this issue, the CPMS operator was updated to ignore boot image fields during control plane machine reconciliation.

- How to verify it
[...] TechPreview featureset.
[...] cluster for comparison purposes.
[...] MachineConfiguration object:
[...] ami-00abe7f9c6bd85a77.
[...] projects/rhcos-cloud/global/images/, for example projects/rhcos-cloud/global/images/test.
[...] MachineConfiguration object's status to see if the CPMS was reconciled successfully. The CPMS boot image fields should reflect the values you initially saw post-install. These are the values described in the coreos-bootimages configmap. The machine-config-controller logs should also mention that a boot image update took place.
[...] spec.replicas value, and it should be able to do so successfully. This process might take a while to complete (it took about 10-15 minutes on GCP for me), as the CPMS controller will first scale up the replacement and then drain and delete the older control plane machine. I think this is to maintain etcd quorum at all points of the process.
[...] MachineConfiguration object's status to see if the CPMS object was reconciled successfully. The CPMS boot image fields should reflect the values you set, and not the values described in the coreos-bootimages configmap. The machine-config-controller logs should also mention that a boot image update did not take place.

Note: Since these are singleton objects, the Partial selection mode is not permitted while specifying boot image configuration. Hence, that mode does not need to be tested. The APIServer will reject any attempt to set Partial for CPMS objects, so I suppose that is something to test as well! 😄
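The opt-in rules described above can be sketched in Go as follows. This is an illustration under stated assumptions, not the controller's code: the `machineManager` type and the `validate` and `cpmsEnrolled` functions are hypothetical. The `All` mode is assumed from the MachineConfiguration API, since only `None` and `Partial` appear in this PR's text. Treating a missing entry as "not opted in" is a choice made for this sketch only. In the real cluster the rejection of `Partial` happens in the APIServer; the `validate` function merely mirrors that rule locally.

```go
package main

import (
	"errors"
	"fmt"
)

// selectionMode mirrors the selection.mode field of a machineManagers entry.
type selectionMode string

const (
	modeAll     selectionMode = "All" // assumed from the MachineConfiguration API
	modeNone    selectionMode = "None"
	modePartial selectionMode = "Partial"
)

const (
	machineAPIGroup = "machine.openshift.io"
	cpmsResource    = "controlplanemachinesets"
)

// machineManager is a hypothetical, flattened view of one machineManagers entry.
type machineManager struct {
	APIGroup string
	Resource string
	Mode     selectionMode
}

var errPartialNotAllowed = errors.New("selection mode Partial is not permitted for controlplanemachinesets")

// validate mirrors the rule that a singleton CPMS cannot be partially selected.
func validate(m machineManager) error {
	if m.Resource == cpmsResource && m.Mode == modePartial {
		return errPartialNotAllowed
	}
	return nil
}

// cpmsEnrolled reports whether the CPMS is opted in to boot image updates.
// Assumption: when no CPMS entry exists, this sketch treats it as not enrolled.
func cpmsEnrolled(managers []machineManager) (bool, error) {
	for _, m := range managers {
		if m.APIGroup != machineAPIGroup || m.Resource != cpmsResource {
			continue
		}
		if err := validate(m); err != nil {
			return false, err
		}
		return m.Mode == modeAll, nil
	}
	return false, nil
}

func main() {
	for _, mode := range []selectionMode{modeAll, modeNone, modePartial} {
		entry := machineManager{APIGroup: machineAPIGroup, Resource: cpmsResource, Mode: mode}
		enrolled, err := cpmsEnrolled([]machineManager{entry})
		fmt.Printf("mode=%s enrolled=%t err=%v\n", mode, enrolled, err)
	}
}
```

With `None`, as in the QE output earlier in the conversation, the sketch reports the CPMS as not enrolled, which corresponds to the case where user-set boot image values are left untouched.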