@djoshy commented Oct 3, 2025

- What I did
This PR adds support for boot image updates to ControlPlaneMachineSet for the AWS, Azure, and GCP platforms. A few key points about CPMS:

  • They are singletons in the Machine API namespace, typically named cluster. The boot images are stored under spec, in a field similar to MachineSets. For example, in AWS (abbreviated to the important fields):
spec:
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: ci-op-l4pngh10-79b69-zrm8p
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        providerSpec:
          value:
            ami:
              id: ami-09d23adad19cdb25c
  • They have a rollout strategy defined in spec.strategy.type, which can be set to RollingUpdate, Recreate, or OnDelete. In RollingUpdate mode, any deviation between the CPMS spec and the control plane nodes causes a complete control plane replacement, which is undesirable if the only deviation is boot images: the nodes pivot to the latest RHCOS image described by the OCP release image anyway, so the replacement is effectively a no-op that only adds to upgrade time. To avoid this, the CPMS operator was updated to ignore boot image fields during control plane machine reconciliation.

- How to verify it

  1. Create an AWS/GCP/Azure cluster in the TechPreview featureset.
  2. Take a back-up of the current CPMS object named cluster for comparison purposes.
  3. Opt-in for CPMS boot image updates using the MachineConfiguration object:
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
spec:
  logLevel: Normal
  operatorLogLevel: Normal
  managedBootImages:
    machineManagers:
      - resource: controlplanemachinesets
        apiGroup: machine.openshift.io
        selection:
          mode: All
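If you'd rather not edit the object interactively, the opt-in above can also be applied as a merge patch; a hypothetical sketch (assumes a live TechPreview cluster with cluster-admin credentials; oc converts the YAML payload to JSON):

```shell
# Sketch: apply the opt-in above as a merge patch.
oc patch machineconfiguration cluster --type merge -p '
spec:
  managedBootImages:
    machineManagers:
    - resource: controlplanemachinesets
      apiGroup: machine.openshift.io
      selection:
        mode: All
'
```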
  4. Modify the boot image field to an older value. This will vary per platform:
  • For AWS, use an older known AMI like ami-00abe7f9c6bd85a77.
  • For GCP, modify the image field to any value that starts with projects/rhcos-cloud/global/images/, for example projects/rhcos-cloud/global/images/test.
  • For Azure, the existing boot image will be updated automatically, without any manual changes. This is because Azure clusters currently install with gallery images, and the update pivots them to the latest marketplace images. Once Azure clusters install with marketplace images directly, the user will need to manually modify the image to test the Azure platform.
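On AWS, for example, step 4 can be scripted as a JSON patch; a hypothetical sketch (the path mirrors the providerSpec layout shown earlier, and the AMI is the older known value from above):

```shell
# Sketch: point the CPMS at an older AMI; with mode: All, the
# controller is expected to revert this change.
oc patch controlplanemachineset cluster -n openshift-machine-api --type json -p '[
  {"op": "replace",
   "path": "/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value/ami/id",
   "value": "ami-00abe7f9c6bd85a77"}
]'
```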
  5. Examine the MachineConfiguration object's status to see if the CPMS was reconciled successfully. The CPMS boot image fields should reflect the values you initially saw post-install. These are the values described in the coreos-bootimages configmap. The machine-config-controller logs should also mention that a boot image update took place.
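The checks in the step above might look like this; a hypothetical sketch (the deployment name and log phrasing can differ between releases):

```shell
# Sketch: inspect the reported status and the controller logs.
oc get machineconfiguration cluster -o yaml | grep -A 8 managedBootImagesStatus
oc logs -n openshift-machine-config-operator deployment/machine-config-controller \
  -c machine-config-controller | grep -i ControlPlaneMachineSet
```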
  6. You can now attempt to resize the control plane by deleting one of the control plane machines. The CPMS operator should scale up a new machine to satisfy its spec.replicas value, and it should be able to do so successfully. This process might take a while (about 10-15 minutes on GCP for me) because the CPMS controller first scales up the replacement and then drains and deletes the older control plane machine, presumably to maintain etcd quorum at all points of the process.
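One way to exercise the resize; a hypothetical sketch (substitute any one master machine name from the first command's output):

```shell
# Sketch: delete one control plane machine and watch the CPMS
# scale up a replacement before the old one is drained and deleted.
oc get machines -n openshift-machine-api \
  -l machine.openshift.io/cluster-api-machine-role=master
oc delete machine <master-machine-name> -n openshift-machine-api
oc get machines -n openshift-machine-api -w
```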
  7. Now, opt the cluster out of CPMS boot image updates:
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
spec:
  logLevel: Normal
  operatorLogLevel: Normal
  managedBootImages:
    machineManagers:
      - resource: controlplanemachinesets
        apiGroup: machine.openshift.io
        selection:
          mode: None
  8. Modify the boot image to an older value (see step 4). For Azure, you could modify the version field to an older value.
  9. Examine the MachineConfiguration object's status to see if the CPMS object was reconciled successfully. The CPMS boot image fields should reflect the values you set, not the values described in the coreos-bootimages configmap. The machine-config-controller logs should also mention that a boot image update did not take place.
  10. All done! You have now successfully tested CPMS boot image updates!

Note: Since these are singleton objects, the Partial selection mode is not permitted while specifying boot image configuration, so that mode does not need to be tested. The API server will reject any attempt to set Partial for CPMS objects, so I suppose that rejection is something to test as well! 😄
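To test that rejection, one could attempt the disallowed mode directly; a hypothetical sketch (the exact validation message will come from the API server):

```shell
# Sketch: Partial is not permitted for CPMS, so this patch is
# expected to fail validation rather than apply.
oc patch machineconfiguration cluster --type merge -p '
spec:
  managedBootImages:
    machineManagers:
    - resource: controlplanemachinesets
      apiGroup: machine.openshift.io
      selection:
        mode: Partial
'
```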

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 3, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 3, 2025
openshift-ci bot commented Oct 3, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci-robot commented Oct 3, 2025

@djoshy: This pull request references MCO-1807 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

[DNM, testing]

Opened for initial testing. Currently, the controller looks at the standard MAPI MachineSet boot image opinion; when openshift/api#2396 lands, this PR can be updated to actually check for the CPMS type. It also does not look for the CPMS feature gate, for the same reason.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2025
djoshy commented Oct 3, 2025

/test all



djoshy commented Oct 6, 2025

/test verify


@djoshy djoshy marked this pull request as ready for review October 9, 2025 13:48
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 9, 2025
djoshy commented Oct 9, 2025

Opening this up for initial review; I've integrated the API from openshift/api#2396.


djoshy commented Oct 14, 2025

/retest-required

djoshy commented Oct 17, 2025

Rebased to fix conflicts

@ptalgulk01

Pre-merge tested:
Verified using IPI-based AWS, GCP, and Azure TechPreview clusters.

  • Opt in to boot image management by patching in the content below:
$ oc edit machineconfigurations 
...
spec:
  managedBootImages:
    machineManagers:
    - apiGroup: machine.openshift.io
      resource: controlplanemachinesets
      selection:
        mode: All
  • Check that the status is updated in the machineconfiguration:
$ oc get machineconfigurations -o yaml
...
  spec:
    logLevel: Normal
    managedBootImages:
      machineManagers:
      - apiGroup: machine.openshift.io
        resource: controlplanemachinesets
        selection:
          mode: All
...
  status:
    managedBootImagesStatus:
      machineManagers:
      - apiGroup: machine.openshift.io
        resource: controlplanemachinesets
        selection:
          mode: All
  • Edit the controlplanemachinesets boot image and check that the changes are reverted.

AWS:

$ oc edit controlplanemachinesets -n openshift-machine-api cluster -o yaml
...
spec:
 template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      spec:
       .....
        providerSpec:
          value:
            ami:
              id: ami-00abe7f9c6bd85a77

$ oc get controlplanemachinesets -n openshift-machine-api cluster -o yaml | grep -i ami
            ami:
              id: ami-082a55a580d5538ed
            iamInstanceProfile:

GCP:

$ oc edit controlplanemachinesets -n openshift-machine-api cluster -o yaml
...
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      spec:
          value:
            disks:
            - autoDelete: true
              boot: true
              image: projects/rhcos-cloud/global/images/test

$ oc get controlplanemachinesets -n openshift-machine-api cluster -o yaml
...
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      spec:
          value:
            disks:
            - autoDelete: true
              boot: true
              image: projects/rhcos-cloud/global/images/rhcos-9-6-20250826-1-gcp-x86-64

Azure

$ oc edit controlplanemachinesets -n openshift-machine-api cluster -o yaml
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      spec:
        providerSpec:
          value:
            image:
              offer: aro4
              publisher: azureopenshift
              resourceID: ""
              sku: 419-v2-test
              type: MarketplaceNoPlan
              version: 419.6.20250523

$ oc get controlplanemachinesets -n openshift-machine-api cluster -o yaml
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      spec:
        providerSpec:
          value:
            image:
              offer: aro4
              publisher: azureopenshift
              resourceID: ""
              sku: 419-v2
              type: MarketplaceNoPlan
              version: 419.6.20250523
  • Check the MCC logs

AWS

$ oc logs machine-config-controller-68b8f55d59-tcrz7 | tail -n 10
Defaulted container "machine-config-controller" out of: machine-config-controller, kube-rbac-proxy
I1018 06:44:09.085287       1 ms_helpers.go:81] No MAPI machinesets were enrolled, so no MAPI machinesets will be enqueued.
I1018 06:47:19.526388       1 machine_set_boot_image_controller.go:272] ControlPlaneMachineSet cluster updated, reconciling enrolled machineset resources
I1018 06:47:19.543905       1 cpms_helpers.go:282] Reconciling controlplanemachineset cluster on AWS, with arch x86_64
I1018 06:47:19.544997       1 platform_helpers.go:191] Current image: us-east-2: ami-00abe7f9c6bd85a77
I1018 06:47:19.545041       1 platform_helpers.go:192] New target boot image: us-east-2: ami-082a55a580d5538ed
I1018 06:47:19.547682       1 cpms_helpers.go:197] Patching ControlPlaneMachineSet cluster
I1018 06:47:19.600228       1 cpms_helpers.go:250] Successfully patched ControlPlaneMachineSet cluster
I1018 06:47:19.601005       1 machine_set_boot_image_controller.go:272] ControlPlaneMachineSet cluster updated, reconciling enrolled machineset resources
I1018 06:47:19.937723       1 cpms_helpers.go:282] Reconciling controlplanemachineset cluster on AWS, with arch x86_64
I1018 06:47:19.938423       1 cpms_helpers.go:200] No patching required for ControlPlaneMachineSet cluster

GCP

$ oc logs machine-config-controller-68b8f55d59-nz45j | tail -n 10
Defaulted container "machine-config-controller" out of: machine-config-controller, kube-rbac-proxy
I1018 12:37:56.768278       1 machine_set_boot_image_controller.go:272] ControlPlaneMachineSet cluster updated, reconciling enrolled machineset resources
I1018 12:37:56.795822       1 cpms_helpers.go:282] Reconciling controlplanemachineset cluster on GCP, with arch x86_64
I1018 12:37:56.796701       1 platform_helpers.go:129] New target boot image: projects/rhcos-cloud/global/images/rhcos-9-6-20250826-1-gcp-x86-64
I1018 12:37:56.796717       1 platform_helpers.go:130] Current image: projects/rhcos-cloud/global/images/test
I1018 12:37:56.802007       1 cpms_helpers.go:197] Patching ControlPlaneMachineSet cluster
I1018 12:37:56.877277       1 warnings.go:110] "Warning: spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.targetPools: TargetPools field is not set on ControlPlaneMachineSet. This configuration is valid for private clusters. If your cluster is not private, please determine and set the correct value."
I1018 12:37:56.877670       1 cpms_helpers.go:250] Successfully patched ControlPlaneMachineSet cluster
I1018 12:37:56.878179       1 machine_set_boot_image_controller.go:272] ControlPlaneMachineSet cluster updated, reconciling enrolled machineset resources
I1018 12:37:57.183849       1 cpms_helpers.go:282] Reconciling controlplanemachineset cluster on GCP, with arch x86_64
I1018 12:37:57.184973       1 cpms_helpers.go:200] No patching required for ControlPlaneMachineSet cluster

Azure

$ oc logs machine-config-controller-68b8f55d59-mk4b8 | tail -n 10 
I1018 14:58:52.748797       1 machine_set_boot_image_controller.go:272] ControlPlaneMachineSet cluster updated, reconciling enrolled machineset resources
I1018 14:58:52.783675       1 cpms_helpers.go:282] Reconciling controlplanemachineset cluster on Azure, with arch x86_64
I1018 14:58:52.784546       1 platform_helpers.go:324] Current boot image version: 419.6.20250523
I1018 14:58:52.784565       1 platform_helpers.go:325] New target boot image version: 419.6.20250523
I1018 14:58:52.789604       1 cpms_helpers.go:197] Patching ControlPlaneMachineSet cluster
I1018 14:58:52.859691       1 cpms_helpers.go:250] Successfully patched ControlPlaneMachineSet cluster
I1018 14:58:52.860160       1 machine_set_boot_image_controller.go:272] ControlPlaneMachineSet cluster updated, reconciling enrolled machineset resources
I1018 14:58:52.923635       1 cpms_helpers.go:282] Reconciling controlplanemachineset cluster on Azure, with arch x86_64
I1018 14:58:52.924500       1 cpms_helpers.go:200] No patching required for ControlPlaneMachineSet cluster
  • Delete a control plane machine and check that it is replaced. (Steps are the same for all platforms.)
$ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master

NAME                      PHASE     TYPE            REGION        ZONE            AGE
ppt-18-a-s576k-master-0   Running   n2-standard-4   us-central1   us-central1-a   7h23m
ppt-18-a-s576k-master-1   Running   n2-standard-4   us-central1   us-central1-b   7h23m
ppt-18-a-s576k-master-2   Running   n2-standard-4   us-central1   us-central1-c   7h23m

$ oc delete machine ppt-18-a-s576k-master-0  -n openshift-machine-api
machine.machine.openshift.io "ppt-18-a-s576k-master-0" deleted
  • Check that the new machine is provisioned and the new node is up
 $ oc get machines -n openshift-machine-api -l machine.openshift.io/cluster-api-machine-role=master
NAME                            PHASE         TYPE            REGION        ZONE            AGE
ppt-18-a-s576k-master-0         Deleting      n2-standard-4   us-central1   us-central1-a   8h
ppt-18-a-s576k-master-1         Running       n2-standard-4   us-central1   us-central1-b   8h
ppt-18-a-s576k-master-2         Running       n2-standard-4   us-central1   us-central1-c   8h
ppt-18-a-s576k-master-9wrzh-0   Provisioned   n2-standard-4   us-central1   us-central1-a   25s

$  oc get nodes -l node-role.kubernetes.io/master
NAME                                                            STATUS   ROLES                  AGE     VERSION
ppt-18-a-s576k-master-1.us-central1-b.c.openshift-qe.internal   Ready    control-plane,master   11h     v1.33.5
ppt-18-a-s576k-master-2.us-central1-c.c.openshift-qe.internal   Ready    control-plane,master   11h     v1.33.5
ppt-18-a-s576k-master-9wrzh-0                                   Ready    control-plane,master   3h36m   v1.33.5
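To confirm the replacement machine was created from the patched boot image, the image path can be pulled out of the machine's providerSpec. A minimal sketch against a mocked machine object (the live-cluster command is shown in the comment):

```shell
# Sketch: extract the boot image from a GCP master machine's providerSpec.
# The JSON below is mocked from the outputs above; on a live cluster use:
#   oc get machine ppt-18-a-s576k-master-9wrzh-0 -n openshift-machine-api \
#     -o jsonpath='{.spec.providerSpec.value.disks[0].image}'
machine_json='{"spec":{"providerSpec":{"value":{"disks":[{"autoDelete":true,"boot":true,"image":"projects/rhcos-cloud/global/images/rhcos-9-6-20250826-1-gcp-x86-64"}]}}}}'
echo "$machine_json" | python3 -c 'import json,sys; print(json.load(sys.stdin)["spec"]["providerSpec"]["value"]["disks"][0]["image"])'
```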
  • Opt out of boot image updates
$ oc get machineconfigurations -o yaml
...
  spec:
    logLevel: Normal
    managedBootImages:
      machineManagers:
      - apiGroup: machine.openshift.io
        resource: controlplanemachinesets
        selection:
          mode: None
  status:
    managedBootImagesStatus:
      machineManagers:
      - apiGroup: machine.openshift.io
        resource: controlplanemachinesets
        selection:
          mode: None
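The opt-out above was done by editing the object directly; the same change can be expressed as a merge-patch file. A sketch, with field names taken from the spec shown above (the filename is illustrative):

```yaml
# cpms-boot-image-optout.yaml -- merge patch that disables CPMS boot image
# management by setting the selection mode to None, mirroring the spec above
spec:
  managedBootImages:
    machineManagers:
    - apiGroup: machine.openshift.io
      resource: controlplanemachinesets
      selection:
        mode: None
```

This could then be applied with something like `oc patch machineconfiguration cluster --type merge --patch-file cpms-boot-image-optout.yaml`.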
  • Edit the controlplanemachineset boot image and check that the changes are not reverted.

AWS:

$   oc get controlplanemachineset cluster -n openshift-machine-api -o yaml | grep -A3 ami
            ami:
              id: ami-00abe7f9c6bd85a77
            apiVersion: machine.openshift.io/v1beta1
            blockDevices:
            - ebs:

GCP:

$ oc get controlplanemachinesets -n openshift-machine-api cluster -o yaml
...
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      spec:
        providerSpec:
          value:
            disks:
            - autoDelete: true
              boot: true
              image: projects/rhcos-cloud/global/images/test

Azure

$ oc get controlplanemachinesets -n openshift-machine-api cluster -o yaml
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      spec:
        providerSpec:
          value:
            image:
              offer: aro4
              publisher: azureopenshift
              resourceID: ""
              sku: 419-v2-test
              type: MarketplaceNoPlan
              version: 419.6.20250523

I tried to patch the Partial mode here too, but the API does not allow it.

@djoshy
Contributor Author

djoshy commented Oct 19, 2025

/verified by @ptalgulk01

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Oct 19, 2025
@openshift-ci-robot
Contributor

@djoshy: This PR has been marked as verified by @ptalgulk01.

In response to this:

/verified by @ptalgulk01

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Member

@isabella-janssen isabella-janssen left a comment


/lgtm

All review comments were addressed & QE verified, so this looks great to me. Thanks @djoshy 🎉

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 20, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy, isabella-janssen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [djoshy,isabella-janssen]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 679b9b5 and 2 for PR HEAD 09b1528 in total

@djoshy
Contributor Author

djoshy commented Oct 20, 2025

/test all

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 4606de4 and 1 for PR HEAD 09b1528 in total

@djoshy
Contributor Author

djoshy commented Oct 20, 2025

/retest-required

1 similar comment
@djoshy
Contributor Author

djoshy commented Oct 21, 2025

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Oct 21, 2025

@djoshy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-op-ocl 502f475 link false /test e2e-gcp-op-ocl
ci/prow/e2e-azure-ovn-upgrade-out-of-change 502f475 link false /test e2e-azure-ovn-upgrade-out-of-change
ci/prow/e2e-aws-mco-disruptive 502f475 link false /test e2e-aws-mco-disruptive
ci/prow/okd-scos-e2e-aws-ovn 09b1528 link false /test okd-scos-e2e-aws-ovn
ci/prow/bootstrap-unit 09b1528 link false /test bootstrap-unit

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@djoshy
Contributor Author

djoshy commented Oct 21, 2025

/retest-required

@openshift-merge-bot openshift-merge-bot bot merged commit 06e7b70 into openshift:main Oct 21, 2025
13 of 15 checks passed
@djoshy djoshy deleted the add-cpms-support branch November 10, 2025 19:02