
Conversation


@djoshy djoshy commented Sep 22, 2025

This PR updates the nodeSelector of the cronjob to use the legacy control plane label (node-role.kubernetes.io/master) and also improves the logging of some of the oc get queries.
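A minimal sketch of the selector change, for context. This is an illustrative fragment only: field placement follows the standard batch/v1 CronJob schema, and everything other than the nodeSelector key is assumed rather than copied from the actual MCO manifest.

```yaml
# Illustrative fragment, not the real manifest: shows the nodeSelector
# pinned to the legacy control plane label, which exists on all supported
# clusters (unlike node-role.kubernetes.io/control-plane).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: machine-config-nodes-crd-cleanup
  namespace: openshift-machine-config-operator
spec:
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            node-role.kubernetes.io/master: ""
```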

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 22, 2025

openshift-ci bot commented Sep 22, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2025
@djoshy djoshy changed the title NO_ISSUE: Improve MCN CRD clean-up erroring OCPBUGS-62073: Improve MCN CRD clean-up script Sep 22, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 22, 2025
@openshift-ci-robot

@djoshy: This pull request references Jira Issue OCPBUGS-62073, which is invalid:

  • expected the bug to target the "4.20.0" version, but no target version was set
  • release note text must be set and not match the template OR release note type must be set to "Release Note Not Required". For more information you can reference the OpenShift Bug Process.
  • expected Jira Issue OCPBUGS-62073 to depend on a bug targeting a version in 4.21.0 and in one of the following states: MODIFIED, ON_QA, VERIFIED, but no dependents were found

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

[DNM, testing]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


djoshy commented Sep 22, 2025

/jira refresh

@openshift-ci-robot

@djoshy: This pull request references Jira Issue OCPBUGS-62073, which is invalid:

  • expected dependent Jira Issue OCPBUGS-62082 to be in one of the following states: MODIFIED, ON_QA, VERIFIED, but it is Closed (Not a Bug) instead
  • expected dependent Jira Issue OCPBUGS-62082 to target a version in 4.21.0, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh



djoshy commented Sep 22, 2025

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Sep 22, 2025
@openshift-ci-robot

@djoshy: This pull request references Jira Issue OCPBUGS-62073, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note type set to "Release Note Not Required"
  • dependent bug Jira Issue OCPBUGS-62082 is in the state Verified, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Jira Issue OCPBUGS-62082 targets the "4.21.0" version, which is one of the valid target versions: 4.21.0
  • bug has dependents

In response to this:

/jira refresh


@djoshy djoshy marked this pull request as ready for review September 22, 2025 19:06
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 22, 2025
@openshift-ci-robot

@djoshy: This pull request references Jira Issue OCPBUGS-62073, which is valid.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note type set to "Release Note Not Required"
  • dependent bug Jira Issue OCPBUGS-62082 is in the state Verified, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Jira Issue OCPBUGS-62082 targets the "4.21.0" version, which is one of the valid target versions: 4.21.0
  • bug has dependents

In response to this:

This PR updates the nodeSelector of the cronjob to use the legacy control plane label (node-role.kubernetes.io/master) and also improves the logging of some of the oc get queries.



djoshy commented Sep 22, 2025

/retest

1 similar comment

djoshy commented Sep 22, 2025

/retest


djoshy commented Sep 22, 2025

I've tested this against a 4.19.5 cluster and it is behaving as expected. I first manually applied a v1alpha1 CRD to verify our solution:

$ oc get crd machineconfignodes.machineconfiguration.openshift.io -o yaml | grep v1alpha1
<snip>
    name: v1alpha1
  - v1alpha1

Then, it was upgraded to a 4.20 build with this PR. When the CVO applied the MCO manifests, I waited for the cronjob to suspend itself and then examined the cronjob's pod logs:

$ oc logs -f machine-config-nodes-crd-cleanup-29309540-cjr8r -n openshift-machine-config-operator
Checking for MachineConfigNodes CRD with v1alpha1 version...
Found CRD machineconfignodes.machineconfiguration.openshift.io with v1alpha1 version, deleting it...
Successfully deleted CRD machineconfignodes.machineconfiguration.openshift.io
CRD cleanup completed successfully
Suspending cronjob...
Warning: spec.jobTemplate.spec.template.spec.nodeSelector[node-role.kubernetes.io/master]: use "node-role.kubernetes.io/control-plane" instead
cronjob.batch/machine-config-nodes-crd-cleanup patched
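The log lines above trace a simple check-then-act flow: test for the v1alpha1 version, delete the CRD if present, then suspend the cronjob. A hypothetical sketch of just the version check, factored into a plain shell function — the real script reads the served versions from the cluster (e.g. via an `oc get crd ... -o jsonpath` query); the function name and its space-separated-list interface here are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical sketch of the v1alpha1 check the cleanup job performs.
# $1: space-separated list of served CRD versions, e.g. "v1 v1alpha1".
has_v1alpha1() {
  case " $1 " in
    *" v1alpha1 "*) echo "yes" ;;
    *)              echo "no"  ;;
  esac
}

# In the cronjob, this decision would drive the delete-and-suspend steps:
#   if [ "$(has_v1alpha1 "$served")" = "yes" ]; then
#     oc delete crd machineconfignodes.machineconfiguration.openshift.io
#   fi
```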

I then verified that the v1 CRD was created. This took a couple of minutes to appear as the CVO has to run through its reconciliation graph again:

$ oc get crd machineconfignodes.machineconfiguration.openshift.io -o yaml | grep v1
<snip>
    name: v1
  - v1

After that, I deleted the cronjob manually to see how it would behave if the v1 CRD already existed:

$ oc delete cronjob machine-config-nodes-crd-cleanup -n openshift-machine-config-operator
cronjob.batch "machine-config-nodes-crd-cleanup" deleted

Observe the cronjob pod logs after the CVO reconciles it again (and it completes):

$ oc logs -f machine-config-nodes-crd-cleanup-29309545-gw2ws -n openshift-machine-config-operator
Checking for MachineConfigNodes CRD with v1alpha1 version...
CRD machineconfignodes.machineconfiguration.openshift.io does not have v1alpha1 version, nothing to clean up
Suspending cronjob...
Warning: spec.jobTemplate.spec.template.spec.nodeSelector[node-role.kubernetes.io/master]: use "node-role.kubernetes.io/control-plane" instead
cronjob.batch/machine-config-nodes-crd-cleanup patched

This also matches with our current expectations.

/verified by @djoshy

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Sep 22, 2025
@openshift-ci-robot

@djoshy: This PR has been marked as verified by @djoshy.

In response to this:

(verbatim quote of the verification comment above)


@yuqi-zhang yuqi-zhang left a comment


Discussed this with David and we're good with this intermediate fix while we work out what is tripping up some upgrades.

/label backport-risk-assessed

Holding off on lgtm while we check the node labels

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Sep 22, 2025

openshift-ci bot commented Sep 23, 2025

@djoshy: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/bootstrap-unit | 1b77993 | link | false | /test bootstrap-unit |
| ci/prow/e2e-gcp-mco-disruptive | 1b77993 | link | false | /test e2e-gcp-mco-disruptive |
| ci/prow/e2e-gcp-op-ocl | 1b77993 | link | false | /test e2e-gcp-op-ocl |
| ci/prow/e2e-aws-mco-disruptive | 1b77993 | link | false | /test e2e-aws-mco-disruptive |
| ci/prow/e2e-agent-compact-ipv4 | 1b77993 | link | false | /test e2e-agent-compact-ipv4 |
| ci/prow/e2e-aws-ovn-upgrade-out-of-change | 1b77993 | link | false | /test e2e-aws-ovn-upgrade-out-of-change |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


djoshy commented Sep 23, 2025

Holding off on lgtm while we check the node labels

Per a conversation with Scott on Slack, it seems best to use node-role.kubernetes.io/master in the nodeSelector. The node-role.kubernetes.io/control-plane label only began being applied recently, in 4.19 and 4.20, so we can't rely on it for every cluster just yet.
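This can be made concrete by counting what each selector would match. On a live cluster that check is `oc get nodes -l <label> --no-headers | wc -l`; the helper below is a hypothetical stand-in that takes one node's label keys per stdin line, so the idea can be shown without cluster access:

```shell
#!/bin/sh
# Hypothetical helper, standing in for:
#   oc get nodes -l <label> --no-headers | wc -l
# stdin: one line per node, listing that node's label keys; $1: label key.
count_nodes_with_label() {
  # -F: fixed string, -w: whole word, -c: count matching lines
  grep -Fcw -- "$1" || true
}
```

On a cluster that predates the new label, counting by node-role.kubernetes.io/control-plane would return 0 while the master label still matches every control plane node, which is why the legacy label is the safe choice for the selector.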


@isabella-janssen isabella-janssen left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 23, 2025

openshift-ci bot commented Sep 23, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: djoshy, isabella-janssen, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [djoshy,isabella-janssen,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 1e4bcd6 into openshift:release-4.20 Sep 23, 2025
15 of 21 checks passed
@openshift-ci-robot

@djoshy: Jira Issue Verification Checks: Jira Issue OCPBUGS-62073
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-62073 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

In response to this:

This PR updates the nodeSelector of the cronjob to use the legacy control plane label (node-role.kubernetes.io/master) and also improves the logging of some of the oc get queries.



djoshy commented Sep 23, 2025

/cherry-pick release-4.19

@openshift-cherrypick-robot

@djoshy: new pull request created: #5299

In response to this:

/cherry-pick release-4.19


@djoshy djoshy changed the title OCPBUGS-62073: Improve MCN CRD clean-up script [release-4.20] OCPBUGS-62073: Improve MCN CRD clean-up script Sep 23, 2025
@djoshy djoshy deleted the crd-delete-fix branch October 1, 2025 16:45