Member

@QiWang19 QiWang19 commented Nov 13, 2025

- What I did

- How to verify it

- Description for the changelog

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 13, 2025
@openshift-ci
Contributor

openshift-ci bot commented Nov 13, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: QiWang19
Once this PR has been reviewed and has the lgtm label, please assign dkhater-redhat for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 13, 2025
@QiWang19 QiWang19 changed the title block upgrades for conflict non-default ClusterImagePolicy resources OCPBUGS-64822: block upgrades for conflict non-default ClusterImagePolicy resources Nov 13, 2025
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 13, 2025
@openshift-ci-robot
Contributor

@QiWang19: This pull request references Jira Issue OCPBUGS-64822, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, POST, but it is ON_QA instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

- What I did

- How to verify it

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@QiWang19
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Nov 13, 2025
@openshift-ci-robot
Contributor

@QiWang19: This pull request references Jira Issue OCPBUGS-64822, which is valid.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.z) matches configured target version for branch (4.20.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note text is set and does not match the template
  • dependent bug Jira Issue OCPBUGS-64823 is in the state Closed (Done), which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA))
  • dependent Jira Issue OCPBUGS-64823 targets the "4.21.0" version, which is one of the valid target versions: 4.21.0
  • bug has dependents

Requesting review from QA contact:
/cc @asahay19

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested a review from asahay19 November 13, 2025 15:42
@QiWang19
Member Author

/test e2e-aws-ovn-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Nov 13, 2025

@QiWang19: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test e2e-aws-ovn
/test e2e-aws-ovn-upgrade
/test e2e-gcp-op-1of2
/test e2e-gcp-op-2of2
/test e2e-gcp-op-single-node
/test e2e-hypershift
/test images
/test okd-scos-images
/test periodics-images
/test unit
/test verify
/test verify-deps

The following commands are available to trigger optional jobs:

/test bootstrap-unit
/test e2e-agent-compact-ipv4
/test e2e-aws-disruptive
/test e2e-aws-mco-disruptive
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-fips-op
/test e2e-aws-ovn-ocb-techpreview
/test e2e-aws-ovn-serial-ipsec
/test e2e-aws-ovn-upgrade-ipsec
/test e2e-aws-ovn-upgrade-ocb-techpreview
/test e2e-aws-ovn-upgrade-out-of-change
/test e2e-aws-ovn-windows
/test e2e-aws-ovn-workers-rhel8
/test e2e-aws-proxy
/test e2e-aws-serial
/test e2e-aws-single-node
/test e2e-aws-upgrade-single-node
/test e2e-aws-workers-rhel8
/test e2e-azure
/test e2e-azure-ovn-upgrade
/test e2e-azure-ovn-upgrade-out-of-change
/test e2e-azure-upgrade
/test e2e-gcp-mco-disruptive
/test e2e-gcp-op
/test e2e-gcp-op-ocl
/test e2e-gcp-op-techpreview
/test e2e-gcp-ovn-rt-upgrade
/test e2e-gcp-rt
/test e2e-gcp-rt-op
/test e2e-gcp-single-node
/test e2e-gcp-upgrade
/test e2e-hypershift-techpreview
/test e2e-metal-assisted
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-ipv6
/test e2e-metal-ovn-two-node-arbiter
/test e2e-metal-ovn-two-node-fencing
/test e2e-openstack
/test e2e-openstack-dualstack
/test e2e-openstack-externallb
/test e2e-openstack-hypershift
/test e2e-openstack-parallel
/test e2e-openstack-singlestackv6
/test e2e-ovirt
/test e2e-ovirt-upgrade
/test e2e-ovn-step-registry
/test e2e-vsphere
/test e2e-vsphere-ovn-upi
/test e2e-vsphere-ovn-upi-zones
/test e2e-vsphere-ovn-zones
/test e2e-vsphere-upgrade
/test okd-scos-e2e-aws-ovn
/test security

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-machine-config-operator-release-4.20-bootstrap-unit
pull-ci-openshift-machine-config-operator-release-4.20-e2e-aws-ovn
pull-ci-openshift-machine-config-operator-release-4.20-e2e-aws-ovn-upgrade
pull-ci-openshift-machine-config-operator-release-4.20-e2e-gcp-op-1of2
pull-ci-openshift-machine-config-operator-release-4.20-e2e-gcp-op-2of2
pull-ci-openshift-machine-config-operator-release-4.20-e2e-gcp-op-single-node
pull-ci-openshift-machine-config-operator-release-4.20-e2e-hypershift
pull-ci-openshift-machine-config-operator-release-4.20-images
pull-ci-openshift-machine-config-operator-release-4.20-okd-scos-e2e-aws-ovn
pull-ci-openshift-machine-config-operator-release-4.20-okd-scos-images
pull-ci-openshift-machine-config-operator-release-4.20-periodics-images
pull-ci-openshift-machine-config-operator-release-4.20-security
pull-ci-openshift-machine-config-operator-release-4.20-unit
pull-ci-openshift-machine-config-operator-release-4.20-verify
pull-ci-openshift-machine-config-operator-release-4.20-verify-deps

In response to this:

/test e2e-aws-ovn-techpreview

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@QiWang19
Member Author

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Nov 13, 2025

@QiWang19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cc9c00e0-c0a8-11f0-8b11-4184cda435c8-0

@QiWang19 QiWang19 force-pushed the cip-guard-upgrade branch 2 times, most recently from 0ae6650 to 0ca20b7 Compare November 13, 2025 17:46
@QiWang19
Member Author

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Nov 13, 2025

@QiWang19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f6a9e450-c0b8-11f0-83d8-cdc5a82ee9c5-0


// Check for ClusterImagePolicy named "openshift" which conflicts with the cluster default ClusterImagePolicy object
// Only check for Default featureSet clusters allowing 4.20 ci techpreview builds upgrades
fg, err := optr.configClient.ConfigV1().FeatureGates().Get(context.TODO(), "cluster", metav1.GetOptions{})
Member

I think making the guard conditional on FeatureSet == Default is the semantics we want 👍 I'm not sure about the implementation, though. Will the direct call here result in API traffic, or is optr.configClient backed by an informer that can serve the information from an MCO-local cache? Poking around, the MCO initialization has some FeatureGate handling that is aware of feature gates, but it may not currently be able to handle feature sets, and feature sets are what the CVO uses to decide whether to push the openshift ClusterImagePolicy in 4.20. If this call is not going to create a constant flow of Kube API calls, then I think this pull is ready to go. If it is, I think we want to get that adjusted somehow.
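
To make the concern above concrete, here is a minimal sketch of the difference between a per-sync direct lookup and a locally cached one. It uses only the standard library. FeatureSetGetter, countingClient, cachedFeatureSet, and syncN are hypothetical names invented for illustration, the "Default" string is just a label, and none of this is the MCO's actual client or informer code.

```go
package main

import (
	"fmt"
	"sync"
)

// FeatureSetGetter is a hypothetical stand-in for whatever reads the
// FeatureGate "cluster" object; it is not the MCO's real interface.
type FeatureSetGetter interface {
	GetFeatureSet() (string, error)
}

// countingClient simulates a direct API client and counts round trips.
type countingClient struct {
	featureSet string
	calls      int
}

func (c *countingClient) GetFeatureSet() (string, error) {
	c.calls++ // every read is one simulated GET against the API server
	return c.featureSet, nil
}

// cachedFeatureSet keeps the last observed value locally, roughly the way
// an informer-backed lister would, so reads cost no API traffic.
type cachedFeatureSet struct {
	mu    sync.RWMutex
	value string
}

// OnUpdate would be driven by watch events in a real informer.
func (c *cachedFeatureSet) OnUpdate(featureSet string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value = featureSet
}

func (c *cachedFeatureSet) GetFeatureSet() (string, error) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.value, nil
}

// syncN simulates n operator sync loops that each look up the feature set.
func syncN(g FeatureSetGetter, n int) error {
	for i := 0; i < n; i++ {
		if _, err := g.GetFeatureSet(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Direct lookups: one simulated GET per sync.
	direct := &countingClient{featureSet: "Default"} // illustrative label
	_ = syncN(direct, 100)
	fmt.Println("direct GETs:", direct.calls)

	// Cached lookups: seed once, then serve every sync from memory.
	seed := &countingClient{featureSet: "Default"}
	fs, _ := seed.GetFeatureSet()
	cache := &cachedFeatureSet{}
	cache.OnUpdate(fs)
	_ = syncN(cache, 100)
	fmt.Println("GETs behind the cache:", seed.calls)
}
```

With the direct getter the simulated GET count grows with the number of syncs, while the cached getter only costs the initial seed; in a real operator the cache would be kept current by a watch rather than seeded by hand.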

Member

e2e-aws-ovn-upgrade -> Artifacts -> gather-audit-logs artifacts:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_machine-config-operator/5414/pull-ci-openshift-machine-config-operator-release-4.20-e2e-aws-ovn-upgrade/1989027473844604928/artifacts/e2e-aws-ovn-upgrade/gather-audit-logs/artifacts/audit-logs.tar | tar -xz --strip-components=2
$ zgrep -h '"resource":"featuregates"' kube-apiserver/*audit*.log.gz | jq -r '.verb + " " + .user.username' | sort | uniq -c | sort -n | tail -n3
     61 get system:serviceaccount:openshift-cluster-version:default
    176 watch system:serviceaccount:openshift-machine-config-operator:machine-config-daemon
   1255 get system:serviceaccount:openshift-machine-config-operator:machine-config-operator
$ zgrep -h '"system:serviceaccount:openshift-machine-config-operator:machine-config-operator".*"resource":"featuregates"' kube-apiserver/*audit*.log.gz | jq -r '.stageTimestamp' | sort
2025-11-13T18:46:42.535543Z
2025-11-13T18:54:26.519068Z
2025-11-13T18:54:26.520178Z
2025-11-13T18:58:22.671948Z
2025-11-13T18:58:22.681336Z
2025-11-13T19:02:18.092984Z
2025-11-13T19:02:18.146188Z
2025-11-13T19:09:40.147016Z
2025-11-13T19:09:40.148910Z
2025-11-13T19:18:30.149940Z
2025-11-13T19:18:30.151214Z
2025-11-13T19:19:33.329084Z
2025-11-13T19:19:33.333734Z
2025-11-13T19:27:30.018375Z
2025-11-13T19:27:30.149016Z
2025-11-13T19:30:06.327624Z <- picks up the pace here
2025-11-13T19:30:18.627364Z
2025-11-13T19:30:18.662089Z
2025-11-13T19:31:50.332703Z
2025-11-13T19:31:51.331799Z
...multiple per minute...
2025-11-13T20:08:27.733570Z

Not looking good. Checking when in that update job the MCO was bumped:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_machine-config-operator/5414/pull-ci-openshift-machine-config-operator-release-4.20-e2e-aws-ovn-upgrade/1989027473844604928/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/inspect/namespaces/openshift-machine-config-operator/apps/replicasets.yaml | yaml2json | jq -r '.items[].metadata | .creationTimestamp + " " + .name' | sort
2025-11-13T18:33:16Z machine-config-operator-6b87db5fc
2025-11-13T18:39:13Z machine-config-controller-7f74b85989
2025-11-13T19:30:07Z machine-config-operator-569bd6f777
2025-11-13T19:31:14Z machine-config-controller-6494698c48

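So yeah, the GET rate picks up right around when the new machine-config-operator ReplicaSet was created (19:30:07Z). It looks like this code is flooding the Kube API server with GETs, and we need to figure out how to wire the feature-set lookup up to our existing cached informer.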
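
For anyone who prefers Go to the zgrep/jq pipeline used above, the per-user tally can be sketched roughly as follows. This is only an illustration: auditEvent and tally are hypothetical names, the struct keeps just the verb, user.username, and objectRef.resource fields that the filters above rely on, and the sample lines in main are made up rather than taken from the CI artifacts.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// auditEvent holds only the audit fields that the filters above read.
type auditEvent struct {
	Verb string `json:"verb"`
	User struct {
		Username string `json:"username"`
	} `json:"user"`
	ObjectRef struct {
		Resource string `json:"resource"`
	} `json:"objectRef"`
}

// tally counts "verb username" pairs for events that touch the given
// resource. Lines that are not valid JSON are skipped. Note that
// bufio.Scanner has a default line-size limit, which a real tool would
// need to raise for long audit lines.
func tally(lines string, resource string) map[string]int {
	counts := map[string]int{}
	sc := bufio.NewScanner(strings.NewReader(lines))
	for sc.Scan() {
		var ev auditEvent
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue
		}
		if ev.ObjectRef.Resource != resource {
			continue
		}
		counts[ev.Verb+" "+ev.User.Username]++
	}
	return counts
}

func main() {
	// Made-up sample events; not taken from the CI artifacts.
	sample := `{"verb":"get","user":{"username":"operator-sa"},"objectRef":{"resource":"featuregates"}}
{"verb":"get","user":{"username":"operator-sa"},"objectRef":{"resource":"featuregates"}}
{"verb":"watch","user":{"username":"daemon-sa"},"objectRef":{"resource":"featuregates"}}
{"verb":"get","user":{"username":"operator-sa"},"objectRef":{"resource":"nodes"}}`

	counts := tally(sample, "featuregates")
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%7d %s\n", counts[k], k)
	}
}
```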

Member Author

@QiWang19 QiWang19 Nov 17, 2025

SigstoreImageVerificationPKI was introduced as a TechPreview feature gate in 4.20. If the cluster is on the Default feature set in 4.20, this gate will not be enabled. A workaround is to use this feature gate to indicate the cluster's feature set; feature-set changes are not planned to be backported. I think it can work (I added commit e285e21 for testing), but the PKI feature gate was never designed to signal the feature set.
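
A rough sketch of the workaround described above, assuming the set of enabled gates is already available locally as a map and the ClusterImagePolicy names as a slice. looksLikeDefaultFeatureSet and shouldBlockUpgrade are hypothetical helpers written for illustration and are not the code in commit e285e21; the proxy only holds under the 4.20 assumption stated in the comment.

```go
package main

import "fmt"

// pkiGate is the gate name given in the comment above.
const pkiGate = "SigstoreImageVerificationPKI"

// reservedPolicyName is the ClusterImagePolicy name that conflicts with
// the cluster default object, per the code comment under review.
const reservedPolicyName = "openshift"

// looksLikeDefaultFeatureSet treats "PKI gate disabled" as a proxy for
// the Default feature set. This relies on the 4.20-specific assumption
// that the gate is TechPreview-only; it is not a general rule.
func looksLikeDefaultFeatureSet(enabledGates map[string]bool) bool {
	return !enabledGates[pkiGate]
}

// shouldBlockUpgrade reports whether the guard would fire: only on
// clusters that look like Default, and only when a policy with the
// reserved name exists.
func shouldBlockUpgrade(enabledGates map[string]bool, policyNames []string) bool {
	if !looksLikeDefaultFeatureSet(enabledGates) {
		return false
	}
	for _, name := range policyNames {
		if name == reservedPolicyName {
			return true
		}
	}
	return false
}

func main() {
	defaultCluster := map[string]bool{}
	techPreview := map[string]bool{pkiGate: true}
	policies := []string{"my-policy", "openshift"}

	fmt.Println("default cluster blocked:", shouldBlockUpgrade(defaultCluster, policies))
	fmt.Println("techpreview cluster blocked:", shouldBlockUpgrade(techPreview, policies))
}
```

The trade-off is the one already noted: the check avoids reading the feature set directly, but it couples the guard to a gate that was never meant to carry that meaning.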

@QiWang19
Member Author

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Nov 17, 2025

@QiWang19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c2505fa0-c3f0-11f0-9f37-7713c0102c8d-0

@wking
Member

wking commented Nov 17, 2025

Build cluster hiccup:

Trying to pull image-registry.openshift-image-registry.svc:5000/ci-op-1iidzfil/pipeline@sha256:d4f2629dd1eb1700a53020a4b65ece824604f2b5b43e19f65048dc4911f96540...
error: error creating buildah builder: initializing source docker://image-registry.openshift-image-registry.svc:5000/ci-op-1iidzfil/pipeline@sha256:d4f2629dd1eb1700a53020a4b65ece824604f2b5b43e19f65048dc4911f96540: pinging container registry image-registry.openshift-image-registry.svc:5000: Get "https://image-registry.openshift-image-registry.svc:5000/v2/": dial tcp: lookup image-registry.openshift-image-registry.svc on 172.30.0.10:53: no such host

/retest-required

@openshift-ci
Contributor

openshift-ci bot commented Nov 18, 2025

@QiWang19: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                       Commit   Details  Required  Rerun command
ci/prow/okd-scos-e2e-aws-ovn    6203102  link     false     /test okd-scos-e2e-aws-ovn
ci/prow/e2e-aws-ovn-upgrade     e285e21  link     true      /test e2e-aws-ovn-upgrade
ci/prow/bootstrap-unit          e285e21  link     false     /test bootstrap-unit
ci/prow/e2e-gcp-op-2of2         e285e21  link     true      /test e2e-gcp-op-2of2
ci/prow/e2e-gcp-op-single-node  e285e21  link     true      /test e2e-gcp-op-single-node
ci/prow/e2e-aws-ovn             e285e21  link     true      /test e2e-aws-ovn
ci/prow/e2e-gcp-op-1of2         e285e21  link     true      /test e2e-gcp-op-1of2

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@QiWang19
Member Author

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Nov 18, 2025

@QiWang19: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2c2720c0-c48d-11f0-8ace-4338ea8bfe20-0

Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

4 participants