@jsafrane
Contributor

The Progressing condition should be set only when the Deployment has new content and is actively rolling out its update. After all replicas are updated, the Progressing condition should be false, even when some pods are missing, e.g. because a node was drained or something evicted them.

Use the Deployment condition Progressing with reason NewReplicaSetAvailable to detect that the Deployment has been fully rolled out in the past.
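
For illustration, the check described above could look roughly like the following Go sketch. It is not the code from this pull request: the types are local stand-ins for appsv1.DeploymentStatus and appsv1.DeploymentCondition from k8s.io/api/apps/v1, and hasFinishedRollout is a hypothetical helper name. It assumes, per the Kubernetes documentation on complete Deployments, that a finished rollout leaves the Progressing condition at status True with reason NewReplicaSetAvailable.

```go
package main

import "fmt"

// DeploymentCondition is a local stand-in for appsv1.DeploymentCondition,
// reduced to the fields this sketch needs (assumption, not the real type).
type DeploymentCondition struct {
	Type   string
	Status string
	Reason string
}

// DeploymentStatus is a local stand-in for appsv1.DeploymentStatus.
type DeploymentStatus struct {
	Replicas          int32
	AvailableReplicas int32
	Conditions        []DeploymentCondition
}

// hasFinishedRollout is a hypothetical helper. It reports whether the
// Deployment's Progressing condition says the latest rollout completed.
// It deliberately ignores replica counts, so pods that go missing later
// (node drain, eviction) do not change the result.
func hasFinishedRollout(status DeploymentStatus) bool {
	for _, c := range status.Conditions {
		if c.Type == "Progressing" {
			return c.Status == "True" && c.Reason == "NewReplicaSetAvailable"
		}
	}
	// No Progressing condition yet: the rollout has not been reported complete.
	return false
}

func main() {
	// All replicas were updated earlier, but one pod is currently missing,
	// for example after a node drain.
	status := DeploymentStatus{
		Replicas:          3,
		AvailableReplicas: 2,
		Conditions: []DeploymentCondition{
			{Type: "Progressing", Status: "True", Reason: "NewReplicaSetAvailable"},
		},
	}
	fmt.Println("rollout finished:", hasFinishedRollout(status))
}
```

Under these assumptions an operator would report Progressing=False whenever hasFinishedRollout returns true, regardless of how many pods are currently available.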

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 10, 2025
@openshift-ci-robot

@jsafrane: This pull request references Jira Issue OCPBUGS-62634, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

This pull request references Jira Issue OCPBUGS-62624, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2025
@jsafrane
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 10, 2025
@openshift-ci-robot

@jsafrane: This pull request references Jira Issue OCPBUGS-62634, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

This pull request references Jira Issue OCPBUGS-62624, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

Member

@hongkailiu hongkailiu left a comment

The change looks reasonable to me.
My only concern is that it may suppress Progressing for other reasons that we want to keep.
If we can verify before merging that the pull solves the issue in the bugs, I think it is worth trying.
I also left a couple of nits and questions in the pull.

}
// Deployments that are fully deployed get a Progressing condition with Reason NewReplicaSetAvailable.
// https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#complete-deployment
// Any subsequent missing replicas (e.g. caused by a node reboot) must not change the Progressing condition.
Member

From the Go doc on "NewReplicaSetAvailable", I cannot see "a node reboot" as a cause.

https://github.com/kubernetes/kubernetes/blob/ee1ff4866e30ac3685da3e007979b0e9ab7651a6/pkg/controller/deployment/util/deployment_util.go#L76-L77

But the Kubernetes docs linked above imply that "NewReplicaSetAvailable" indicates that the Deployment rolled out successfully, even though new pods might still be on the way. That sounds good enough to me to not report Progressing=True for a CO.

@jsafrane
Contributor Author

My only concern is that it may suppress Progressing for other reasons that we want to keep.

My assumption is that any reconfiguration must lead to a re-deployment, i.e. the Deployment's spec.template changes, its generation increases, and so on.

If there is an OCP reconfiguration that should lead to Progressing=true while the Deployment pods stay as they are, another controller must catch it, not DeploymentController. It is very common to run many controllers in a library-go style operator.
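
As a rough illustration of the point about several controllers, the Go sketch below treats the operator as progressing if any controller reports a True condition whose type ends in "Progressing". This is a simplified assumption about how per-controller conditions are combined, not library-go's actual aggregation logic; the Condition type, the operatorProgressing helper, and the controller names in main are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Condition is a simplified, hypothetical stand-in for an operator
// condition written by one controller, e.g. "DeploymentProgressing".
type Condition struct {
	Type    string
	Status  string
	Message string
}

// operatorProgressing is a hypothetical helper. It returns true when at
// least one controller reports a *Progressing condition with status True,
// together with the types of the conditions that caused it.
func operatorProgressing(conds []Condition) (bool, []string) {
	var sources []string
	for _, c := range conds {
		if strings.HasSuffix(c.Type, "Progressing") && c.Status == "True" {
			sources = append(sources, c.Type)
		}
	}
	return len(sources) > 0, sources
}

func main() {
	conds := []Condition{
		// The Deployment was fully rolled out, so its controller stays quiet
		// even if pods are temporarily missing.
		{Type: "DeploymentProgressing", Status: "False"},
		// A different (made-up) controller notices a reconfiguration that
		// does not touch the Deployment and reports it on its own.
		{Type: "ExampleConfigProgressing", Status: "True", Message: "applying new configuration"},
	}
	progressing, sources := operatorProgressing(conds)
	fmt.Println("operator progressing:", progressing, "because of", sources)
}
```

With this split, the Deployment controller only needs to answer for rollouts, and any other reason for Progressing=True is owned by the controller that detects it.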

jsafrane and others added 2 commits October 13, 2025 15:23
It does not need anything from DeploymentController
@jsafrane jsafrane force-pushed the progressing-deployment branch from 15d18d7 to b1970ef Compare October 13, 2025 13:26
@jsafrane
Contributor Author

/label tide/merge-method-squash
does it work here?

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 13, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 13, 2025

@jsafrane: all tests passed!

@hongkailiu
Member

The testing results look good.

And here also good.

But this one is bad.
The fix from this pull does not solve the issue on co/storage. Is it expected?

Oct 14 19:02:20.982 W clusteroperator/storage condition/Progressing reason/GCPPDCSIDriverOperatorCR_GCPPDDriverControllerServiceController_Deploying status/True GCPPDCSIDriverOperatorCRProgressing: GCPPDDriverControllerServiceControllerProgressing: Waiting for Deployment to deploy pods (exception: https://issues.redhat.com/browse/OCPBUGS-62634)

@jsafrane
Contributor Author

The fix from this pull does not solve the issue on co/storage. Is it expected?

Yes. cluster-storage-operator runs Deployments with other CSI driver operators, which then actually install a CSI driver (= create Deployments + DaemonSets). We will need to bump library-go in ~10 other repos to pick up this change.

I'll comment on openshift/cluster-storage-operator#634

@hongkailiu
Member

/lgtm

/hold

Free to cancel when ready.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 28, 2025
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 28, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongkailiu, jsafrane
