OCPBUGS-62634, OCPBUGS-62624: Fix DeploymentController Progressing #2034
base: master
Conversation
The Progressing condition should be set only when the Deployment has new content and is actively rolling out its update. After all replicas are updated, the Progressing condition should be false, even when some pods are missing, e.g. because a node was drained, something evicted them, and so on. Use the Deployment condition Progressing with reason NewReplicaSetAvailable to detect that the Deployment has been fully rolled out in the past.
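A minimal Go sketch of the detection described above, assuming the standard apps/v1 API; the helper name isDeploymentFullyRolledOut is an illustration for this comment, not necessarily the PR's actual code:

```go
package deploymentcontroller

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// isDeploymentFullyRolledOut reports whether the Deployment has completed
// a rollout at some point: Kubernetes keeps Progressing=True with reason
// NewReplicaSetAvailable on a fully rolled-out Deployment. Replicas that
// go missing afterwards (node drain, eviction) do not clear this reason,
// so the operator can avoid reporting Progressing=true for them.
func isDeploymentFullyRolledOut(deployment *appsv1.Deployment) bool {
	for _, cond := range deployment.Status.Conditions {
		if cond.Type == appsv1.DeploymentProgressing {
			return cond.Status == corev1.ConditionTrue &&
				cond.Reason == "NewReplicaSetAvailable"
		}
	}
	return false
}
```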
@jsafrane: This pull request references Jira Issue OCPBUGS-62634, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker. This pull request references Jira Issue OCPBUGS-62624, which is invalid. The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@jsafrane: This pull request references Jira Issue OCPBUGS-62634, which is valid. 3 validations were run on this bug. No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. This pull request references Jira Issue OCPBUGS-62624, which is valid. The bug has been moved to the POST state. 3 validations were run on this bug. No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
The change looks reasonable to me.
The only concern is that it may suppress Progressing for other reasons we want to keep.
If we can verify before merging that the pull request solves the issue in the bugs, I think it is worth trying.
Also left a couple of nits and questions in the pull.
}
// Deployments that are fully deployed get the Progressing condition with reason NewReplicaSetAvailable.
// https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#complete-deployment
// Any subsequent missing replicas (e.g. caused by a node reboot) must not change the Progressing condition.
From the Go doc on "NewReplicaSetAvailable", I cannot see "a node reboot" as a cause.
But the Kubernetes docs above imply that "NewReplicaSetAvailable" indicates the Deployment rolled out successfully, even though new pods might still be on the way. That sounds good enough to me to not report Progressing=True for a CO.
My assumption is that any reconfiguration must lead to a re-deployment, i.e. the Deployment spec.template changes, the generation increases, and so on. If there is an OCP reconfiguration that should lead to Progressing=true while the Deployment pods stay as they are, another controller must catch it, not DeploymentController. It's very common to run many controllers in a library-go style operator.
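A minimal sketch of the generation bookkeeping this assumption rests on, using the standard Deployment status fields; the helper name hasUnobservedUpdate is made up for illustration:

```go
package deploymentcontroller

import (
	appsv1 "k8s.io/api/apps/v1"
)

// hasUnobservedUpdate reports whether a reconfiguration has produced a new
// spec.template that the Deployment controller has not yet observed: any
// change to spec.template bumps metadata.generation, and
// status.observedGeneration catches up only once the controller sees the
// new spec. While the two differ, a new rollout is (or will be) underway.
func hasUnobservedUpdate(deployment *appsv1.Deployment) bool {
	return deployment.Generation > deployment.Status.ObservedGeneration
}
```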
Co-authored-by: Hongkai Liu <[email protected]>
It does not need anything from DeploymentController
Force-pushed from 15d18d7 to b1970ef
/label tide/merge-method-squash
@jsafrane: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
The testing results look good, and here as well. But this one is bad.
Yes. cluster-storage-operator runs Deployments with the other CSI driver operators, which then actually install a CSI driver (= create Deployments + DaemonSets). We will need to bump the library-go changes in ~10 other repos :-. I'll comment on openshift/cluster-storage-operator#634
/lgtm
/hold
Feel free to cancel when ready.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: hongkailiu, jsafrane. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.