-
Notifications
You must be signed in to change notification settings - Fork 380
Move ControllerModifyVolume to GA #588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Move ControllerModifyVolume to GA #588
Conversation
a6080db
to
4f9da2b
Compare
Thought the build failure is related to protocolbuffers/protobuf#16163, but manually updated it, even make succeeded locally the presubmit did not pass. |
/retest |
4f9da2b
to
66d1b6b
Compare
66d1b6b
to
b64b6a6
Compare
+1 |
lgtm |
@@ -1021,8 +1020,6 @@ message ControllerGetVolumeResponse { | |||
VolumeStatus status = 2; | |||
} | |||
message ControllerModifyVolumeRequest { | |||
option (alpha_message) = true; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we promote this to GA, can we change mutable_parameters
to OPTIONAL? And add words like:
When not specified, SPs MAY cancel previously requested in-progress modifications.
SPs SHOULD return success only when the volume is stable and no modifications on going.
This extends the use-case and can be a breaking change for SP. But the semantic should be clear and straightforward.
This can be useful for Kubernetes when reverting volumeAttributeClassName to empty, to ensure we've reached a stable state.
And if we add topology support in the future, passing empty mutable_parameters
can be useful for CO to fetch the current topology of the volume without actually modify it, also useful in the reverting volumeAttributeClassName to empty case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to do that? We made it explicitly as something we need for Modify. And if you need a cancelling operation, we should have a CancelOperation API instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if you need a cancelling operation, we should have a CancelOperation API instead?
The key point is not canceling, but waiting for a stable state, to avoid left a mess behind. For use-case like kubernetes/kubernetes#132106 . In the PR description:
rollback from an infeasible pvc.spec.VacName to no VAC
However, there is no concept like infeasible error in the CSI spec. I think the only way we can sure that the volume returns to a stable state is retry the RPC until it returns success.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if we want to support topology in the future (kubernetes/enhancements#5381), the current proposal is allowing setting new topology before modify success. So what if the topology has change and we want to rollback? We will need a way to fetch the current topology of the volume without actually modify it.
Of course we can validate the parameters before change topology. But there are still possibilities that the operation may fail half-way.
- We may change multiple aspect of the volume in the same RPC call, e.g. disk tags, disk type, performance level. Some of them may success, but others may still fail.
- The validation rules embedded in the CSI can be outdated.
- Modification can be slow. There maybe other operator modifying the same volume concurrently, invalidating the previously valid request.
Or should we just say: SPs should only return OK or Abort if any part of the modification succeeded
? But I'm afraid that SP may also lost track of previously failed modification tasks. SP may also not able to distinct the partial or complete failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, there is no concept like infeasible error in the CSI spec. I think the only way we can sure that the volume returns to a stable state is retry the RPC until it returns success.
For most errors we do not allow going back to older state. We have pretty well defined semantic of what we consider infeasible in k8s+csi - https://github.com/kubernetes-csi/external-resizer/blob/master/pkg/util/util.go#L274 .
We do not have an API to cancel any of in-progress CSI operations and that has worked well enough.
What type of PR is this?
Feature
What this PR does / why we need it:
This PR moves ControllerModifyVolume to GA.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce an API-breaking change?: