Skip to content

Move ControllerModifyVolume to GA #588

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sunnylovestiramisu
Copy link
Contributor

What type of PR is this?
Feature

What this PR does / why we need it:

This PR moves ControllerModifyVolume to GA.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce an API-breaking change?:

Move ControllerModifyVolume to GA

@sunnylovestiramisu sunnylovestiramisu force-pushed the kep-3751 branch 5 times, most recently from a6080db to 4f9da2b Compare June 12, 2025 15:59
@sunnylovestiramisu
Copy link
Contributor Author

Thought the build failure is related to protocolbuffers/protobuf#16163, but manually updated it, even make succeeded locally the presubmit did not pass.

@sunnylovestiramisu
Copy link
Contributor Author

/retest

@sunnylovestiramisu sunnylovestiramisu force-pushed the kep-3751 branch 13 times, most recently from 4f9da2b to 66d1b6b Compare July 16, 2025 16:49
@mowangdk
Copy link

+1

@gnufied
Copy link
Contributor

gnufied commented Jul 17, 2025

lgtm

@@ -1021,8 +1020,6 @@ message ControllerGetVolumeResponse {
VolumeStatus status = 2;
}
message ControllerModifyVolumeRequest {
option (alpha_message) = true;

Copy link
Contributor

@huww98 huww98 Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we promote this to GA, can we change mutable_parameters to OPTIONAL? And add words like:

When not specified, SPs MAY cancel previously requested in-progress modifications.
SPs SHOULD return success only when the volume is stable and no modifications on going. 

This extends the use-case and can be a breaking change for SP. But the semantic should be clear and straightforward.

This can be useful for Kubernetes when reverting volumeAttributeClassName to empty, to ensure we've reached a stable state.

And if we add topology support in the future, passing empty mutable_parameters can be useful for CO to fetch the current topology of the volume without actually modify it, also useful in the reverting volumeAttributeClassName to empty case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to do that? We made it explicitly as something we need for Modify. And if you need a cancelling operation, we should have a CancelOperation API instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if you need a cancelling operation, we should have a CancelOperation API instead?

The key point is not canceling, but waiting for a stable state, to avoid left a mess behind. For use-case like kubernetes/kubernetes#132106 . In the PR description:

rollback from an infeasible pvc.spec.VacName to no VAC

However, there is no concept like infeasible error in the CSI spec. I think the only way we can sure that the volume returns to a stable state is retry the RPC until it returns success.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if we want to support topology in the future (kubernetes/enhancements#5381), the current proposal is allowing setting new topology before modify success. So what if the topology has change and we want to rollback? We will need a way to fetch the current topology of the volume without actually modify it.

Of course we can validate the parameters before change topology. But there are still possibilities that the operation may fail half-way.

  • We may change multiple aspect of the volume in the same RPC call, e.g. disk tags, disk type, performance level. Some of them may success, but others may still fail.
  • The validation rules embedded in the CSI can be outdated.
  • Modification can be slow. There maybe other operator modifying the same volume concurrently, invalidating the previously valid request.

Or should we just say: SPs should only return OK or Abort if any part of the modification succeeded? But I'm afraid that SP may also lost track of previously failed modification tasks. SP may also not able to distinct the partial or complete failure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, there is no concept like infeasible error in the CSI spec. I think the only way we can sure that the volume returns to a stable state is retry the RPC until it returns success.

For most errors we do not allow going back to older state. We have pretty well defined semantic of what we consider infeasible in k8s+csi - https://github.com/kubernetes-csi/external-resizer/blob/master/pkg/util/util.go#L274 .

We do not have an API to cancel any of in-progress CSI operations and that has worked well enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants