[cinder-csi-plugin] Refactor list volumes #2766

Merged
merged 6 commits into from
Feb 19, 2025

Conversation

kon-angelo
Contributor

@kon-angelo kon-angelo commented Feb 5, 2025

What this PR does / why we need it:
Fixes the linked issue and also refactors the list-volumes code to simplify it. The behavior changes slightly in two ways:

  • If the list operation returns a page token, we respect it and return it in the response, even when max-entries == 0.
  • With multiple cloud envs, we no longer do the discovery for "the first cloud with volumes"; instead we return the next cloud env. The discovery costs additional Cinder API calls, which are more expensive than an additional local gRPC call from the external-attacher.

Regarding point 2: the external-attacher keeps sending list-volumes requests for as long as the response contains a token:

https://github.com/kubernetes-csi/external-attacher/blob/e5ae92e87bb6c3d99061ca515285c5c9e76e0414/pkg/attacher/lister.go#L63-L65

With this change, we expect the csi-attacher to issue a request per cloud in our configuration.
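To illustrate the request-per-cloud behavior described above, here is a minimal, self-contained sketch. It is not the plugin's code: `fakeDriver`, `listVolumes`, `listAll`, the cloud and volume names, and the choice to use the next cloud's name as the token are all assumptions made for the example.

```go
package main

import "fmt"

// fakeDriver stands in for the CSI controller. The cloud names, volume names
// and the "token is the next cloud's name" encoding are assumptions.
type fakeDriver struct {
	order   []string            // fixed order in which clouds are listed
	volumes map[string][]string // volumes per cloud
	calls   int                 // number of list requests received
}

// listVolumes returns the volumes of exactly one cloud per call. The returned
// token names the next cloud, or is empty when there is nothing left to list.
func (d *fakeDriver) listVolumes(startingToken string) (entries []string, nextToken string) {
	d.calls++
	if len(d.order) == 0 {
		return nil, ""
	}
	idx := 0 // an empty token means "start with the first cloud"
	for i, name := range d.order {
		if name == startingToken {
			idx = i
			break
		}
	}
	entries = d.volumes[d.order[idx]]
	if idx+1 < len(d.order) {
		nextToken = d.order[idx+1]
	}
	return entries, nextToken
}

// listAll mimics a caller that keeps requesting while a token is returned.
func listAll(d *fakeDriver) []string {
	var all []string
	token := ""
	for {
		entries, next := d.listVolumes(token)
		all = append(all, entries...)
		if next == "" {
			return all
		}
		token = next
	}
}

func main() {
	d := &fakeDriver{
		order: []string{"cloud-a", "cloud-b", "cloud-c"},
		volumes: map[string][]string{
			"cloud-a": {"vol-1", "vol-2"},
			"cloud-b": {}, // an empty cloud still costs one local call
			"cloud-c": {"vol-3"},
		},
	}
	all := listAll(d)
	fmt.Printf("%d volumes in %d calls\n", len(all), d.calls)
}
```

In this sketch the empty cloud is not skipped by any discovery step; the caller simply pays one extra local call for it.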

Which issue this PR fixes(if applicable):
fixes #2764

Special notes for reviewers:

Release note:

Fix an issue with `ListVolumes` not respecting pagination tokens from the Cinder API when `req.maxEntries` is set to 0.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 5, 2025
@k8s-ci-robot
Contributor

Welcome @kon-angelo!

It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @kon-angelo. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 5, 2025
@k8s-ci-robot k8s-ci-robot requested review from dulek and zetaab February 5, 2025 16:10
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Feb 5, 2025
@kon-angelo kon-angelo marked this pull request as ready for review February 6, 2025 15:40
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2025
@k8s-ci-robot k8s-ci-robot requested a review from kayrus February 6, 2025 15:41
@kayrus
Contributor

kayrus commented Feb 13, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 13, 2025
Contributor

@kayrus kayrus left a comment

Thanks for the PR. It took me a while to understand this logic, since I didn't review the initial #2551 and I don't have an environment with a multi-region setup. See my comments below.

@@ -33,11 +33,12 @@ import (
"google.golang.org/grpc/status"
"google.golang.org/protobuf/types/known/timestamppb"

"k8s.io/klog/v2"
Contributor

could you please move this back?

Contributor

kindly ping

@@ -437,7 +438,7 @@ func genFakeVolumeEntry(fakeVol volumes.Volume) *csi.ListVolumesResponse_Entry {
}
}
func genFakeVolumeEntries(fakeVolumes []volumes.Volume) []*csi.ListVolumesResponse_Entry {
- var entries []*csi.ListVolumesResponse_Entry
+ entries := make([]*csi.ListVolumesResponse_Entry, 0)
Contributor

Suggested change
- entries := make([]*csi.ListVolumesResponse_Entry, 0)
+ entries := make([]*csi.ListVolumesResponse_Entry, len(fakeVolumes))
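
A note on preallocating slices in Go, sketched under the assumption that the helper fills the slice with `append` (the hunk does not show the loop body, so this may not apply): `make(T, n)` creates `n` zero-valued elements, and `append` adds after them, while `make(T, 0, n)` only reserves capacity. The function names below are hypothetical.

```go
package main

import "fmt"

// fillWithLen preallocates by length: the slice already holds len(names) nil
// pointers, and append places the real entries after them.
func fillWithLen(names []string) []*string {
	out := make([]*string, len(names))
	for i := range names {
		out = append(out, &names[i])
	}
	return out
}

// fillWithCap preallocates by capacity only: the slice starts empty, so append
// yields exactly len(names) entries without reallocating.
func fillWithCap(names []string) []*string {
	out := make([]*string, 0, len(names))
	for i := range names {
		out = append(out, &names[i])
	}
	return out
}

func main() {
	names := []string{"vol-1", "vol-2"}
	fmt.Println(len(fillWithLen(names)), len(fillWithCap(names)))
}
```

If the loop assigns by index instead of appending, the length-based form is the right one.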

@@ -444,7 +444,6 @@ func (cs *controllerServer) ListVolumes(ctx context.Context, req *csi.ListVolume
var cloudsToken = CloudsStartingToken{
Contributor

it's enough to have var cloudsToken CloudsStartingToken

@@ -405,7 +406,6 @@ func (cs *controllerServer) ControllerUnpublishVolume(ctx context.Context, req *
type CloudsStartingToken struct {
Contributor

it's better to move the type declaration to the top and make it private, since it's not used anywhere else

Entries: cloudsVentries,
NextToken: string(data),
}, nil
volumeList, nextPageToken, err := cs.Clouds[cloudsNames[idx]].ListVolumes(ctx, maxEntries, startingToken)
Contributor

Suggested change
- volumeList, nextPageToken, err := cs.Clouds[cloudsNames[idx]].ListVolumes(ctx, maxEntries, startingToken)
+ volumeList, cloudsToken.Token, err := cs.Clouds[cloudsNames[idx]].ListVolumes(ctx, maxEntries, startingToken)

volumeEntries := cs.createVolumeEntries(volumeList)
klog.V(4).Infof("ListVolumes: retrieved %d entries and %q next token from cloud %q", len(volumeEntries), nextPageToken, cloudsNames[idx])

cloudsToken.Token = nextPageToken
Contributor

Suggested change
- cloudsToken.Token = nextPageToken

}

startIdx := 0
idx := 0
for _, cloudName := range cloudsNames {
Contributor

I wonder why we need to loop over cloud names. If you take a look at the ListSnapshots method, the cloud name is taken directly from volCloud := req.GetSecrets()["cloud"]. Why not do the same here? @MatthieuFin could you please clarify this logic?

Contributor

Hi @kayrus, I'm currently on vacation, but I can give you some hints from memory. Using "secrets" via volCloud := req.GetSecrets()["cloud"] is more a hack than a feature, as explained in my PR. I was inspired by the discussion in this issue, where the secrets field of the CSI gRPC spec is used to store the current cloud.

With that clear: we can simply use the "secrets" field in the gRPC call for ListSnapshots, because the CSI gRPC spec provides a "secrets" field there, which is convenient. This secret is filled with information provided in the StorageClass (I guess we list the snapshots of a SC, or of a defined volume, but never all snapshots globally).

Unfortunately (I don't remember why, but I guess there is an explanation), the ListVolumes call implements the ListVolumesRequest gRPC spec, which has no secrets field, only max_entries and starting_token. So we have no way of knowing which cloud is concerned (and that makes sense: we want to list the "k8s" volumes, which concern all our clouds), and we have to loop over each cloud. I use a "token" in JSON format (if I remember right) to store the processed cloud and keep "state" when ListVolumes uses pagination. That way, if a client provides max_entries != 0, the token lets us know which cloud to call to retrieve the next volumes.

To be honest, that is not my proudest implementation (the ListVolumes function). I may have made some errors, but I haven't noticed any so far. I'll take a look at this issue on my side when I'm back at work.

Contributor Author

@MatthieuFin I think this is correct. There is no way to get the cloud info, as there is no secrets field in the request; you simply need to list everything from your current configuration.

@kayrus The reason to loop over the cloud names is that we are creating a stable index of all the available clouds. That way, we can tell from the token which cloud we need to list volumes for next. I also don't see (at the moment at least) another way to get this info.
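
The "stable index" idea can be sketched as follows. This is an illustration only: sorting the names is one way to get a stable order (Go map iteration order is not stable), and `sortedCloudNames` and `nextCloud` are hypothetical helpers, not names taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedCloudNames returns the configured cloud names in sorted order, so
// every request sees the same index for the same cloud. Sorting is an
// assumption made for this sketch.
func sortedCloudNames(clouds map[string]struct{}) []string {
	names := make([]string, 0, len(clouds))
	for name := range clouds {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// nextCloud returns the cloud that follows current in the stable order, or ""
// when current is the last cloud or is not configured.
func nextCloud(names []string, current string) string {
	for i, name := range names {
		if name == current && i+1 < len(names) {
			return names[i+1]
		}
	}
	return ""
}

func main() {
	clouds := map[string]struct{}{"west": {}, "east": {}, "north": {}}
	names := sortedCloudNames(clouds)
	fmt.Println(names, nextCloud(names, "east"), nextCloud(names, "west") == "")
}
```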

@MatthieuFin
Contributor

I quickly read @kon-angelo's ListVolumes implementation, and it's much simpler and more readable. I had focused on providing the requested max-entries, but I agree with your approach; it seems more maintainable this way.

mc := metrics.NewMetricContext("volume", "list")
opts := volumes.ListOpts{Limit: limit, Marker: startingToken}
if limit == 0 {
page, err := volumes.List(os.blockstorage, opts).AllPages(context.Background())
Contributor

please use the ctx

Suggested change
- page, err := volumes.List(os.blockstorage, opts).AllPages(context.Background())
+ page, err := volumes.List(os.blockstorage, opts).AllPages(ctx)

)

type controllerServer struct {
Driver *Driver
Clouds map[string]openstack.IOpenStack
}

type cloudsStartingToken struct {
Contributor

I also wonder whether it's better to use a <token>:<cloud> format, with SplitN and Join, to avoid useless JSON encoding/decoding.

Contributor Author

I will provide a commit that changes the token to the colon-separated format. We don't care about any kind of backwards compatibility here, so it should be straightforward to change.
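
A minimal sketch of the colon-separated token discussed above, using `strings.SplitN` and `strings.Join`. The helper names and signatures are assumptions for illustration and may differ from the merged code; returning an empty string when both parts are empty is also an assumption, made so that an exhausted listing yields no token.

```go
package main

import (
	"fmt"
	"strings"
)

// splitToken parses "<token>:<cloud>". Only the first colon separates the two
// parts, so anything after it (including further colons) belongs to the cloud.
func splitToken(s string) (token, cloud string) {
	parts := strings.SplitN(s, ":", 2)
	token = parts[0]
	if len(parts) == 2 {
		cloud = parts[1]
	}
	return token, cloud
}

// joinToken builds "<token>:<cloud>". It returns "" when both parts are empty
// (an assumption of this sketch).
func joinToken(token, cloud string) string {
	if token == "" && cloud == "" {
		return ""
	}
	return strings.Join([]string{token, cloud}, ":")
}

func main() {
	for _, in := range []string{"foo:bar", "foo", "foo:", ":bar", "foo:bar:baz", ""} {
		token, cloud := splitToken(in)
		fmt.Printf("%q -> token=%q cloud=%q\n", in, token, cloud)
	}
	fmt.Println(joinToken("foo", "bar"))
}
```

One consequence of splitting on the first colon is that the page token itself must not contain a colon, while the cloud name may.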

@kon-angelo kon-angelo force-pushed the refactor-list-volumes branch from 797a55b to 47aa102 Compare February 17, 2025 13:42
@kon-angelo kon-angelo force-pushed the refactor-list-volumes branch from 47aa102 to f99d400 Compare February 17, 2025 13:50
mc := metrics.NewMetricContext("volume", "list")
opts := volumes.ListOpts{Limit: limit, Marker: startingToken}
if limit == 0 {
page, err := volumes.List(os.blockstorage, opts).AllPages(ctx)
Contributor

in this case there is no need to provide opts

Suggested change
- page, err := volumes.List(os.blockstorage, opts).AllPages(ctx)
+ page, err := volumes.List(os.blockstorage, nil).AllPages(ctx)

Contributor Author

Should be addressed now

@@ -32,7 +32,6 @@ import (
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
"google.golang.org/protobuf/types/known/timestamppb"

Contributor

please leave it as is. this will help to avoid issues in backporting.

vols, err = volumes.ExtractVolumes(page)
if mc.ObserveRequest(err) != nil {
return vols, "", err
}
Contributor

Suggested change
- }
+ }
+ return vols, "", nil

Contributor

@kayrus kayrus left a comment

Thanks, @kon-angelo! Almost perfect. I just left some nit comments, and I hope we're ready to go.

cloud string
}{
{input: "foo:bar", token: "foo", cloud: "bar"},
{input: "foo", token: "foo", cloud: ""},
Contributor

just in case

Suggested change
- {input: "foo", token: "foo", cloud: ""},
+ {input: "foo", token: "foo", cloud: ""},
+ {input: "foo:", token: "foo", cloud: ""},

{input: "foo:bar", token: "foo", cloud: "bar"},
{input: "foo", token: "foo", cloud: ""},
{input: ":bar", token: "", cloud: "bar"},
{input: "foo:bar:baz", token: "foo", cloud: "bar:baz"},
Contributor

Suggested change
- {input: "foo:bar:baz", token: "foo", cloud: "bar:baz"},
+ {input: "foo:bar:baz", token: "foo", cloud: "bar:baz"},
+ {input: "", token: "", cloud: ""},

NextToken: string(data),
}, nil
var volumeList []volumes.Volume
volumeList, token, err = cs.Clouds[cloudsNames[idx]].ListVolumes(ctx, maxEntries, token)
Contributor

we can simply use cloudName here

Suggested change
- volumeList, token, err = cs.Clouds[cloudsNames[idx]].ListVolumes(ctx, maxEntries, token)
+ volumeList, token, err = cs.Clouds[cloudName].ListVolumes(ctx, maxEntries, token)

return nil, status.Errorf(codes.Internal, "ListVolumes failed with error %v", err)
}
volumeEntries := cs.createVolumeEntries(volumeList)
klog.V(4).Infof("ListVolumes: retrieved %d entries and %q next token from cloud %q", len(volumeEntries), token, cloudsNames[idx])
Contributor

and here too

Suggested change
- klog.V(4).Infof("ListVolumes: retrieved %d entries and %q next token from cloud %q", len(volumeEntries), token, cloudsNames[idx])
+ klog.V(4).Infof("ListVolumes: retrieved %d entries and %q next token from cloud %q", len(volumeEntries), token, cloudName)

@kayrus
Contributor

kayrus commented Feb 19, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 19, 2025
@kayrus
Contributor

kayrus commented Feb 19, 2025

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kayrus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 19, 2025
@kayrus
Contributor

kayrus commented Feb 19, 2025

/cherry-pick release-1.31

@k8s-infra-cherrypick-robot

@kayrus: once the present PR merges, I will cherry-pick it on top of release-1.31 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.31

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kayrus
Contributor

kayrus commented Feb 19, 2025

/test openstack-cloud-csi-cinder-e2e-test

@k8s-ci-robot k8s-ci-robot merged commit 29f3f0b into kubernetes:master Feb 19, 2025
5 of 6 checks passed
@k8s-infra-cherrypick-robot

@kayrus: #2766 failed to apply on top of branch "release-1.31":

Applying: refactor list volumes call
Using index info to reconstruct a base tree...
M	pkg/csi/cinder/controllerserver.go
M	pkg/csi/cinder/controllerserver_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/csi/cinder/controllerserver_test.go
CONFLICT (content): Merge conflict in pkg/csi/cinder/controllerserver_test.go
Auto-merging pkg/csi/cinder/controllerserver.go
CONFLICT (content): Merge conflict in pkg/csi/cinder/controllerserver.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 refactor list volumes call

In response to this:

/cherry-pick release-1.31

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

kayrus pushed a commit to kayrus/cloud-provider-openstack that referenced this pull request Feb 20, 2025
* refactor list volumes call

* upgrade tests

* comments improvements

* fix imports and list options

* token split

* add more jointoken  tests
k8s-ci-robot pushed a commit that referenced this pull request Feb 20, 2025
* refactor list volumes call

* upgrade tests

* comments improvements

* fix imports and list options

* token split

* add more jointoken  tests

Co-authored-by: Konstantinos Angelopoulos <[email protected]>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

[cinder-csi-plugin] Cinder-csi-plugin does not respect pagination from Cinder API on list volumes call
5 participants