Replace k8s client get requests with volume cache #83
These Gets intentionally bypass the cache to avoid duplicating subsequent Create/Delete calls. In terms of apiserver load/throttling, it seems obvious that they should be removed. Keep the Get call: the controller will always call The reason these Gets are here is to mimic the in-tree controller: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/persistentvolume/pv_controller.go#L1214 @jsafrane do you think we still need these Get calls? IMO no. |
Also, I don't think the 'waiting for locks' scenario is even possible anymore ever since we moved to a workqueue model |
The comment about locks is probably obsolete. Still, direct
If you make sure that the provisioner has a cache composed of both the informer / API server watch and the list of recently provisioned volumes, then yes, you can remove the GET call. I am not sure how safe it is to add items directly to the informer's Store (
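The combined cache described in this comment could look roughly like the sketch below. It is a stdlib-only illustration, not the library's code: `volume`, `volumeCache`, `OnInformerEvent`, `RecordProvisioned` and `Exists` are hypothetical names, and a plain map stands in for the informer's store.

```go
package main

import (
	"fmt"
	"sync"
)

// volume is a hypothetical stand-in for a PersistentVolume object.
type volume struct {
	Name string
}

// volumeCache is a hypothetical cache fed from two sources: events from the
// API server watch, and volumes this provisioner has just created.
type volumeCache struct {
	mu    sync.RWMutex
	items map[string]volume
}

func newVolumeCache() *volumeCache {
	return &volumeCache{items: make(map[string]volume)}
}

// OnInformerEvent records a volume reported by the list/watch stream.
func (c *volumeCache) OnInformerEvent(v volume) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[v.Name] = v
}

// RecordProvisioned records a volume right after it was saved, so a lookup
// succeeds even before the watch delivers the corresponding event.
func (c *volumeCache) RecordProvisioned(v volume) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[v.Name] = v
}

// Exists answers from memory what would otherwise need a Get() request.
func (c *volumeCache) Exists(name string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	_, ok := c.items[name]
	return ok
}

func main() {
	c := newVolumeCache()
	c.OnInformerEvent(volume{Name: "pv-from-watch"})
	c.RecordProvisioned(volume{Name: "pv-just-created"})
	fmt.Println(c.Exists("pv-from-watch"), c.Exists("pv-just-created"), c.Exists("pv-unknown"))
}
```

The point of the second write path is the window between saving a volume and receiving its watch event: without it, a lookup in that window would miss and the provisioner might provision again.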
controller/controller.go
Outdated
@@ -187,6 +188,8 @@ type ProvisionController struct {
	claimsInProgress sync.Map

	volumeStore VolumeStore

	volumeLister corelisters.PersistentVolumeLister
Why `volumeLister`? There are `volumeInformer` and `volumes` already available.
Looks like I totally missed `volumes`. I've only tested this with an explicit lister, so I will try that out and update the commit.
@jsafrane @wongma7 does CSI spec dictating that |
Yes, CSI should be idempotent, however, we should not exercise it too often as calling CreateVolume may be an expensive operation for some storage backends (far more expensive than Kubernetes API call). Did you check if it's possible to add newly provisioned PV into |
@@ -1449,20 +1454,7 @@ func (ctrl *ProvisionController) deleteVolumeOperation(ctx context.Context, volu
	operation := fmt.Sprintf("delete %q", volume.Name)
	glog.Info(logOperation(operation, "started"))

	// This method may have been waiting for a volume lock for some time.
Removed this due to the earlier discussion on locks. `deleteVolumeOperation` takes the `volume` object as an input parameter.
@jsafrane i've changed the PR to use the volumes cache instead of lister, please take a look! |
controller/controller.go
Outdated
@@ -705,7 +705,13 @@ func NewProvisionController(
	// PersistentVolumes

	volumeHandler := cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) { controller.enqueueVolume(obj) },
		AddFunc: func(obj interface{}) {
			if err = controller.volumes.Add(obj); err != nil {
The volume coming from the informer is already in the informer cache, so you don't need to add it again.
What you need to do is add the volume to the store after the provisioner has saved the volume to the API server, here:
klog.V(5).Infof("Volume %s saved", volume.Name)
and here:
// Save succeeded.
I actually moved it here: https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go#L1440
because `volumes` is a member of `ProvisionController`.
4a08db9 to 6fd1e9f
That's weird, the tests are passing locally:
|
/retest |
I'm afraid the unit test error is genuine - please check what you changed in the provisioner metrics.
@jsafrane actually i think it may just be a flaky test!
|
/retest |
controller/controller.go
Outdated
@@ -1437,6 +1437,9 @@ func (ctrl *ProvisionController) provisionClaimOperation(ctx context.Context, cl
	glog.Info(logOperation(operation, "succeeded"))

	if err := ctrl.volumeStore.StoreVolume(claim, volume); err != nil {
		if volumAddErr := ctrl.volumes.Add(volume.Name); volumAddErr != nil {
You should add the volume to cache on StoreVolume success, not after failure.
Fixed! I have no idea how it got in there
/test pull-sig-storage-lib-external-provisioner-unit |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jsafrane, RaunakShah
@@ -1439,6 +1439,9 @@ func (ctrl *ProvisionController) provisionClaimOperation(ctx context.Context, cl
	if err := ctrl.volumeStore.StoreVolume(claim, volume); err != nil {
		return ProvisioningFinished, err
	}
	if err = ctrl.volumes.Add(volume.Name); err != nil {
		utilruntime.HandleError(err)
	}
This `Add` fails all the time with:
couldn't create key for object pvc-17e4c2d0-114d-4260-b587-c39a86c8dfe5: object has no meta: object does not implement the Object interfaces
You need to add the `volume` object, not its name...
Fix is in #92
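The failure reported above can be imitated with a small stdlib-only sketch. It assumes a store that derives each key from the object's metadata; `named`, `volume`, `keyFunc` and `store` are hypothetical names chosen for illustration and are not the client-go types.

```go
package main

import (
	"errors"
	"fmt"
)

// named is a hypothetical stand-in for the metadata accessor that a
// key function expects every stored object to provide.
type named interface {
	GetName() string
}

type volume struct{ Name string }

func (v *volume) GetName() string { return v.Name }

// keyFunc derives the store key from the object's metadata. A bare string
// carries no metadata, so it is rejected.
func keyFunc(obj interface{}) (string, error) {
	n, ok := obj.(named)
	if !ok {
		return "", errors.New("object has no meta")
	}
	return n.GetName(), nil
}

// store is a hypothetical keyed store that relies on keyFunc.
type store struct{ items map[string]interface{} }

func (s *store) Add(obj interface{}) error {
	key, err := keyFunc(obj)
	if err != nil {
		return fmt.Errorf("couldn't create key for object %v: %w", obj, err)
	}
	s.items[key] = obj
	return nil
}

func main() {
	s := &store{items: map[string]interface{}{}}
	v := &volume{Name: "pvc-example"}
	fmt.Println(s.Add(v.Name)) // passing the name: rejected
	fmt.Println(s.Add(v))      // passing the object: accepted
}
```

Because the error is only logged rather than returned in the reviewed code, a mistake like this does not fail provisioning; it just leaves the cache empty.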
@@ -1523,6 +1517,9 @@ func (ctrl *ProvisionController) deleteVolumeOperation(ctx context.Context, volu

	glog.Info(logOperation(operation, "persistentvolume deleted"))

	if err = ctrl.volumes.Delete(volume.Name); err != nil {
		utilruntime.HandleError(err)
	}
Same here.
This change replaces API server `Get()` requests with access to the volume cache.
Addresses #82
Testing:
~2x increase in throughput of external-provisioner. "Throttling ..." messages caused by lib-external-provisioner's `Get()` requests to the API server are no longer seen.
With a dummy CSI controller that returns success for every `CreateVolume` and `DeleteVolume` call:
Use 64 threads to create and delete PVCs in a loop for 5 minutes. Tracking the number of operations successfully executed within this time helps calculate the throughput.
Without this change:
Operations completed - 728
Ops/sec - 2.435
With this change:
Operations completed - 1280
Ops/sec - 4.281