
Commit ff6c385

Authored by Kubernetes Submit Queue
Merge pull request kubernetes#44798 from zetaab/master
Automatic merge from submit-queue

Statefulsets for cinder: allow multi-AZ deployments, spread pods across zones

**What this PR does / why we need it**: Currently, if no availability zone is specified in the cinder StorageClass, the Cinder volume is provisioned to the zone called "nova". However, as described in the linked issue, we want to spread a StatefulSet across three different zones, which is currently not possible with StatefulSets and the cinder StorageClass. With this change, if the availability zone is left empty, the algorithm chooses a zone for the Cinder volume in the same way as the AWS and GCE StorageClass implementations.

**Which issue this PR fixes**: fixes kubernetes#44735

**Special notes for your reviewer**:

Example:

```
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: all
provisioner: kubernetes.io/cinder
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
  name: galera
  labels:
    app: mysql
spec:
  ports:
  - port: 3306
    name: mysql
  clusterIP: None
  selector:
    app: mysql
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "galera"
  replicas: 3
  template:
    metadata:
      labels:
        app: mysql
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      containers:
      - name: mysql
        image: adfinissygroup/k8s-mariadb-galera-centos:v002
        imagePullPolicy: Always
        ports:
        - containerPort: 3306
          name: mysql
        - containerPort: 4444
          name: sst
        - containerPort: 4567
          name: replication
        - containerPort: 4568
          name: ist
        volumeMounts:
        - name: storage
          mountPath: /data
        readinessProbe:
          exec:
            command:
            - /usr/share/container-scripts/mysql/readiness-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
  volumeClaimTemplates:
  - metadata:
      name: storage
      annotations:
        volume.beta.kubernetes.io/storage-class: all
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 12Gi
```

If this example is deployed, it automatically creates one replica per AZ, which helps us a lot in building HA databases. The current cinder StorageClass is not ideal for StatefulSets: assume the StorageClass is pinned to the zone called "nova"; because no labels are added to the PV, pods can still be started in any zone. The problem is that, at least in our OpenStack installation, a Cinder volume located in zone X cannot be used from zone Y. Still, should we offer the possibility of cross-zone Cinder mounts at all? In my opinion, mounting a volume from a zone other than the one the pod runs in is not a good approach (it means more network traffic between zones). What do you think? The new solution no longer allows it (should we allow it? That would mean removing the labels from the PV).

There may still be things that need fixing in this release, and some parts of the code are not perfect. Open questions I need help with:

1) Can everybody see in OpenStack which AZ their servers are in, or could an access policy hide that? If the AZ is not found in the server specs, I do not know how the code behaves.
2) In the GetAllZones() function, is it really necessary to create a new service client with openstack.NewComputeV2, or could the existing one be reused somehow?
3) This fetches all servers from an OpenStack tenant (project). However, in some cases Kubernetes may be deployed only to a specific zone. If the Kubernetes servers are located in zone 1, for example, and there are other servers in the same tenant in zone 2, a Cinder volume might be provisioned to zone 2 even though no pod can start there, because Kubernetes has no nodes in zone 2. Could we fetch the zones of the Kubernetes nodes in a better way? Currently that information is not added to Kubernetes node labels automatically in OpenStack (which I think it should be); I have added those labels to the nodes manually. If the zone information is not added to the nodes, the new solution does not start stateful pods at all, because it cannot place them.

cc @rootfs @anguslees @jsafrane

```release-note
Default behaviour in the cinder StorageClass has changed: if availability is not specified, the zone is chosen by an algorithm. This makes it possible to spread stateful pods across many zones.
```
2 parents 7cdbf91 + 7bdee3e commit ff6c385
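To make the spreading behaviour concrete, here is a minimal, self-contained sketch (not part of the commit) of the zone selection the new code performs. It calls volume.ChooseZoneForVolume, the same helper the cinder change below uses, over a hypothetical set of zones and the PVC names the example StatefulSet above would generate; the zone names are made up for illustration.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/sets"
	"k8s.io/kubernetes/pkg/volume"
)

func main() {
	// Zones as they would be discovered from node labels (see getZonesFromNodes below).
	zones := sets.NewString("zone-1", "zone-2", "zone-3")

	// The StatefulSet's volumeClaimTemplate "storage" yields PVC names like
	// storage-mysql-0, storage-mysql-1, ...; ChooseZoneForVolume hashes the base
	// name and uses the ordinal suffix to step through the sorted zone list, so
	// consecutive replicas get their volumes in different zones.
	for _, pvcName := range []string{"storage-mysql-0", "storage-mysql-1", "storage-mysql-2"} {
		fmt.Printf("%s -> %s\n", pvcName, volume.ChooseZoneForVolume(zones, pvcName))
	}
}
```

Because the helper keys off the PVC ordinal, the three claims land in three different zones, which is what yields one replica per AZ.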

File tree

7 files changed (+75, -32 lines)


pkg/cloudprovider/providers/openstack/openstack_test.go

+1 -1

```diff
@@ -370,7 +370,7 @@ func TestVolumes(t *testing.T) {
     tags := map[string]string{
         "test": "value",
     }
-    vol, err := os.CreateVolume("kubernetes-test-volume-"+rand.String(10), 1, "", "", &tags)
+    vol, _, err := os.CreateVolume("kubernetes-test-volume-"+rand.String(10), 1, "", "", &tags)
     if err != nil {
         t.Fatalf("Cannot create a new Cinder volume: %v", err)
     }
```

pkg/cloudprovider/providers/openstack/openstack_volumes.go

+13 -13

```diff
@@ -36,7 +36,7 @@ import (
 )
 
 type volumeService interface {
-    createVolume(opts VolumeCreateOpts) (string, error)
+    createVolume(opts VolumeCreateOpts) (string, string, error)
     getVolume(diskName string) (Volume, error)
     deleteVolume(volumeName string) error
 }
@@ -74,7 +74,7 @@ type VolumeCreateOpts struct {
     Metadata map[string]string
 }
 
-func (volumes *VolumesV1) createVolume(opts VolumeCreateOpts) (string, error) {
+func (volumes *VolumesV1) createVolume(opts VolumeCreateOpts) (string, string, error) {
 
     create_opts := volumes_v1.CreateOpts{
         Name: opts.Name,
@@ -86,12 +86,12 @@ func (volumes *VolumesV1) createVolume(opts VolumeCreateOpts) (string, error) {
 
     vol, err := volumes_v1.Create(volumes.blockstorage, create_opts).Extract()
     if err != nil {
-        return "", err
+        return "", "", err
     }
-    return vol.ID, nil
+    return vol.ID, vol.AvailabilityZone, nil
 }
 
-func (volumes *VolumesV2) createVolume(opts VolumeCreateOpts) (string, error) {
+func (volumes *VolumesV2) createVolume(opts VolumeCreateOpts) (string, string, error) {
 
     create_opts := volumes_v2.CreateOpts{
         Name: opts.Name,
@@ -103,9 +103,9 @@ func (volumes *VolumesV2) createVolume(opts VolumeCreateOpts) (string, error) {
 
     vol, err := volumes_v2.Create(volumes.blockstorage, create_opts).Extract()
     if err != nil {
-        return "", err
+        return "", "", err
     }
-    return vol.ID, nil
+    return vol.ID, vol.AvailabilityZone, nil
 }
 
 func (volumes *VolumesV1) getVolume(diskName string) (Volume, error) {
@@ -283,12 +283,12 @@ func (os *OpenStack) getVolume(diskName string) (Volume, error) {
 }
 
 // Create a volume of given size (in GiB)
-func (os *OpenStack) CreateVolume(name string, size int, vtype, availability string, tags *map[string]string) (volumeName string, err error) {
+func (os *OpenStack) CreateVolume(name string, size int, vtype, availability string, tags *map[string]string) (string, string, error) {
 
     volumes, err := os.volumeService("")
     if err != nil || volumes == nil {
         glog.Errorf("Unable to initialize cinder client for region: %s", os.region)
-        return "", err
+        return "", "", err
     }
     opts := VolumeCreateOpts{
         Name: name,
@@ -299,15 +299,15 @@ func (os *OpenStack) CreateVolume(name string, size int, vtype, availability str
     if tags != nil {
         opts.Metadata = *tags
     }
-    volume_id, err := volumes.createVolume(opts)
+    volumeId, volumeAZ, err := volumes.createVolume(opts)
 
     if err != nil {
         glog.Errorf("Failed to create a %d GB volume: %v", size, err)
-        return "", err
+        return "", "", err
     }
 
-    glog.Infof("Created volume %v", volume_id)
-    return volume_id, nil
+    glog.Infof("Created volume %v in Availability Zone: %v", volumeId, volumeAZ)
+    return volumeId, volumeAZ, nil
 }
 
 // GetDevicePath returns the path of an attached block storage volume, specified by its id.
```

pkg/volume/cinder/BUILD

+2 -0

```diff
@@ -19,6 +19,7 @@ go_library(
     tags = ["automanaged"],
     deps = [
         "//pkg/api/v1:go_default_library",
+        "//pkg/client/clientset_generated/clientset:go_default_library",
         "//pkg/cloudprovider:go_default_library",
         "//pkg/cloudprovider/providers/openstack:go_default_library",
         "//pkg/cloudprovider/providers/rackspace:go_default_library",
@@ -32,6 +33,7 @@ go_library(
         "//vendor/k8s.io/apimachinery/pkg/api/resource:go_default_library",
         "//vendor/k8s.io/apimachinery/pkg/apis/meta/v1:go_default_library",
         "//vendor/k8s.io/apimachinery/pkg/types:go_default_library",
+        "//vendor/k8s.io/apimachinery/pkg/util/sets:go_default_library",
     ],
 )
```

pkg/volume/cinder/attacher_test.go

+2 -2

```diff
@@ -506,8 +506,8 @@ func (testcase *testcase) ShouldTrustDevicePath() bool {
     return true
 }
 
-func (testcase *testcase) CreateVolume(name string, size int, vtype, availability string, tags *map[string]string) (volumeName string, err error) {
-    return "", errors.New("Not implemented")
+func (testcase *testcase) CreateVolume(name string, size int, vtype, availability string, tags *map[string]string) (volumeId string, volumeAZ string, err error) {
+    return "", "", errors.New("Not implemented")
 }
 
 func (testcase *testcase) GetDevicePath(diskId string) string {
```

pkg/volume/cinder/cinder.go

+4 -4

```diff
@@ -47,7 +47,7 @@ type CinderProvider interface {
     AttachDisk(instanceID string, diskName string) (string, error)
     DetachDisk(instanceID string, partialDiskId string) error
     DeleteVolume(volumeName string) error
-    CreateVolume(name string, size int, vtype, availability string, tags *map[string]string) (volumeName string, err error)
+    CreateVolume(name string, size int, vtype, availability string, tags *map[string]string) (string, string, error)
     GetDevicePath(diskId string) string
     InstanceID() (string, error)
     GetAttachmentDiskPath(instanceID string, diskName string) (string, error)
@@ -239,7 +239,7 @@ type cdManager interface {
     // Detaches the disk from the kubelet's host machine.
     DetachDisk(unmounter *cinderVolumeUnmounter) error
     // Creates a volume
-    CreateVolume(provisioner *cinderVolumeProvisioner) (volumeID string, volumeSizeGB int, err error)
+    CreateVolume(provisioner *cinderVolumeProvisioner) (volumeID string, volumeSizeGB int, labels map[string]string, err error)
     // Deletes a volume
     DeleteVolume(deleter *cinderVolumeDeleter) error
 }
@@ -482,15 +482,15 @@ type cinderVolumeProvisioner struct {
 var _ volume.Provisioner = &cinderVolumeProvisioner{}
 
 func (c *cinderVolumeProvisioner) Provision() (*v1.PersistentVolume, error) {
-    volumeID, sizeGB, err := c.manager.CreateVolume(c)
+    volumeID, sizeGB, labels, err := c.manager.CreateVolume(c)
     if err != nil {
         return nil, err
     }
 
     pv := &v1.PersistentVolume{
         ObjectMeta: metav1.ObjectMeta{
             Name: c.options.PVName,
-            Labels: map[string]string{},
+            Labels: labels,
             Annotations: map[string]string{
                 "kubernetes.io/createdby": "cinder-dynamic-provisioner",
             },
```
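As a rough illustration (the PV name and zone value below are placeholders, not taken from the commit), this is the effect of passing `labels` into Provision(): the PV carries the failure-domain label reported by Cinder, which is what keeps pods using this volume on nodes in the same AZ.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/pkg/api/v1"
)

func main() {
	// volumeAZ stands in for the availability zone returned by cloud.CreateVolume.
	volumeAZ := "zone-2"

	// The provisioner now sets this label on the PersistentVolume it creates, so the
	// scheduler only places pods that bind this volume onto nodes in the same zone.
	pv := &v1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "pvc-example", // placeholder; the real name comes from c.options.PVName
			Labels: map[string]string{metav1.LabelZoneFailureDomain: volumeAZ},
		},
	}

	fmt.Printf("PV %s labeled %s=%s\n",
		pv.Name, metav1.LabelZoneFailureDomain, pv.Labels[metav1.LabelZoneFailureDomain])
}
```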

pkg/volume/cinder/cinder_test.go

+2 -2

```diff
@@ -116,8 +116,8 @@ func (fake *fakePDManager) DetachDisk(c *cinderVolumeUnmounter) error {
     return nil
 }
 
-func (fake *fakePDManager) CreateVolume(c *cinderVolumeProvisioner) (volumeID string, volumeSizeGB int, err error) {
-    return "test-volume-name", 1, nil
+func (fake *fakePDManager) CreateVolume(c *cinderVolumeProvisioner) (volumeID string, volumeSizeGB int, labels map[string]string, err error) {
+    return "test-volume-name", 1, nil, nil
 }
 
 func (fake *fakePDManager) DeleteVolume(cd *cinderVolumeDeleter) error {
```

pkg/volume/cinder/cinder_util.go

+51 -10

```diff
@@ -25,7 +25,11 @@ import (
     "time"
 
     "github.com/golang/glog"
+    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+
+    "k8s.io/apimachinery/pkg/util/sets"
     "k8s.io/kubernetes/pkg/api/v1"
+    "k8s.io/kubernetes/pkg/client/clientset_generated/clientset"
     "k8s.io/kubernetes/pkg/util/exec"
     "k8s.io/kubernetes/pkg/volume"
 )
@@ -135,10 +139,28 @@ func (util *CinderDiskUtil) DeleteVolume(cd *cinderVolumeDeleter) error {
     return nil
 }
 
-func (util *CinderDiskUtil) CreateVolume(c *cinderVolumeProvisioner) (volumeID string, volumeSizeGB int, err error) {
+func getZonesFromNodes(kubeClient clientset.Interface) (sets.String, error) {
+    // TODO: caching, currently it is overkill because it calls this function
+    // only when it creates dynamic PV
+    zones := make(sets.String)
+    nodes, err := kubeClient.Core().Nodes().List(metav1.ListOptions{})
+    if err != nil {
+        glog.V(2).Infof("Error listing nodes")
+        return zones, err
+    }
+    for _, node := range nodes.Items {
+        if zone, ok := node.Labels[metav1.LabelZoneFailureDomain]; ok {
+            zones.Insert(zone)
+        }
+    }
+    glog.V(4).Infof("zones found: %v", zones)
+    return zones, nil
+}
+
+func (util *CinderDiskUtil) CreateVolume(c *cinderVolumeProvisioner) (volumeID string, volumeSizeGB int, volumeLabels map[string]string, err error) {
     cloud, err := c.plugin.getCloudProvider()
     if err != nil {
-        return "", 0, err
+        return "", 0, nil, err
     }
 
     capacity := c.options.PVC.Spec.Resources.Requests[v1.ResourceName(v1.ResourceStorage)]
@@ -157,21 +179,40 @@ func (util *CinderDiskUtil) CreateVolume(c *cinderVolumeProvisioner) (volumeID s
         case "availability":
             availability = v
         default:
-            return "", 0, fmt.Errorf("invalid option %q for volume plugin %s", k, c.plugin.GetPluginName())
+            return "", 0, nil, fmt.Errorf("invalid option %q for volume plugin %s", k, c.plugin.GetPluginName())
         }
     }
     // TODO: implement PVC.Selector parsing
     if c.options.PVC.Spec.Selector != nil {
-        return "", 0, fmt.Errorf("claim.Spec.Selector is not supported for dynamic provisioning on Cinder")
+        return "", 0, nil, fmt.Errorf("claim.Spec.Selector is not supported for dynamic provisioning on Cinder")
     }
 
-    name, err = cloud.CreateVolume(name, volSizeGB, vtype, availability, c.options.CloudTags)
-    if err != nil {
-        glog.V(2).Infof("Error creating cinder volume: %v", err)
-        return "", 0, err
+    if availability == "" {
+        // No zone specified, choose one randomly in the same region
+        zones, err := getZonesFromNodes(c.plugin.host.GetKubeClient())
+        if err != nil {
+            glog.V(2).Infof("error getting zone information: %v", err)
+            return "", 0, nil, err
+        }
+        // if we did not get any zones, lets leave it blank and gophercloud will
+        // use zone "nova" as default
+        if len(zones) > 0 {
+            availability = volume.ChooseZoneForVolume(zones, c.options.PVC.Name)
+        }
     }
-    glog.V(2).Infof("Successfully created cinder volume %s", name)
-    return name, volSizeGB, nil
+
+    volumeId, volumeAZ, errr := cloud.CreateVolume(name, volSizeGB, vtype, availability, c.options.CloudTags)
+    if errr != nil {
+        glog.V(2).Infof("Error creating cinder volume: %v", errr)
+        return "", 0, nil, errr
+    }
+    glog.V(2).Infof("Successfully created cinder volume %s", volumeId)
+
+    // these are needed that pod is spawning to same AZ
+    volumeLabels = make(map[string]string)
+    volumeLabels[metav1.LabelZoneFailureDomain] = volumeAZ
+
+    return volumeId, volSizeGB, volumeLabels, nil
 }
 
 func probeAttachedVolume() error {
```
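For completeness, a hedged sketch of what getZonesFromNodes collects. It assumes the generated fake clientset package (pkg/client/clientset_generated/clientset/fake, which this diff does not touch) and two made-up nodes labeled with their failure domains; the listing loop mirrors the new function above rather than calling it, since it is package-private.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/pkg/api/v1"
	"k8s.io/kubernetes/pkg/client/clientset_generated/clientset/fake" // assumed path for the generated fake
)

func main() {
	// Two hypothetical nodes carrying the zone label that getZonesFromNodes reads.
	client := fake.NewSimpleClientset(
		&v1.Node{ObjectMeta: metav1.ObjectMeta{
			Name:   "node-a",
			Labels: map[string]string{metav1.LabelZoneFailureDomain: "zone-1"},
		}},
		&v1.Node{ObjectMeta: metav1.ObjectMeta{
			Name:   "node-b",
			Labels: map[string]string{metav1.LabelZoneFailureDomain: "zone-2"},
		}},
	)

	// Same listing logic as getZonesFromNodes above: collect the distinct zones.
	nodes, err := client.Core().Nodes().List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	zones := map[string]struct{}{}
	for _, node := range nodes.Items {
		if zone, ok := node.Labels[metav1.LabelZoneFailureDomain]; ok {
			zones[zone] = struct{}{}
		}
	}
	fmt.Println("zones:", zones) // expect zone-1 and zone-2
}
```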
