# Deployment

+ Authors:
+ - Brian Grant (@bgrant0607)
+ - Clayton Coleman (@smarterclayton)
+ - Dan Mace (@ironcladlou)
+ - David Oppenheimer (@davidopp)
+ - Janet Kuo (@janetkuo)
+ - Michail Kargakis (@kargakis)
+ - Nikhil Jindal (@nikhiljindal)
+
## Abstract

A proposal for implementing a new resource - Deployment - which will enable
- declarative config updates for Pods and ReplicationControllers.
-
- Users will be able to create a Deployment, which will spin up
- a ReplicationController to bring up the desired pods.
- Users can also target the Deployment at existing ReplicationControllers, in
- which case the new RC will replace the existing ones. The exact mechanics of
- replacement depends on the DeploymentStrategy chosen by the user.
- DeploymentStrategies are explained in detail in a later section.
+ declarative config updates for ReplicaSets. Users will be able to create a
+ Deployment, which will spin up a ReplicaSet to bring up the desired Pods.
+ Users can also target the Deployment at an existing ReplicaSet, either by
+ rolling back an existing Deployment or by creating a new Deployment that can
+ adopt an existing ReplicaSet. The exact mechanics of replacement depend on
+ the DeploymentStrategy chosen by the user. DeploymentStrategies are explained
+ in detail in a later section.

## Implementation

@@ -33,10 +41,10 @@ type Deployment struct {
type DeploymentSpec struct {
// Number of desired pods. This is a pointer to distinguish between explicit
// zero and not specified. Defaults to 1.
- Replicas *int
+ Replicas *int32

- // Label selector for pods. Existing ReplicationControllers whose pods are
- // selected by this will be scaled down. New ReplicationControllers will be
+ // Label selector for pods. Existing ReplicaSets whose pods are
+ // selected by this will be scaled down. New ReplicaSets will be
// created with this selector, with a unique label `pod-template-hash`.
// If Selector is empty, it is defaulted to the labels present on the Pod template.
Selector map[string]string
@@ -46,14 +54,17 @@ type DeploymentSpec struct {

// The deployment strategy to use to replace existing pods with new ones.
Strategy DeploymentStrategy
+
+ // Minimum number of seconds for which a newly created pod should be ready
+ // without any of its containers crashing, for it to be considered available.
+ // Defaults to 0 (pod will be considered available as soon as it is ready).
+ MinReadySeconds int32
}

type DeploymentStrategy struct {
// Type of deployment. Can be "Recreate" or "RollingUpdate".
Type DeploymentStrategyType

- // TODO: Update this to follow our convention for oneOf, whatever we decide it
- // to be.
// Rolling update config params. Present only if DeploymentStrategyType =
// RollingUpdate.
RollingUpdate *RollingUpdateDeploymentStrategy
@@ -65,7 +76,8 @@ const (
// Kill all existing pods before creating new ones.
RecreateDeploymentStrategyType DeploymentStrategyType = "Recreate"

- // Replace the old RCs by new one using rolling update i.e gradually scale down the old RCs and scale up the new one.
+ // Replace the old ReplicaSets with a new one using a rolling update, i.e. gradually
+ // scale down the old ReplicaSets and scale up the new one.
RollingUpdateDeploymentStrategyType DeploymentStrategyType = "RollingUpdate"
)

@@ -94,20 +106,20 @@ type RollingUpdateDeploymentStrategy struct {
// new RC can be scaled up further, ensuring that total number of pods running
// at any time during the update is at most 130% of original pods.
MaxSurge IntOrString
-
- // Minimum number of seconds for which a newly created pod should be ready
- // without any of its container crashing, for it to be considered available.
- // Defaults to 0 (pod will be considered available as soon as it is ready)
- MinReadySeconds int
}

type DeploymentStatus struct {
// Total number of ready pods targeted by this deployment (this
// includes both the old and new pods).
- Replicas int
+ Replicas int32

// Total number of new ready pods with the desired template spec.
- UpdatedReplicas int
+ UpdatedReplicas int32
+
+ // Monotonically increasing counter that tracks hash collisions for
+ // the Deployment. Used as a collision avoidance mechanism by the
+ // Deployment controller.
+ Uniquifier *int64
}

```
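
To show how these fields fit together, here is a minimal, self-contained Go sketch that builds a DeploymentSpec for a rolling update. The types are simplified stand-ins for the ones above (IntOrString is reduced to a string, and the pod template is omitted); it is not the real API package.

```go
package main

import "fmt"

// Simplified stand-ins for the proposal's types above. IntOrString is modeled
// here as a plain string (e.g. "30%"); the real type also accepts integers, and
// the PodTemplateSpec and Status fields are omitted for brevity.
type DeploymentStrategyType string

const (
	RecreateDeploymentStrategyType      DeploymentStrategyType = "Recreate"
	RollingUpdateDeploymentStrategyType DeploymentStrategyType = "RollingUpdate"
)

type RollingUpdateDeploymentStrategy struct {
	MaxSurge string // stand-in for IntOrString
}

type DeploymentStrategy struct {
	Type          DeploymentStrategyType
	RollingUpdate *RollingUpdateDeploymentStrategy
}

type DeploymentSpec struct {
	Replicas        *int32
	Selector        map[string]string
	Strategy        DeploymentStrategy
	MinReadySeconds int32
}

func main() {
	replicas := int32(10)
	spec := DeploymentSpec{
		Replicas: &replicas,
		// Hypothetical labels; new ReplicaSets additionally get a
		// pod-template-hash label, as described below.
		Selector: map[string]string{"app": "nginx"},
		Strategy: DeploymentStrategy{
			Type:          RollingUpdateDeploymentStrategyType,
			RollingUpdate: &RollingUpdateDeploymentStrategy{MaxSurge: "30%"},
		},
		// Pods must be Ready for 30s before they count as available.
		MinReadySeconds: 30,
	}
	fmt.Printf("%+v\n", spec)
}
```
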
@@ -116,38 +128,42 @@ type DeploymentStatus struct {

#### Deployment Controller

- The DeploymentController will make Deployments happen.
- It will watch Deployment objects in etcd.
- For each pending deployment, it will:
+ The DeploymentController will process Deployments and create, update, and
+ delete ReplicaSets as needed. For each creation or update of a Deployment,
+ it will (a sketch of this loop follows the list):

- 1. Find all RCs whose label selector is a superset of DeploymentSpec.Selector.
- - For now, we will do this in the client - list all RCs and then filter the
+ 1. Find all RSs (ReplicaSets) whose label selector is a superset of DeploymentSpec.Selector.
+ - For now, we will do this in the client - list all RSs and then filter the
ones we want. Eventually, we want to expose this in the API.
- 2. The new RC can have the same selector as the old RC and hence we add a unique
- selector to all these RCs (and the corresponding label to their pods) to ensure
- that they do not select the newly created pods (or old pods get selected by
- new RC).
+ 2. The new RS can have the same selector as the old RS and hence we add a unique
+ selector to all these RSs (and the corresponding label to their pods) to ensure
+ that they do not select the newly created pods (or old pods get selected by the
+ new RS).
- The label key will be "pod-template-hash".
- - The label value will be hash of the podTemplateSpec for that RC without
- this label. This value will be unique for all RCs, since PodTemplateSpec should be unique.
- - If the RCs and pods dont already have this label and selector:
- - We will first add this to RC.PodTemplateSpec.Metadata.Labels for all RCs to
+ - The label value will be the hash of {podTemplateSpec+uniquifier} where podTemplateSpec
+ is the one that the new RS uses and uniquifier is a counter in the DeploymentStatus
+ that increments every time a [hash collision](#hashing-collisions) happens (hash
+ collisions should be rare with fnv).
+ - If the RSs and pods don't already have this label and selector:
+ - We will first add this to RS.PodTemplateSpec.Metadata.Labels for all RSs to
ensure that all new pods that they create will have this label.
- - Then we will add this label to their existing pods and then add this as a selector
- to that RC.
- 3. Find if there exists an RC for which value of "pod-template-hash" label
+ - Then we will add this label to their existing pods.
+ - Eventually we flip the RS selector to use the new label.
+ This process can potentially be abstracted into a new endpoint for controllers [1].
+ 3. Find if there exists an RS for which the value of the "pod-template-hash" label
is the same as the hash of DeploymentSpec.PodTemplateSpec. If it exists already, then
- this is the RC that will be ramped up. If there is no such RC, then we create
+ this is the RS that will be ramped up. If there is no such RS, then we create
a new one using DeploymentSpec and then add a "pod-template-hash" label
- to it. RCSpec.replicas = 0 for a newly created RC.
- 4. Scale up the new RC and scale down the old ones as per the DeploymentStrategy.
- - Raise an event if we detect an error, like new pods failing to come up.
- 5. Go back to step 1 unless the new RC has been ramped up to desired replicas
- and the old RCs have been ramped down to 0.
- 6. Cleanup.
+ to it. The size of the new RS depends on the DeploymentStrategyType used.
+ 4. Scale up the new RS and scale down the old ones as per the DeploymentStrategy.
+ Raise events appropriately (in case of both failure and success).
+ 5. Go back to step 1 unless the new RS has been ramped up to desired replicas
+ and the old RSs have been ramped down to 0.
+ 6. Clean up old RSs as per revisionHistoryLimit.

DeploymentController is stateless so that it can recover in case it crashes during a deployment.

+ [1] See https://github.com/kubernetes/kubernetes/issues/36897
+
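To tie the numbered steps together, here is a rough, hypothetical sketch of the reconciliation loop. The type and function names (syncDeployment and so on) are illustrative only and are not taken from the actual controller.

```go
package main

import "fmt"

// Illustrative stand-ins only; the real controller works with API objects and a
// client, and none of these names are taken from the actual implementation.
type ReplicaSet struct {
	Name     string
	Hash     string // value of the pod-template-hash label
	Replicas int32
}

type Deployment struct {
	Replicas     int32
	TemplateHash string // hash of {podTemplateSpec+uniquifier}
}

// syncDeployment sketches steps 1-6 above for a single Deployment. The real
// controller scales gradually according to the DeploymentStrategy and records
// events; here we jump straight to the desired end state for brevity.
func syncDeployment(d Deployment, owned []ReplicaSet) []ReplicaSet {
	// Steps 1-2: the caller is assumed to have listed the ReplicaSets matching
	// the Deployment's selector and ensured they carry the pod-template-hash label.

	// Step 3: find the ReplicaSet whose hash matches the desired template, or create it.
	newIdx := -1
	for i := range owned {
		if owned[i].Hash == d.TemplateHash {
			newIdx = i
		}
	}
	if newIdx == -1 {
		owned = append(owned, ReplicaSet{Name: "rs-" + d.TemplateHash, Hash: d.TemplateHash})
		newIdx = len(owned) - 1
	}

	// Steps 4-5: ramp the new ReplicaSet up and the old ones down.
	owned[newIdx].Replicas = d.Replicas
	for i := range owned {
		if i != newIdx {
			owned[i].Replicas = 0
		}
	}

	// Step 6: old, fully scaled-down ReplicaSets beyond revisionHistoryLimit
	// would be deleted here.
	return owned
}

func main() {
	rss := syncDeployment(
		Deployment{Replicas: 10, TemplateHash: "7d9f"},
		[]ReplicaSet{{Name: "rs-1a2b", Hash: "1a2b", Replicas: 10}},
	)
	fmt.Printf("%+v\n", rss)
}
```
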
### MinReadySeconds

We will implement MinReadySeconds using the Ready condition in Pod. We will add
@@ -163,52 +179,71 @@ LastTransitionTime to PodCondition.
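
As a rough sketch of the availability rule this section describes (assumed semantics, not the controller's actual code): a pod counts as available only once its Ready condition has been true for at least MinReadySeconds.

```go
package main

import (
	"fmt"
	"time"
)

// podAvailable is a minimal sketch of the MinReadySeconds check: the pod must be
// Ready, and must have been Ready (per the condition's LastTransitionTime) for at
// least minReadySeconds.
func podAvailable(ready bool, lastTransitionTime time.Time, minReadySeconds int32, now time.Time) bool {
	if !ready {
		return false
	}
	return !lastTransitionTime.Add(time.Duration(minReadySeconds) * time.Second).After(now)
}

func main() {
	becameReady := time.Now().Add(-10 * time.Second)
	fmt.Println(podAvailable(true, becameReady, 30, time.Now())) // false: Ready for only 10s
	fmt.Println(podAvailable(true, becameReady, 5, time.Now()))  // true: Ready for more than 5s
}
```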

### Updating

- Users can update an ongoing deployment before it is completed.
- In this case, the existing deployment will be stalled and the new one will
+ Users can update an ongoing Deployment before it is completed.
+ In this case, the existing rollout will be stalled and the new one will
begin.
- For ex: consider the following case:
- - User creates a deployment to rolling-update 10 pods with image:v1 to
+ For example, consider the following case:
+ - User updates a Deployment to rolling-update 10 pods with image:v1 to
pods with image:v2.
- - User then updates this deployment to create pods with image:v3,
- when the image:v2 RC had been ramped up to 5 pods and the image:v1 RC
+ - User then updates this Deployment to create pods with image:v3,
+ when the image:v2 RS had been ramped up to 5 pods and the image:v1 RS
had been ramped down to 5 pods.
- - When Deployment Controller observes the new deployment, it will create
- a new RC for creating pods with image:v3. It will then start ramping up this
- new RC to 10 pods and will ramp down both the existing RCs to 0.
+ - When the Deployment Controller observes the new update, it will create
+ a new RS for creating pods with image:v3. It will then start ramping up this
+ new RS to 10 pods and will ramp down both the existing RSs to 0.

### Deleting

- Users can pause/cancel a deployment by deleting it before it is completed.
- Recreating the same deployment will resume it.
- For ex: consider the following case:
- - User creates a deployment to rolling-update 10 pods with image:v1 to
- pods with image:v2.
- - User then deletes this deployment while the old and new RCs are at 5 replicas each.
- User will end up with 2 RCs with 5 replicas each.
- User can then create the same deployment again in which case, DeploymentController will
- notice that the second RC exists already which it can ramp up while ramping down
+ Users can pause/cancel a rollout by doing a non-cascading deletion of the Deployment
+ before it is complete. Recreating the same Deployment will resume it.
+ For example, consider the following case:
+ - User creates a Deployment to perform a rolling-update for 10 pods from image:v1 to
+ image:v2.
+ - User then deletes the Deployment while the old and new RSs are at 5 replicas each.
+ User will end up with 2 RSs with 5 replicas each.
+ User can then re-create the same Deployment, in which case DeploymentController will
+ notice that the second RS already exists, which it can ramp up while ramping down
the first one.

### Rollback

- We want to allow the user to rollback a deployment. To rollback a
- completed (or ongoing) deployment, user can create (or update) a deployment with
- DeploymentSpec.PodTemplateSpec = oldRC.PodTemplateSpec.
+ We want to allow the user to roll back a Deployment. To roll back a completed (or
+ ongoing) Deployment, users can simply use `kubectl rollout undo` or update the
+ Deployment directly by using its spec.rollbackTo.revision field, specifying either the
+ revision they want to roll back to or no revision, which means that the Deployment
+ will be rolled back to its previous revision.
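
For illustration only, the following sketch shows what setting the rollback field could look like, assuming a RollbackConfig type with a Revision field as implied by spec.rollbackTo.revision; the real field and type names may differ.

```go
package main

import "fmt"

// Hypothetical shape of the rollback field mentioned above; the real field and
// type names in the final API may differ.
type RollbackConfig struct {
	// Revision to roll back to; 0 means the previous revision.
	Revision int64
}

type DeploymentSpec struct {
	RollbackTo *RollbackConfig
}

func main() {
	// Ask for a rollback to a specific revision...
	toRevision := DeploymentSpec{RollbackTo: &RollbackConfig{Revision: 3}}
	// ...or leave Revision at 0 to go back to the previous revision, which is
	// what `kubectl rollout undo` requests by default.
	toPrevious := DeploymentSpec{RollbackTo: &RollbackConfig{}}

	fmt.Println(toRevision.RollbackTo.Revision, toPrevious.RollbackTo.Revision)
}
```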

## Deployment Strategies

- DeploymentStrategy specifies how the new RC should replace existing RCs.
- To begin with, we will support 2 types of deployment:
- * Recreate: We kill all existing RCs and then bring up the new one. This results
- in quick deployment but there is a downtime when old pods are down but
+ DeploymentStrategy specifies how the new RS should replace existing RSs.
+ To begin with, we will support 2 types of Deployment:
+ * Recreate: We kill all existing RSs and then bring up the new one. This results
+ in a quick Deployment, but there is downtime when old pods are down but
the new ones have not come up yet.
- * Rolling update: We gradually scale down old RCs while scaling up the new one.
- This results in a slower deployment, but there is no downtime. At all times
- during the deployment, there are a few pods available (old or new). The number
- of available pods and when is a pod considered "available" can be configured
- using RollingUpdateDeploymentStrategy.
-
- In future, we want to support more deployment types.
+ * Rolling update: We gradually scale down old RSs while scaling up the new one.
+ This results in a slower Deployment, but downtime can be avoided. Depending on
+ the strategy parameters, it is possible to have available pods (old or new) at
+ all times during the rollout. The number of available pods and when a pod is
+ considered "available" can be configured using RollingUpdateDeploymentStrategy
+ (a sketch of how MaxSurge bounds the rollout follows this list).
+
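To make the MaxSurge bound concrete (the 130% example in the RollingUpdateDeploymentStrategy comment above), here is a small sketch, not the controller's actual arithmetic, that resolves a percentage MaxSurge against the desired replica count:

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// resolveMaxSurge resolves a MaxSurge value ("30%" or "3") against the desired
// replica count, rounding percentages up. This is only a sketch; the real
// IntOrString handling lives in the Kubernetes utility libraries.
func resolveMaxSurge(maxSurge string, replicas int32) (int32, error) {
	if strings.HasSuffix(maxSurge, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(maxSurge, "%"))
		if err != nil {
			return 0, err
		}
		return int32(math.Ceil(float64(replicas) * float64(pct) / 100.0)), nil
	}
	n, err := strconv.Atoi(maxSurge)
	return int32(n), err
}

func main() {
	replicas := int32(10)
	surge, err := resolveMaxSurge("30%", replicas)
	if err != nil {
		panic(err)
	}
	// With 10 desired pods and MaxSurge of 30%, at most 13 pods (130% of the
	// original count) may run at any point during the rollout.
	fmt.Printf("surge=%d, max total pods during rollout=%d\n", surge, replicas+surge)
}
```
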
+ ## Hashing collisions
+
+ Hash collisions do happen with the existing hashing algorithm [1]. We
+ need to switch to a more stable algorithm like fnv. Preliminary benchmarks [2]
+ show that while fnv is a bit slower than adler, it is much more stable. Also,
+ hashing an API object is subject to API changes, which means that the name
+ of a ReplicaSet may differ between minor Kubernetes versions.
+
+ For both of the aforementioned cases, we will use a field in the DeploymentStatus,
+ called Uniquifier, to create a unique hash value when a hash collision happens.
+ The Deployment controller will compute the hash value of {template+uniquifier},
+ and will use the resulting hash in the ReplicaSet names and selectors. One side
+ effect of this hash collision avoidance mechanism is that we don't need to
+ migrate ReplicaSets that were created with adler.
+
+ [1] https://github.com/kubernetes/kubernetes/issues/29735
+
+ [2] https://github.com/kubernetes/kubernetes/pull/39527
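
To make the scheme above concrete, here is a minimal sketch of deriving a pod-template-hash value with fnv from the serialized pod template plus the uniquifier. The serialization and the final encoding of the hash are assumptions; the real controller hashes the API object itself.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// podTemplateHash sketches the {template+uniquifier} hashing described above:
// hash the serialized pod template with FNV-1a and mix in the uniquifier, so a
// collision can be resolved by bumping the uniquifier in DeploymentStatus.
func podTemplateHash(serializedTemplate []byte, uniquifier int64) string {
	h := fnv.New32a()
	h.Write(serializedTemplate)
	if uniquifier > 0 {
		fmt.Fprintf(h, "%d", uniquifier)
	}
	return fmt.Sprintf("%x", h.Sum32())
}

func main() {
	tmpl := []byte(`{"containers":[{"name":"nginx","image":"nginx:1.11"}]}`) // assumed serialization
	fmt.Println(podTemplateHash(tmpl, 0)) // first attempt
	fmt.Println(podTemplateHash(tmpl, 1)) // after one recorded collision
}
```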

## Future