Commit 66005e4

Merge pull request kubernetes#477 from kargakis/alternative-deployment-proposal
Refine the Deployment proposal and switch hashing algorithm
2 parents 66ec1a7 + c2dc58d commit 66005e4

File tree

1 file changed (+111 -76 lines)


deployment.md

Lines changed: 111 additions & 76 deletions
@@ -1,16 +1,24 @@
 # Deployment
 
+Authors:
+ - Brian Grant (@bgrant0607)
+ - Clayton Coleman (@smarterclayton)
+ - Dan Mace (@ironcladlou)
+ - David Oppenheimer (@davidopp)
+ - Janet Kuo (@janetkuo)
+ - Michail Kargakis (@kargakis)
+ - Nikhil Jindal (@nikhiljindal)
+
 ## Abstract
 
 A proposal for implementing a new resource - Deployment - which will enable
-declarative config updates for Pods and ReplicationControllers.
-
-Users will be able to create a Deployment, which will spin up
-a ReplicationController to bring up the desired pods.
-Users can also target the Deployment at existing ReplicationControllers, in
-which case the new RC will replace the existing ones. The exact mechanics of
-replacement depends on the DeploymentStrategy chosen by the user.
-DeploymentStrategies are explained in detail in a later section.
+declarative config updates for ReplicaSets. Users will be able to create a
+Deployment, which will spin up a ReplicaSet to bring up the desired Pods.
+Users can also target the Deployment to an existing ReplicaSet either by
+rolling back an existing Deployment or creating a new Deployment that can
+adopt an existing ReplicaSet. The exact mechanics of replacement depends on
+the DeploymentStrategy chosen by the user. DeploymentStrategies are explained
+in detail in a later section.
 
 ## Implementation
 
@@ -33,10 +41,10 @@ type Deployment struct {
 type DeploymentSpec struct {
   // Number of desired pods. This is a pointer to distinguish between explicit
   // zero and not specified. Defaults to 1.
-  Replicas *int
+  Replicas *int32
 
-  // Label selector for pods. Existing ReplicationControllers whose pods are
-  // selected by this will be scaled down. New ReplicationControllers will be
+  // Label selector for pods. Existing ReplicaSets whose pods are
+  // selected by this will be scaled down. New ReplicaSets will be
   // created with this selector, with a unique label `pod-template-hash`.
   // If Selector is empty, it is defaulted to the labels present on the Pod template.
   Selector map[string]string
@@ -46,14 +54,17 @@ type DeploymentSpec struct {
 
   // The deployment strategy to use to replace existing pods with new ones.
   Strategy DeploymentStrategy
+
+  // Minimum number of seconds for which a newly created pod should be ready
+  // without any of its container crashing, for it to be considered available.
+  // Defaults to 0 (pod will be considered available as soon as it is ready)
+  MinReadySeconds int32
 }
 
 type DeploymentStrategy struct {
   // Type of deployment. Can be "Recreate" or "RollingUpdate".
   Type DeploymentStrategyType
 
-  // TODO: Update this to follow our convention for oneOf, whatever we decide it
-  // to be.
   // Rolling update config params. Present only if DeploymentStrategyType =
   // RollingUpdate.
   RollingUpdate *RollingUpdateDeploymentStrategy
@@ -65,7 +76,8 @@ const (
   // Kill all existing pods before creating new ones.
   RecreateDeploymentStrategyType DeploymentStrategyType = "Recreate"
 
-  // Replace the old RCs by new one using rolling update i.e gradually scale down the old RCs and scale up the new one.
+  // Replace the old ReplicaSets by new one using rolling update i.e gradually scale
+  // down the old ReplicaSets and scale up the new one.
   RollingUpdateDeploymentStrategyType DeploymentStrategyType = "RollingUpdate"
 )
 
@@ -94,20 +106,20 @@ type RollingUpdateDeploymentStrategy struct {
   // new RC can be scaled up further, ensuring that total number of pods running
   // at any time during the update is atmost 130% of original pods.
   MaxSurge IntOrString
-
-  // Minimum number of seconds for which a newly created pod should be ready
-  // without any of its container crashing, for it to be considered available.
-  // Defaults to 0 (pod will be considered available as soon as it is ready)
-  MinReadySeconds int
 }
 
 type DeploymentStatus struct {
   // Total number of ready pods targeted by this deployment (this
   // includes both the old and new pods).
-  Replicas int
+  Replicas int32
 
   // Total number of new ready pods with the desired template spec.
-  UpdatedReplicas int
+  UpdatedReplicas int32
+
+  // Monotonically increasing counter that tracks hash collisions for
+  // the Deployment. Used as a collision avoidance mechanism by the
+  // Deployment controller.
+  Uniquifier *int64
 }
 
 ```
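
The switch from `*int` to `*int32` for `Replicas` keeps the pointer semantics described in the field comment: defaulting can distinguish an explicitly requested zero from an unset field. A minimal sketch of that distinction, using a local stand-in for DeploymentSpec rather than the real API types:

```go
package main

import "fmt"

// DeploymentSpec is a local stand-in for the proposal's type; only the fields
// needed to illustrate defaulting are included.
type DeploymentSpec struct {
	// Pointer so that an explicit 0 can be told apart from "not specified".
	Replicas        *int32
	MinReadySeconds int32
}

// defaultedReplicas mirrors the documented behaviour: nil means "not
// specified" and defaults to 1; an explicit value (including 0) is kept.
func defaultedReplicas(spec DeploymentSpec) int32 {
	if spec.Replicas == nil {
		return 1
	}
	return *spec.Replicas
}

func main() {
	zero := int32(0)

	unspecified := DeploymentSpec{MinReadySeconds: 30} // Replicas unset
	scaledToZero := DeploymentSpec{Replicas: &zero}    // explicit zero

	fmt.Println(defaultedReplicas(unspecified))  // 1
	fmt.Println(defaultedReplicas(scaledToZero)) // 0
}
```

A plain `int32` suffices for fields like `MinReadySeconds`, where zero is simply the documented default behaviour.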
@@ -116,38 +128,42 @@ type DeploymentStatus struct {
 
 #### Deployment Controller
 
-The DeploymentController will make Deployments happen.
-It will watch Deployment objects in etcd.
-For each pending deployment, it will:
+The DeploymentController will process Deployments and crud ReplicaSets.
+For each creation or update for a Deployment, it will:
 
-1. Find all RCs whose label selector is a superset of DeploymentSpec.Selector.
-   - For now, we will do this in the client - list all RCs and then filter the
+1. Find all RSs (ReplicaSets) whose label selector is a superset of DeploymentSpec.Selector.
+   - For now, we will do this in the client - list all RSs and then filter the
 ones we want. Eventually, we want to expose this in the API.
-2. The new RC can have the same selector as the old RC and hence we add a unique
-selector to all these RCs (and the corresponding label to their pods) to ensure
-that they do not select the newly created pods (or old pods get selected by
-new RC).
+2. The new RS can have the same selector as the old RS and hence we add a unique
+selector to all these RSs (and the corresponding label to their pods) to ensure
+that they do not select the newly created pods (or old pods get selected by the
+new RS).
    - The label key will be "pod-template-hash".
-   - The label value will be hash of the podTemplateSpec for that RC without
-this label. This value will be unique for all RCs, since PodTemplateSpec should be unique.
-   - If the RCs and pods dont already have this label and selector:
-     - We will first add this to RC.PodTemplateSpec.Metadata.Labels for all RCs to
+   - The label value will be the hash of {podTemplateSpec+uniquifier} where podTemplateSpec
+is the one that the new RS uses and uniquifier is a counter in the DeploymentStatus
+that increments every time a [hash collision](#hashing-collisions) happens (hash
+collisions should be rare with fnv).
+   - If the RSs and pods dont already have this label and selector:
+     - We will first add this to RS.PodTemplateSpec.Metadata.Labels for all RSs to
 ensure that all new pods that they create will have this label.
-     - Then we will add this label to their existing pods and then add this as a selector
-to that RC.
-3. Find if there exists an RC for which value of "pod-template-hash" label
+     - Then we will add this label to their existing pods
+     - Eventually we flip the RS selector to use the new label.
+   This process potentially can be abstracted to a new endpoint for controllers [1].
+3. Find if there exists an RS for which value of "pod-template-hash" label
 is same as hash of DeploymentSpec.PodTemplateSpec. If it exists already, then
-this is the RC that will be ramped up. If there is no such RC, then we create
+this is the RS that will be ramped up. If there is no such RS, then we create
 a new one using DeploymentSpec and then add a "pod-template-hash" label
-to it. RCSpec.replicas = 0 for a newly created RC.
-4. Scale up the new RC and scale down the olds ones as per the DeploymentStrategy.
-   - Raise an event if we detect an error, like new pods failing to come up.
-5. Go back to step 1 unless the new RC has been ramped up to desired replicas
-and the old RCs have been ramped down to 0.
-6. Cleanup.
+to it. The size of the new RS depends on the used DeploymentStrategyType
+4. Scale up the new RS and scale down the olds ones as per the DeploymentStrategy.
+Raise events appropriately (both in case of failure or success).
+5. Go back to step 1 unless the new RS has been ramped up to desired replicas
+and the old RSs have been ramped down to 0.
+6. Cleanup old RSs as per revisionHistoryLimit.
 
 DeploymentController is stateless so that it can recover in case it crashes during a deployment.
 
+[1] See https://github.com/kubernetes/kubernetes/issues/36897
+
 ### MinReadySeconds
 
 We will implement MinReadySeconds using the Ready condition in Pod. We will add
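
Step 2 in the hunk above derives the `pod-template-hash` label value from the new ReplicaSet's pod template plus the uniquifier stored in DeploymentStatus. A hedged sketch of that computation using Go's standard `hash/fnv` package; the byte serialization of the template and the formatting of the final value are simplifying assumptions here, not the controller's actual helpers:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// podTemplateHash sketches how the "pod-template-hash" label value could be
// derived: hash the serialized pod template together with the uniquifier from
// DeploymentStatus, so that bumping the uniquifier yields a different hash.
// The []byte template stands in for a serialized PodTemplateSpec.
func podTemplateHash(template []byte, uniquifier *int64) uint32 {
	h := fnv.New32a() // an fnv hash, as proposed to replace adler
	h.Write(template)
	if uniquifier != nil {
		fmt.Fprintf(h, "%d", *uniquifier)
	}
	return h.Sum32()
}

func main() {
	template := []byte(`containers: [{name: app, image: image:v2}]`)

	fmt.Println(podTemplateHash(template, nil)) // hash used in the RS selector

	one := int64(1)
	fmt.Println(podTemplateHash(template, &one)) // different hash after a collision bump
}
```

Because the uniquifier only changes when a collision is detected, the label value stays stable across syncs for an unchanged pod template.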
@@ -163,52 +179,71 @@ LastTransitionTime to PodCondition.
 
 ### Updating
 
-Users can update an ongoing deployment before it is completed.
-In this case, the existing deployment will be stalled and the new one will
+Users can update an ongoing Deployment before it is completed.
+In this case, the existing rollout will be stalled and the new one will
 begin.
-For ex: consider the following case:
-- User creates a deployment to rolling-update 10 pods with image:v1 to
+For example, consider the following case:
+- User updates a Deployment to rolling-update 10 pods with image:v1 to
 pods with image:v2.
-- User then updates this deployment to create pods with image:v3,
-when the image:v2 RC had been ramped up to 5 pods and the image:v1 RC
+- User then updates this Deployment to create pods with image:v3,
+when the image:v2 RS had been ramped up to 5 pods and the image:v1 RS
 had been ramped down to 5 pods.
-- When Deployment Controller observes the new deployment, it will create
-a new RC for creating pods with image:v3. It will then start ramping up this
-new RC to 10 pods and will ramp down both the existing RCs to 0.
+- When Deployment Controller observes the new update, it will create
+a new RS for creating pods with image:v3. It will then start ramping up this
+new RS to 10 pods and will ramp down both the existing RSs to 0.
 
 ### Deleting
 
-Users can pause/cancel a deployment by deleting it before it is completed.
-Recreating the same deployment will resume it.
-For ex: consider the following case:
-- User creates a deployment to rolling-update 10 pods with image:v1 to
-pods with image:v2.
-- User then deletes this deployment while the old and new RCs are at 5 replicas each.
-User will end up with 2 RCs with 5 replicas each.
-User can then create the same deployment again in which case, DeploymentController will
-notice that the second RC exists already which it can ramp up while ramping down
+Users can pause/cancel a rollout by doing a non-cascading deletion of the Deployment
+before it is complete. Recreating the same Deployment will resume it.
+For example, consider the following case:
+- User creats a Deployment to perform a rolling-update for 10 pods from image:v1 to
+image:v2.
+- User then deletes the Deployment while the old and new RSs are at 5 replicas each.
+User will end up with 2 RSs with 5 replicas each.
+User can then re-create the same Deployment again in which case, DeploymentController will
+notice that the second RS exists already which it can ramp up while ramping down
 the first one.
 
 ### Rollback
 
-We want to allow the user to rollback a deployment. To rollback a
-completed (or ongoing) deployment, user can create (or update) a deployment with
-DeploymentSpec.PodTemplateSpec = oldRC.PodTemplateSpec.
+We want to allow the user to rollback a Deployment. To rollback a completed (or
+ongoing) Deployment, users can simply use `kubectl rollout undo` or update the
+Deployment directly by using its spec.rollbackTo.revision field and specify the
+revision they want to rollback to or no revision which means that the Deployment
+will be rolled back to its previous revision.
 
 ## Deployment Strategies
 
-DeploymentStrategy specifies how the new RC should replace existing RCs.
-To begin with, we will support 2 types of deployment:
-* Recreate: We kill all existing RCs and then bring up the new one. This results
-in quick deployment but there is a downtime when old pods are down but
+DeploymentStrategy specifies how the new RS should replace existing RSs.
+To begin with, we will support 2 types of Deployment:
+* Recreate: We kill all existing RSs and then bring up the new one. This results
+in quick Deployment but there is a downtime when old pods are down but
 the new ones have not come up yet.
-* Rolling update: We gradually scale down old RCs while scaling up the new one.
-This results in a slower deployment, but there is no downtime. At all times
-during the deployment, there are a few pods available (old or new). The number
-of available pods and when is a pod considered "available" can be configured
-using RollingUpdateDeploymentStrategy.
-
-In future, we want to support more deployment types.
+* Rolling update: We gradually scale down old RSs while scaling up the new one.
+This results in a slower Deployment, but there can be no downtime. Depending on
+the strategy parameters, it is possible to have at all times during the rollout
+available pods (old or new). The number of available pods and when is a pod
+considered "available" can be configured using RollingUpdateDeploymentStrategy.
+
+## Hashing collisions
+
+Hashing collisions are a real thing with the existing hashing algorithm[1]. We
+need to switch to a more stable algorithm like fnv. Preliminary benchmarks[2]
+show that while fnv is a bit slower than adler, it is much more stable. Also,
+hashing an API object is subject to API changes which means that the name
+for a ReplicaSet may differ between minor Kubernetes versions.
+
+For both of the aforementioned cases, we will use a field in the DeploymentStatus,
+called Uniquifier, to create a unique hash value when a hash collision happens.
+The Deployment controller will compute the hash value of {template+uniquifier},
+and will use the resulting hash in the ReplicaSet names and selectors. One side
+effect of this hash collision avoidance mechanism is that we don't need to
+migrate ReplicaSets that were created with adler.
+
+[1] https://github.com/kubernetes/kubernetes/issues/29735
+
+[2] https://github.com/kubernetes/kubernetes/pull/39527
 
 ## Future
 
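The collision-avoidance mechanism described in the Hashing collisions hunk can be pictured as a small retry loop: compute the fnv hash of {template+uniquifier}, and if an existing ReplicaSet already uses that hash for a different pod template, bump the uniquifier and re-hash. A sketch under those assumptions, with a plain map standing in for the lookup against existing ReplicaSets:

```go
package main

import (
	"bytes"
	"fmt"
	"hash/fnv"
)

// hashWith combines the serialized pod template with the uniquifier, as in the
// earlier sketch; a zero uniquifier contributes nothing, matching the initial state.
func hashWith(template []byte, uniquifier int64) uint32 {
	h := fnv.New32a()
	h.Write(template)
	if uniquifier > 0 {
		fmt.Fprintf(h, "%d", uniquifier)
	}
	return h.Sum32()
}

// resolveHash sketches the collision-avoidance loop: if the candidate hash is
// already taken by a ReplicaSet with a different template, bump the uniquifier
// and re-hash. The existing map (hash -> template already using that hash)
// stands in for a lookup against live ReplicaSets.
func resolveHash(template []byte, uniquifier int64, existing map[uint32][]byte) (uint32, int64) {
	for {
		h := hashWith(template, uniquifier)
		prev, taken := existing[h]
		if !taken || bytes.Equal(prev, template) {
			return h, uniquifier
		}
		uniquifier++ // hash collision: increment the counter and try again
	}
}

func main() {
	template := []byte(`containers: [{name: app, image: image:v3}]`)

	// Simulate a collision: a different template already occupies our candidate hash.
	existing := map[uint32][]byte{
		hashWith(template, 0): []byte("a different pod template"),
	}

	h, u := resolveHash(template, 0, existing)
	fmt.Printf("hash=%d uniquifier=%d\n", h, u)
}
```

In the real controller the bumped counter would be persisted in DeploymentStatus.Uniquifier so that later syncs reuse it rather than recomputing from zero.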