Commit d855b1f

Notes on DNS programming
Signed-off-by: Aldo Culquicondor <[email protected]>
1 parent e15a4bc commit d855b1f

1 file changed

+23
-15
lines changed

keps/sig-apps/2214-indexed-job/README.md

Lines changed: 23 additions & 15 deletions
@@ -63,7 +63,8 @@ This KEP extends kubernetes with user-friendly support for running parallel jobs

 Here, parallel means multiple pods per Job. Jobs can be:
 - Embarrassingly parallel, where the pods have no dependencies between each other.
-- Tightly coupled, where the Pods communicate among themselves to make progress.
+- Tightly coupled, where the Pods communicate among themselves to make progress
+  [kubernetes/kubernetes#99497](https://github.com/kubernetes/kubernetes/issues/99497)

 We propose the addition of completion indexes into the Pods of a *Job
 [with fixed completion count]* to support running embarrassingly parallel
@@ -223,7 +224,7 @@ However, the APIs have major differences:

 - More than one pod created per index.

-  Jobs have a known issue in which more than one Pod can be started even if
+  Jobs have a known rare issue in which more than one Pod can be started even if
   parallelism and completion are set to 1 ([reference]). In the case of indexed
   Jobs, this translates to more than one Pod having the same index.

@@ -232,10 +233,21 @@ However, the APIs have major differences:

 - Scalability and latency of DNS programming.

-  DNS programming requires the update of EndpointSlices and writing DNS records.
+  DNS programming requires the update of EndpointSlices by the endpoint
+  controller and updating DNS records by the DNS provider.
   This might not scale well for short-lived Jobs with high number of parallelism.
-  Moreoever, Pods need to be prepared to retry lookups in the case were the
-  records didn't have time to update.
+  Moreover, Pods need to be prepared to:
+  - Retry lookups in the case where the records didn't have time to update.
+  - Handle more than one IP for the CNAME. This might happen temporarily when:
+    - the job controller creates more than one pod per index or
+    - the job controller creates a replacement of a failed Pod before the DNS
+      provider clears the record for the failed pod. This will be uncommon
+      as the endpoint controller should see the failed Pod before it sees the
+      replacement Pod.
+  <UNRESOLVED>
+  The recommendation for applications is to request a new DNS resolution until
+  the DNS server returns one IP.
+  </UNRESOLVED>

 However, network programming is opt-in (users need to create a matching
 headless Service). Moreover, workloads have other means of obtaining IPs,
@@ -682,11 +694,9 @@ _This section must be completed when targeting beta graduation to a release._

 Completion indexes could also be part of the Pod name, leading to stable Pod
 names. This allows 2 things:
-- Uniqueness for each completion index, freeing applications from having to
-  handle duplicated indexes.
-- Predictable hostnames, which benefits applications that need to communicate
-  to Pods of a Job (or among Pods of the same Job) without having to do
-  discovery.
+- Uniqueness for each completion index. This frees applications from having to
+  handle duplicated indexes. When used along with a headless Service, there
+  are fewer chances for a DNS record to refer to more than one Pod.

 Stable pod names require the Job controller to remove failed Pods before
 creating a new one with the same index. This has some downsides:
@@ -696,11 +706,9 @@ _This section must be completed when targeting beta graduation to a release._
   the status of the Job, affecting retry backoffs and backoff limit. This
   needs to change before stable Pod names can be implemented
   [#28486](https://github.com/kubernetes/kubernetes/issues/28486).
-- Reduced availability of Job Pods per completion index. This happens when
-  a Node becomes unavailable. The Job controller cannot remove such Pods.
-  Either the kubelet in the Node recovers and marks the Pod as failed; or the
-  kube-apiserver removes the Node and the garbage collector removes the orphan
-  Pods.
+- Reduced availability of Job Pods per completion index as, in addition to
+  the time necessary to create a new Pod, we need to account for the time of
+  deleting the failed Pod.

 However, stable Pod names can be offered later as a new value for
 `.spec.completionMode` for Jobs.
