Commit b519bab

Merge pull request #2586 from wojtek-t/network_programmin_latency
Introduce definition of network programming latency SLI
2 parents 7b9c690 + c532c89 commit b519bab

2 files changed: 94 additions & 0 deletions

sig-scalability/slos/networking_programming_latency.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
## Network programming latency SLIs/SLOs details

### Definition

| Status | SLI | SLO |
| --- | --- | --- |
| __WIP__ | Latency of programming a single (e.g. iptables on a given node) in-cluster load balancing mechanism, measured from when service spec or list of its `Ready` pods changes to when it is reflected in load balancing mechanism, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile of (99th percentiles across all programmers (e.g. iptables)) per cluster-day <= X |

### User stories
- As a user of vanilla Kubernetes, I want some guarantee of how quickly new
backends of my service will become targets of in-cluster load-balancing.
- As a user of vanilla Kubernetes, I want some guarantee of how quickly deleted
(or unhealthy) backends of my service will be removed from in-cluster
load-balancing.
- As a user of vanilla Kubernetes, I want some guarantee of how quickly changes
to the service specification (including creation) will be reflected in in-cluster
load-balancing.

### Other notes
- We are consciously focusing on in-cluster load-balancing for the purpose of
this SLI, as external load-balancing is clearly provider specific (which makes
it hard to set the SLO for it).
- However, in the future it should be possible to formulate the SLI for external
load-balancing in pretty much the same way for consistency.
- The SLI measuring end-to-end time from pod creation was also considered,
but rejected due to being application specific, which would make introducing
an SLO impossible.

### Caveats
- The SLI is formulated for a single "programmer" (e.g. iptables on a single
node), even though that value itself is not very interesting for the user.
In case there are multiple programmers in the cluster, the aggregation across
them is done only at the SLO level (and only that gives a value that is somehow
interesting for the user). The reason for doing it this way is the feasibility
of computing it efficiently (see also the sketch after this list):
  - if we did the aggregation at the SLI level (i.e. the SLI would be
  formulated like "... reflected in in-cluster load-balancing mechanism and
  visible from 99% of programmers"), computing that SLI would be extremely
  difficult. In order to decide e.g. whether a pod transition to the `Ready`
  state is reflected, we would have to know when exactly it was reflected
  in 99% of programmers (e.g. iptables). That requires tracking metrics on a
  per-change basis (which we can't do efficiently).
  - we admit that the SLO is a bit weaker in that form (i.e. it doesn't necessarily
  guarantee that a given change is reflected in 99% of programmers with a given
  99th percentile latency), but it's a close enough approximation.
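
For illustration, here is a minimal sketch (in Go, with hypothetical helper names, and assuming per-programmer latency samples have somehow already been collected) of the two-level aggregation the SLO describes: a 99th percentile per programmer first, then a 99th percentile across those per-programmer values:

```go
package slo

import "sort"

// percentile returns the p-th percentile (0 < p <= 100) of the given samples
// using the nearest-rank method; it returns 0 for an empty slice.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(float64(len(sorted))*p/100.0+0.5) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(sorted) {
		rank = len(sorted) - 1
	}
	return sorted[rank]
}

// sloValue aggregates per-programmer latencies (in seconds) the way the SLO
// does: a 99th percentile per programmer (the per-programmer SLI), then a
// 99th percentile across the resulting per-programmer values.
func sloValue(perProgrammer map[string][]float64) float64 {
	p99s := make([]float64, 0, len(perProgrammer))
	for _, samples := range perProgrammer {
		p99s = append(p99s, percentile(samples, 99))
	}
	return percentile(p99s, 99)
}
```

In practice this aggregation would be done on top of the exported per-programmer metrics rather than on raw samples; the sketch only illustrates the shape of the computation.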

### How to measure the SLI
The method of measuring this SLI is not obvious, so for completeness we describe
here how it will be implemented, with all its caveats.
1. We assume that for the in-cluster load-balancing programming we are using
Kubernetes `Endpoints` objects.
1. We will introduce a dedicated annotation for the `Endpoints` object (name TBD).
1. The Endpoints controller (while updating a given `Endpoints` object) will set
the value of that annotation to the timestamp of the change that triggered
this update:
   - for a pod transition between `Ready` and `NotReady` states, the timestamp is
   simply part of the pod condition;
   - TBD for service updates (ideally we will add a `LastUpdateTimestamp` field in
   object metadata next to the already existing `CreationTimestamp`; the data is
   already present at the storage layer, so it won't be hard to propagate it).
1. The in-cluster load-balancing programmer will export a Prometheus metric
once done with programming. The latency of the operation is defined as the
difference between the timestamp of when the operation is done and the timestamp
recorded in the newly introduced annotation.
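
As an illustration of the last step, below is a minimal sketch (in Go, using the Prometheus client library) of what the programmer side, e.g. kube-proxy, could do once programming is finished. The annotation key and metric name are placeholders, since the actual annotation name is still TBD:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	v1 "k8s.io/api/core/v1"
)

// Placeholder name - the real annotation key is still TBD in this proposal.
const triggerTimeAnnotation = "endpoints.kubernetes.io/trigger-time"

// networkProgrammingLatency is exported by the in-cluster load-balancing
// programmer (e.g. kube-proxy) once it is done with programming.
var networkProgrammingLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "network_programming_duration_seconds",
	Help:    "Time from the change that triggered an Endpoints update to when it was programmed.",
	Buckets: prometheus.ExponentialBuckets(0.001, 2, 20),
})

func init() {
	prometheus.MustRegister(networkProgrammingLatency)
}

// recordProgrammingLatency should be called right after the programmer has
// finished applying the given Endpoints object (e.g. after an iptables sync).
func recordProgrammingLatency(ep *v1.Endpoints, programmedAt time.Time) {
	value, ok := ep.Annotations[triggerTimeAnnotation]
	if !ok {
		return // No trigger time recorded - nothing to measure.
	}
	triggerTime, err := time.Parse(time.RFC3339Nano, value)
	if err != nil {
		return // Malformed annotation - skip rather than pollute the metric.
	}
	networkProgrammingLatency.Observe(programmedAt.Sub(triggerTime).Seconds())
}
```

The exact metric type and buckets are implementation details to be decided when the SLI graduates from WIP.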

#### Caveats
There are a couple of caveats to that measurement method:
1. A single `Endpoints` object may batch multiple pod state transitions. <br/>
In that case, we simply choose the oldest one (and do not expose all timestamps,
to avoid theoretically unbounded growth of the object). That makes the metric
imprecise, but the batching period should be relatively small compared
to the whole end-to-end flow.
1. A single pod may transition its state multiple times within the batching
period. <br/>
For that case, we will add an additional cache in the Endpoints controller, storing
the first observed transition timestamp for each pod. The cache entry will be
cleared when the controller picks up a pod into an Endpoints object update. This is
consistent with choosing the oldest update in the above point (see the sketch
after this list). <br/>
Initially, we may consider simply ignoring this fact.
1. Components may fall out of the watch history window and thus miss some watch
events. <br/>
This may be the case for both the Endpoints controller and kube-proxy (or other
network programmers if used instead). That becomes a problem when a single
object changed multiple times in the meantime (otherwise informers will
trigger handlers on relisting). Additionally, this can happen only when
components are too slow in processing events (which would already be reflected
in metrics) or (sometimes) after a kube-apiserver restart. Given that, we are
going to neglect this problem to avoid unnecessary complications for little
or no gain.
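
To make the first two caveats more concrete, here is a minimal sketch (in Go, with hypothetical names; the real Endpoints controller change may look different) of a per-pod cache of first observed transition timestamps, where the oldest timestamp across a batched update is selected and the corresponding entries are cleared:

```go
package endpointscontroller

import (
	"sync"
	"time"
)

// triggerTimeTracker is a hypothetical helper for the Endpoints controller.
// It remembers, per pod, the first state-transition timestamp observed since
// that pod was last included in an Endpoints update (caveat 2).
type triggerTimeTracker struct {
	mu        sync.Mutex
	firstSeen map[string]time.Time // pod key -> first observed transition time
}

func newTriggerTimeTracker() *triggerTimeTracker {
	return &triggerTimeTracker{firstSeen: map[string]time.Time{}}
}

// observe records a pod state transition; only the first transition observed
// since the pod was last picked up is kept.
func (t *triggerTimeTracker) observe(podKey string, transitionTime time.Time) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.firstSeen[podKey]; !ok {
		t.firstSeen[podKey] = transitionTime
	}
}

// pickUp is called when the controller batches the given pods into a single
// Endpoints object update. It returns the oldest recorded timestamp (caveat 1:
// we choose the oldest one) and clears the cache entries for those pods.
func (t *triggerTimeTracker) pickUp(podKeys []string) (time.Time, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	var oldest time.Time
	found := false
	for _, key := range podKeys {
		ts, ok := t.firstSeen[key]
		if !ok {
			continue
		}
		if !found || ts.Before(oldest) {
			oldest = ts
			found = true
		}
		delete(t.firstSeen, key)
	}
	return oldest, found
}
```

The timestamp returned by `pickUp` would then be written into the new annotation on the updated `Endpoints` object.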

### Test scenario

__TODO: Describe test scenario.__

sig-scalability/slos/slos.md

Lines changed: 1 addition & 0 deletions
@@ -106,6 +106,7 @@ Prerequisite: Kubernetes cluster is available and serving.
| __Official__ | Latency of mutating API calls for single objects for every (resource, verb) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> <= 1s | [Details](./api_call_latency.md) |
| __Official__ | Latency of non-streaming read-only API calls for every (resource, scope) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> (a) <= 1s if `scope=resource` (b) <= 5s if `scope=namespace` (c) <= 30s if `scope=cluster` | [Details](./api_call_latency.md) |
| __Official__ | Startup latency of stateless and schedulable pods, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> <= 5s | [Details](./pod_startup_latency.md) |
| __WIP__ | Latency of programming a single (e.g. iptables on a given node) in-cluster load balancing mechanism, measured from when service spec or list of its `Ready` pods changes to when it is reflected in load balancing mechanism, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile of (99th percentiles across all programmers (e.g. iptables)) per cluster-day<sup>[1](#footnote1)</sup> <= X | [Details](./networking_programming_latency.md) |

<a name="footnote1">\[1\]</a> For the purpose of visualization it will be a
sliding window. However, for the purpose of reporting the SLO, it means one
