## Network programming latency SLIs/SLOs details

### Definition

| Status | SLI | SLO |
| --- | --- | --- |
| __WIP__ | Latency of programming a single instance of the in-cluster load balancing mechanism (e.g. iptables on a given node), measured from when the service spec or the list of its `Ready` pods changes to when it is reflected in that load balancing mechanism, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile of (99th percentiles across all programmers (e.g. iptables)) per cluster-day <= X |

### User stories
- As a user of vanilla Kubernetes, I want some guarantee of how quickly new backends
of my service will become targets of in-cluster load-balancing
- As a user of vanilla Kubernetes, I want some guarantee of how quickly deleted
(or unhealthy) backends of my service will be removed from in-cluster
load-balancing
- As a user of vanilla Kubernetes, I want some guarantee of how quickly changes
to the service specification (including creation) will be reflected in in-cluster
load-balancing

### Other notes
- We are consciously focusing on in-cluster load-balancing for the purpose of
this SLI, as external load-balancing is clearly provider specific (which makes
it hard to set an SLO for it).
- However, in the future it should be possible to formulate the SLI for external
load-balancing in pretty much the same way for consistency.
- An SLI measuring end-to-end time from pod creation was also considered,
but rejected because it is application specific, which would make introducing
an SLO for it impossible.

### Caveats
- The SLI is formulated for a single "programmer" (e.g. iptables on a single
node), even though that value itself is not very interesting for the user.
In case there are multiple programmers in the cluster, the aggregation across
them is done only at the SLO level (and only that gives a value that is somehow
interesting for the user). The reason for doing it this way is the feasibility
of computing it efficiently (a sketch of the aggregation follows this list):
  - if we were doing the aggregation at the SLI level (i.e. the SLI would be
  formulated like "... reflected in in-cluster load-balancing mechanism and
  visible from 99% of programmers"), computing that SLI would be extremely
  difficult. In order to decide e.g. whether a pod transition to the
  Ready state is reflected, we would have to know when exactly it was reflected
  in 99% of programmers (e.g. iptables). That requires tracking metrics on
  a per-change basis (which we can't do efficiently).
  - we admit that the SLO is a bit weaker in that form (i.e. it doesn't necessarily
  force that a given change is reflected in 99% of programmers with a given
  99th percentile latency), but it's a close enough approximation.
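
For illustration, here is a minimal sketch of the "99th percentile of per-programmer
99th percentiles" aggregation from the SLO above, assuming latency samples per
programmer are already available (e.g. scraped from the Prometheus metric described
in the next section). The function names and the nearest-rank percentile definition
are illustrative assumptions, not part of this proposal.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of the samples using a
// simple nearest-rank definition; Prometheus histograms would interpolate instead.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}

// sloValue aggregates per-programmer latency samples (in seconds) collected over
// a cluster-day: first a 99th percentile per programmer (the SLI), then a 99th
// percentile across programmers (the SLO-level aggregation).
func sloValue(perProgrammerLatencies map[string][]float64) float64 {
	perProgrammerP99 := make([]float64, 0, len(perProgrammerLatencies))
	for _, latencies := range perProgrammerLatencies {
		perProgrammerP99 = append(perProgrammerP99, percentile(latencies, 99))
	}
	return percentile(perProgrammerP99, 99)
}

func main() {
	// Hypothetical per-node samples of network programming latency, in seconds.
	samples := map[string][]float64{
		"node-1": {0.8, 1.2, 0.9, 30.0},
		"node-2": {1.1, 1.3, 0.7, 2.0},
	}
	fmt.Printf("SLO value for this cluster-day: %.1fs\n", sloValue(samples))
}
```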

### How to measure the SLI
The method of measuring this SLI is not obvious, so for completeness we describe
here how it will be implemented, together with all its caveats (a rough code sketch
follows the list):
1. We assume that for the in-cluster load-balancing programming we are using
Kubernetes `Endpoints` objects.
1. We will introduce a dedicated annotation for the `Endpoints` object (name TBD).
1. The Endpoints controller (while updating a given `Endpoints` object) will set
the value of that annotation to the timestamp of the change that triggered
this update:
- for a pod transition between the `Ready` and `NotReady` states, the timestamp is
  simply part of the pod condition
- TBD for service updates (ideally we will add a `LastUpdateTimestamp` field in
  object metadata next to the already existing `CreationTimestamp`; the data is
  already present at the storage layer, so it won't be hard to propagate it)
1. The in-cluster load-balancing programmer will export a Prometheus metric
once done with programming. The latency of the operation is defined as the
difference between the timestamp of when the operation is done and the timestamp
recorded in the newly introduced annotation.
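
Below is a minimal sketch of steps 2-4, assuming a hypothetical annotation key and
metric name (both still TBD in this proposal) and leaving out everything else the
Endpoints controller and the programmer (e.g. kube-proxy) actually do.

```go
package main

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical annotation key; the real name is still TBD.
const triggerTimeAnnotation = "endpoints.kubernetes.io/last-change-trigger-time"

// Step 3 (Endpoints controller side): record the timestamp of the change that
// triggered this Endpoints update, e.g. the pod's Ready condition transition time.
func annotateEndpoints(annotations map[string]string, triggerTime time.Time) {
	annotations[triggerTimeAnnotation] = triggerTime.Format(time.RFC3339Nano)
}

// Hypothetical metric; the name and buckets are placeholders for illustration.
var networkProgrammingLatency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "network_programming_duration_seconds",
	Help:    "Time from a service/pod change to it being programmed in-cluster.",
	Buckets: prometheus.ExponentialBuckets(0.1, 2, 12),
})

func init() {
	prometheus.MustRegister(networkProgrammingLatency)
}

// Step 4 (programmer side, e.g. kube-proxy): once the rules for the Endpoints
// object have been programmed, observe the difference between "now" and the
// trigger timestamp carried in the annotation.
func recordProgrammingLatency(annotations map[string]string, programmedAt time.Time) {
	raw, ok := annotations[triggerTimeAnnotation]
	if !ok {
		return // object predates the annotation; nothing to measure
	}
	triggerTime, err := time.Parse(time.RFC3339Nano, raw)
	if err != nil {
		return // malformed annotation; skip rather than skew the metric
	}
	networkProgrammingLatency.Observe(programmedAt.Sub(triggerTime).Seconds())
}

func main() {
	annotations := map[string]string{}
	annotateEndpoints(annotations, time.Now().Add(-2*time.Second))
	recordProgrammingLatency(annotations, time.Now()) // observes roughly 2 seconds
}
```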

#### Caveats
There are a couple of caveats to that measurement method:
1. A single `Endpoints` object may batch multiple pod state transitions. <br/>
In that case, we simply choose the oldest one (and do not expose all timestamps,
to avoid theoretically unbounded growth of the object). That makes the metric
imprecise, but the batching period should be relatively small compared
to the whole end-to-end flow.
1. A single pod may transition its state multiple times within the batching
period. <br/>
For that case, we will add an additional cache in the Endpoints controller, caching
the first observed transition timestamp for each pod (a sketch of such a cache
follows this list). The cache entry will be cleared when the controller picks up
the pod into an Endpoints object update. This is consistent with choosing the
oldest update in the point above. <br/>
Initially, we may consider simply ignoring this fact.
1. Components may fall out of the watch history window and thus miss some watch
events. <br/>
This may be the case for both the Endpoints controller and kube-proxy (or other
network programmers if used instead). It becomes a problem only when a single
object changed multiple times in the meantime (otherwise informers will deliver
the events to handlers on relisting). Additionally, this can happen only when
components are too slow in processing events (which would already be reflected
in the metrics) or (sometimes) after a kube-apiserver restart. Given that, we are
going to neglect this problem to avoid unnecessary complications for little
or no gain.
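
A minimal sketch of the cache from the second caveat, keeping only the first observed
transition timestamp per pod until the controller includes that pod in an Endpoints
update; all type and method names here are illustrative assumptions.

```go
package main

import (
	"sync"
	"time"
)

// triggerTimeCache remembers, per pod, the timestamp of the first state transition
// observed since the pod was last included in an Endpoints update. Later transitions
// within the same batching window are ignored, which is consistent with choosing the
// oldest timestamp above.
type triggerTimeCache struct {
	mu    sync.Mutex
	times map[string]time.Time // key: pod "namespace/name"
}

func newTriggerTimeCache() *triggerTimeCache {
	return &triggerTimeCache{times: map[string]time.Time{}}
}

// observeTransition is called from the pod event handler; only the first
// transition per pod is recorded.
func (c *triggerTimeCache) observeTransition(pod string, at time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, seen := c.times[pod]; !seen {
		c.times[pod] = at
	}
}

// pop is called when the controller picks the pod up into an Endpoints object
// update; it returns the recorded timestamp (if any) and clears the entry.
func (c *triggerTimeCache) pop(pod string) (time.Time, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	t, ok := c.times[pod]
	delete(c.times, pod)
	return t, ok
}

func main() {
	cache := newTriggerTimeCache()
	cache.observeTransition("default/pod-a", time.Now().Add(-3*time.Second))
	cache.observeTransition("default/pod-a", time.Now()) // ignored: not the first one
	if triggerTime, ok := cache.pop("default/pod-a"); ok {
		_ = triggerTime // would be written into the Endpoints annotation
	}
}
```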

### Test scenario

__TODO: Describe test scenario.__