Commit 3e186d1

Merge pull request #2591 from wojtek-t/explicit_definition_in_per_item_doc
Make SLOs page more clear
2 parents d6c5f0f + 5a275c4 commit 3e186d1

6 files changed, with 60 additions and 27 deletions

sig-scalability/slos/api_call_latency.md

Lines changed: 23 additions & 0 deletions
@@ -1,5 +1,28 @@
11
## API call latency SLIs/SLOs details
22

3+
### Definition
4+
5+
| Status | SLI | SLO |
6+
| --- | --- | --- |
7+
| __Official__ | Latency<sup>[1](#footnote1)</sup> of mutating<sup>[2](#footnote2)</sup> API calls for single objects for every (resource, verb) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day <= 1s |
8+
| __Official__ | Latency<sup>[1](#footnote1)</sup> of non-streaming read-only<sup>[3](#footnote3)</sup> API calls for every (resource, scope<sup>[4](#footnote4)</sup>) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day (a) <= 1s if `scope=resource` (b) <= 5s if `scope=namespace` (c) <= 30s if `scope=cluster` |
9+
10+
<a name="footnote1">\[1\]</a>By latency of API call in this doc we mean time
11+
from the moment when apiserver gets the request to last byte of response sent
12+
to the user.
13+
14+
<a name="footnote2">\[2\]</a>By mutating API calls we mean POST, PUT, DELETE
15+
and PATCH.
16+
17+
<a name="footnote3">\[3\]</a>By non-streaming read-only API calls we mean GET
18+
requests without `watch=true` option set. (Note that in Kubernetes internally
19+
it translates to both GET and LIST calls).
20+
21+
<a name="footnote4">\[4\]</a>A scope of a request can be either (a) `resource`
22+
if the request is about a single object, (b) `namespace` if it is about objects
23+
from a single namespace or (c) `cluster` if it spans objects from multiple
24+
namespaces.
25+
326
### User stories
427
- As a user of vanilla Kubernetes, I want some guarantee how quickly I get the
528
response from an API call.
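
The scope rules in footnote 4 and the thresholds in the Definition table can be sketched in a few lines. This is only an illustration under stated assumptions: `request_scope` and `slo_threshold_seconds` are hypothetical helper names, not part of Kubernetes, and the sketch assumes a request that names an object is `resource`-scoped, a request with only a namespace is `namespace`-scoped, and everything else is `cluster`-scoped. Watch requests are assumed to be filtered out by the caller.

```python
from typing import Optional

# Thresholds in seconds, copied from the SLO column of the Definition table.
MUTATING_SLO_SECONDS = 1.0
READ_ONLY_SLO_SECONDS = {"resource": 1.0, "namespace": 5.0, "cluster": 30.0}

# Footnote 2: mutating API calls are POST, PUT, DELETE and PATCH.
MUTATING_VERBS = {"POST", "PUT", "DELETE", "PATCH"}


def request_scope(name: Optional[str], namespace: Optional[str]) -> str:
    """Classify a read-only request per footnote 4 (hypothetical helper).

    Assumption: a request naming a single object is `resource`-scoped,
    a request limited to one namespace is `namespace`-scoped, and any
    other request is treated as `cluster`-scoped.
    """
    if name:
        return "resource"
    if namespace:
        return "namespace"
    return "cluster"


def slo_threshold_seconds(verb: str, name: Optional[str], namespace: Optional[str]) -> float:
    """Return the 99th percentile latency threshold that applies to a call."""
    verb = verb.upper()
    if verb in MUTATING_VERBS:
        return MUTATING_SLO_SECONDS
    if verb == "GET":
        # Non-streaming read-only call: the threshold depends on the scope.
        return READ_ONLY_SLO_SECONDS[request_scope(name, namespace)]
    raise ValueError(f"verb not covered by this SLO: {verb}")
```

For example, `slo_threshold_seconds("GET", None, "default")` selects the 5s threshold for a namespace-scoped read.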

sig-scalability/slos/api_extensions_latency.md

Lines changed: 8 additions & 0 deletions
@@ -1,5 +1,13 @@
11
## API call extension points latency SLIs details
22

3+
### Definition
4+
5+
| Status | SLI |
6+
| --- | --- |
7+
| WIP | Admission latency for each admission plugin type, measured as 99th percentile over last 5 minutes |
8+
| WIP | Webhook call latency for each webhook type, measured as 99th percentile over last 5 minutes |
9+
| WIP | Initializer latency for each initializer, measured as 99th percentile over last 5 minutes |
10+
311
### User stories
412
- As an administrator, if API calls are slow, I would like to know if this is
513
because slow extension points (admission plugins, webhooks, initializers) and

sig-scalability/slos/pod_startup_latency.md

Lines changed: 13 additions & 0 deletions
@@ -1,5 +1,18 @@
11
## Pod startup latency SLI/SLO details
22

3+
### Definition
4+
5+
| Status | SLI | SLO |
6+
| --- | --- | --- |
7+
| __Official__ | Startup latency of stateless<sup>[1](#footnote1)</sup> and schedulable<sup>[2](#footnote2)</sup> pods, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile per cluster-day <= 5s |
8+
9+
<a name="footnote1">\[1\]</a>A `stateless pod` is defined as a pod that doesn't
10+
mount volumes with sources other than secrets, config maps, downward API and
11+
empty dir.
12+
13+
<a name="footnote2">\[2\]</a>By schedulable pod we mean a pod that can be
14+
scheduled in the cluster without causing any preemption.
15+
316
### User stories
417
- As a user of vanilla Kubernetes, I want some guarantee how quickly my pods
518
will be started.
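
The `stateless pod` definition in footnote 1 can be read as a check over the pod's declared volumes. The sketch below is a rough illustration, not an official implementation: `is_stateless` is a hypothetical name, the pod is assumed to be a plain dict shaped like a Pod manifest, and every declared volume is treated as mounted. Volume source types outside the four listed in the footnote (including `projected`) are conservatively treated as making the pod stateful, which is an assumption of this sketch.

```python
# Volume source keys allowed by footnote 1: secrets, config maps,
# downward API and empty dir (field names as used in a Pod manifest).
ALLOWED_VOLUME_SOURCES = {"secret", "configMap", "downwardAPI", "emptyDir"}


def is_stateless(pod: dict) -> bool:
    """Return True if every declared volume uses an allowed source.

    Assumption: `pod` is a dict shaped like a Pod manifest, and each
    volume entry holds a `name` plus one key naming its source type.
    """
    volumes = pod.get("spec", {}).get("volumes") or []
    for volume in volumes:
        sources = set(volume) - {"name"}
        if not sources <= ALLOWED_VOLUME_SOURCES:
            return False
    return True
```

A pod with no volumes counts as stateless under this reading, while one that declares a `persistentVolumeClaim` volume does not.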

sig-scalability/slos/slos.md

Lines changed: 4 additions & 27 deletions
@@ -100,37 +100,14 @@ Prerequisite: Kubernetes cluster is available and serving.
100100

101101
| Status | SLI | SLO | User stories, test scenarios, ... |
102102
| --- | --- | --- | --- |
103-
| __Official__ | Latency<sup>[1](#footnote1)</sup> of mutating<sup>[2](#footnote2)</sup> API calls for single objects for every (resource, verb) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day<sup>[3](#footnote3)</sup> <= 1s | [Details](./api_call_latency.md) |
104-
| __Official__ | Latency<sup>[1](#footnote1)</sup> of non-streaming read-only<sup>[4](#footnote3)</sup> API calls for every (resource, scope<sup>[5](#footnote4)</sup>) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day (a) <= 1s if `scope=resource` (b) <= 5s if `scope=namespace` (c) <= 30s if `scope=cluster` | [Details](./api_call_latency.md) |
105-
| __Official__ | Startup latency of stateless<sup>[6](#footnode6)</sup> and schedulable<sup>[7](#footnote7)</sup> pods, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile per cluster-day <= 5s | [Details](./pod_startup_latency.md) |
103+
| __Official__ | Latency of mutating API calls for single objects for every (resource, verb) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> <= 1s | [Details](./api_call_latency.md) |
104+
| __Official__ | Latency of non-streaming read-only API calls for every (resource, scope) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> (a) <= 1s if `scope=resource` (b) <= 5s if `scope=namespace` (c) <= 30s if `scope=cluster` | [Details](./api_call_latency.md) |
105+
| __Official__ | Startup latency of stateless and schedulable pods, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile per cluster-day<sup>[1](#footnote1)</sup> <= 5s | [Details](./pod_startup_latency.md) |
106106

107-
<a name="footnote1">\[1\]</a>By latency of API call in this doc we mean time
108-
from the moment when apiserver gets the request to last byte of response sent
109-
to the user.
110-
111-
<a name="footnote2">\[2\]</a>By mutating API calls we mean POST, PUT, DELETE
112-
and PATCH.
113-
114-
<a name="footnote3">\[3\]</a> For the purpose of visualization it will be a
107+
<a name="footnote1">\[1\]</a> For the purpose of visualization it will be a
115108
sliding window. However, for the purpose of reporting the SLO, it means one
116109
point per day (whether SLO was satisfied on a given day or not).
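
The "one point per day" reporting described in this footnote might look roughly like the sketch below. It is an illustration under assumptions, not how the SLO is actually computed: `percentile_nearest_rank` and `daily_slo_report` are hypothetical names, samples are assumed to be `(unix_timestamp, latency_seconds)` pairs, days are split on UTC boundaries, and the nearest-rank percentile is used for simplicity (monitoring systems typically estimate percentiles from histograms instead).

```python
import math
from collections import defaultdict
from datetime import datetime, timezone


def percentile_nearest_rank(values, q):
    """Return the q-th percentile (0 < q <= 100) by the nearest-rank method."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def daily_slo_report(samples, threshold_seconds):
    """Map each UTC day to whether its 99th percentile met the threshold.

    Assumption: `samples` is an iterable of (unix_timestamp, latency_seconds).
    """
    by_day = defaultdict(list)
    for timestamp, latency in samples:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date()
        by_day[day].append(latency)
    return {
        day: percentile_nearest_rank(latencies, 99) <= threshold_seconds
        for day, latencies in sorted(by_day.items())
    }
```

Each entry in the returned mapping is a single pass/fail point for that day, which matches the reporting view rather than the sliding-window visualization.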
117110

118-
<a name="footnote4">\[4\]</a>By non-streaming read-only API calls we mean GET
119-
requests without `watch=true` option set. (Note that in Kubernetes internally
120-
it translates to both GET and LIST calls).
121-
122-
<a name="footnote5">\[5\]</a>A scope of a request can be either (a) `resource`
123-
if the request is about a single object, (b) `namespace` if it is about objects
124-
from a single namespace or (c) `cluster` if it spawns objects from multiple
125-
namespaces.
126-
127-
<a name="footnode6">[6\]</a>A `stateless pod` is defined as a pod that doesn't
128-
mount volumes with sources other than secrets, config maps, downward API and
129-
empty dir.
130-
131-
<a name="footnode7">[7\]</a>By schedulable pod we mean a pod that can be
132-
scheduled in the cluster without causing any preemption.
133-
134111
### Burst SLIs/SLOs
135112

136113
| Status | SLI | SLO | User stories, test scenarios, ... |

sig-scalability/slos/system_throughput.md

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,11 @@
11
## System throughput SLI/SLO details
22

3+
### Definition
4+
5+
| Status | SLI | SLO |
6+
| --- | --- | --- |
7+
| WIP | Time to start 30\*#nodes pods, measured from test scenario start until observing last Pod as ready | Benchmark: when all images present on all Nodes, 99th percentile <= X minutes |
8+
39
### User stories
410
- As a user, I want a guarantee that my workload of X pods can be started
511
within a given time

sig-scalability/slos/watch_latency.md

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,11 @@
11
## Watch latency SLI details
22

3+
### Definition
4+
5+
| Status | SLI |
6+
| --- | --- |
7+
| WIP | Watch latency for every resource, (from the moment when object is stored in database to when it's ready to be sent to all watchers), measured as 99th percentile over last 5 minutes |
8+
39
### User stories
410
- As an administrator, if Kubernetes is slow, I would like to know if the root
511
cause of it is slow api-machinery (slow watch) or something farther the path
