Commit e15c1f1

serathius authored and Tim Bannister committed
Propose KEP-5116: Streaming response encoding
Co-authored-by: Tim Bannister <[email protected]>
1 parent 3c05901 commit e15c1f1

3 files changed, +382 −0 lines
```yaml
kep-number: 5116
beta:
  approver: "@jpbetz"
```
# KEP-5116: Streaming Encoding for LIST Responses

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Streaming collections and gzip encoding](#streaming-collections-and-gzip-encoding)
  - [Test Plan](#test-plan)
      - [Prerequisite testing updates](#prerequisite-testing-updates)
      - [Unit tests](#unit-tests)
      - [Integration tests](#integration-tests)
      - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
    - [Beta](#beta)
    - [GA](#ga)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->
## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [x] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [x] e2e Tests for all Beta API Operations (endpoints)
  - [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [x] (R) Graduation criteria is in place
  - [x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [x] (R) Production readiness review completed
- [x] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
## Summary

This KEP proposes implementing streaming encoding for collections (responses to **list**) that are served by the Kubernetes API server.
Existing encoders marshal the response into a single block, allocating gigabytes of memory and holding it until the client reads the whole response.
For large LIST responses, this leads to excessive memory consumption in the API server.
Streaming the encoding process can significantly reduce memory usage, improving scalability and cost-efficiency.
## Motivation

The Kubernetes API server's memory usage presents a significant challenge, particularly when dealing with large resources and LIST requests.
Users can easily issue **list** requests that retrieve gigabytes of data, especially with CustomResourceDefinitions (CRDs). Custom resources often suffer from significant data bloat when encoded in JSON.

Current API server response encoders were designed with smaller responses in mind,
assuming they could allocate the entire response in a single contiguous memory block.
This assumption breaks down at the scale of data returned by large **list** requests.
Even well-intentioned users can create naive controllers that issue multiple concurrent **list** requests without properly handling the responses.
This can lead to the API server holding entire responses in memory for extended periods, sometimes minutes, while waiting for the controller to process them.

The resulting unpredictable memory usage forces administrators to significantly over-provision API server memory to accommodate potential spikes.
### Goals

* Implement JSON and Protocol Buffer streaming encoders for collections (responses to a **list** are called _collections_).
* Significantly reduce, and make more predictable, the API server's memory consumption when serving large LIST responses.

### Non-Goals

* Implementing streaming decoders in clients. This KEP focuses on protecting the API server's memory usage. Clients can utilize existing mechanisms like pagination or WatchList to manage large datasets.
* Implementing streaming encoders for all content types (e.g. `YAML`, `as=Table`). This KEP focuses on the most commonly used and resource-intensive content types to address the most impactful cases first.
* Implementing streaming for CBOR encoding at this time. CBOR support will be considered as part of a broader effort related to CBOR serialization in Kubernetes and tracked separately.
## Proposal

This proposal focuses on implementing streaming encoding for JSON and Protocol Buffer (Proto) for responses to **list** requests.
The core idea is to avoid loading the entire response into memory before encoding.
Instead, the encoder will process objects individually, streaming the encoded data to the client.
Assuming we deliver all the necessary testing, we plan to launch the feature directly to Beta.

Encoding items one by one significantly reduces the memory footprint required by the API server.
Given the Kubernetes limit of 1MB per object, the memory overhead per request becomes manageable.
While this approach may increase overall CPU usage and memory allocations,
the trade-off is considered worthwhile due to the substantial reduction in peak memory usage,
leading to improved API server stability and scalability.

Existing JSON and Proto encoding libraries do not natively support streaming.
Therefore, custom streaming encoders will be implemented.
Because we focus on encoding collections (**list** responses), the implementation scope is narrowed,
requiring encoders for a limited set of Kubernetes API types.
We anticipate approximately 100 lines of code per encoder per type.
Extensive testing, drawing upon test cases developed for the CBOR serialization effort,
will ensure compatibility with existing encoding behavior.

Long term, the goal is for upstream JSON and Proto libraries to natively support streaming encoding.
For JSON, initial exploration and validation using the experimental `json/v2` package has shown
promising results and confirmed its suitability for our requirements.
Further details can be found in [kubernetes/kubernetes#129304](https://github.com/kubernetes/kubernetes/issues/129304#issuecomment-2612704644).
### Risks and Mitigations

## Design Details

Implementing streaming encoders specifically for collections significantly reduces the scope,
allowing us to focus on a limited set of types and avoid the complexities of a fully generic streaming encoder.
The core difference in our approach is the special handling of the `Items` field within collection structs.
Instead of encoding the entire `Items` array at once, we will iterate through the array and encode each item individually, streaming the encoded data to the client.
This targeted approach enables the following implementation criteria:

* **Strict Validation:** Before proceeding with streaming encoding,
  the implementation will rigorously validate the Go struct tags of the target type.
  If the tags do not precisely match the expected structure, we will fall back to the standard encoder.
  This precaution prevents incompatibility if struct fields or their encoded representation change.
* **Delegation to Standard Encoder:** The encoding of all fields *other than* `Items`,
  as well as the encoding of each individual item *within* the `Items` array,
  will be delegated to the standard `encoding/json` (for JSON) or `protobuf` (for Proto) packages.
  This leverages the existing, well-tested encoding logic and minimizes the amount of custom code required, reducing the risk of introducing bugs.
The types requiring custom streaming encoders are:

* `*List` types for built-in Kubernetes APIs (e.g., `PodList`, `ConfigMapList`).
* `UnstructuredList` for collections of custom resources.
* `runtime.Unknown`, used by the Proto encoder to provide type information.

To further enhance robustness, a static analysis check will be introduced to detect and prevent inconsistencies in Go struct tags across the different built-in collection types.
This addresses the concern that not all `*List` types may have perfectly consistent tag definitions.
### Streaming collections and gzip encoding

As pointed out in [kubernetes/kubernetes#129334 (review discussion)](https://github.com/kubernetes/kubernetes/pull/129334#discussion_r1938405782),
the current Kubernetes gzip implementation assumes the response is written in a single large chunk,
checking only the size of the first write to decide whether the response is large enough for compression.
This is a bad assumption about internal encoder implementation details and should be fixed regardless.

To ensure response compression works well with streaming,
we will precede all encoder changes by fixing the gzip compression.
First, we will add unit tests that prevent subsequent changes from affecting the results,
especially around the compression threshold.
Then, we will rewrite the gzip compression to buffer the response and delay the
decision to enable compression until either enough bytes have been observed to hit the threshold,
or the whole response has been received and can be written without compression.
### Test Plan

[x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

##### Prerequisite testing updates

##### Unit tests

We will implement tests following the cases borrowed from the [CBOR serializer test plan](https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/4222-cbor-serializer#test-plan), skipping tests that do not apply to streaming *encoding*, such as those related to decoding.
Specifically, we will ensure byte-for-byte compatibility with the standard `encoding/json` and `protobuf` encoders for the following cases:

* Preserving the distinction between integers and floating-point numbers.
* Handling structs with duplicate field names (JSON tag names) without producing duplicate keys in the encoded output ([go.dev/issue/17913](https://go.dev/issue/17913)).
* Encoding Go strings containing invalid UTF-8 sequences without error.
* Preserving the distinction between absent, present-but-null, and present-and-empty states for slices and maps.
* Handling structs implementing the `MarshalJSON` method, especially built-in collection types.
* Handling raw bytes.
* A linting unit test to ensure all built-in collection types are matched.

Fuzz tests will cover the custom streaming encoders for the types with overridden encoders:

* `testingapigroup.CarpList` as a surrogate for built-in types
* `UnstructuredList`

The skipped tests are primarily related to decoding or CBOR-specific features, which are not relevant to the streaming encoding of JSON and Proto addressed by this KEP.
##### Integration tests

Given one-to-one compatibility with the existing encoders, we do not expect integration tests between components to be needed.

##### e2e tests

Scalability tests will confirm the improvements and protect against future regressions.
Improvements in resource usage should be noticeable on perf-dash.

The tests will cover the following properties:

* Large resources: 10000 objects, each 100KB in size.
* Lists with `RV=0`, to ensure the response is served from the watch cache and all the overhead comes from encoder memory allocation.
* Different content types: JSON (default), Proto, CBOR, YAML.
* Different API kinds, e.g. ConfigMap, Pod, custom resources.

In the first iteration we expect to over-allocate the resources needed for the apiserver to ensure the tests pass;
once the improvement is implemented, we will tune down the resources to detect regressions.
### Graduation Criteria

#### Beta

- Gzip compression supports chunked writes
- All encoder unit tests are implemented
- Streaming encoders for JSON and Proto are implemented
- Scalability tests are running and show improvement

#### GA

- Scalability tests are release blocking

### Upgrade / Downgrade Strategy

We plan to provide byte-for-byte compatibility.

### Version Skew Strategy

We plan to provide byte-for-byte compatibility.
## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

Via feature gates.

###### How can this feature be enabled / disabled in a live cluster?

- [X] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: StreamingCollectionEncodingToJSON, StreamingCollectionEncodingToProto
  - Components depending on the feature gate: kube-apiserver
###### Does enabling the feature change any default behavior?

No, we provide byte-for-byte compatibility.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, without issue.

###### What happens if we reenable the feature if it was previously rolled back?

###### Are there any tests for feature enablement/disablement?

Yes, this will be covered by unit tests.
### Rollout, Upgrade and Rollback Planning

N/A

###### How can a rollout or rollback fail? Can it impact already running workloads?

N/A

###### What specific metrics should inform a rollback?

N/A

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

N/A

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No
### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?

N/A

###### How can someone using this feature know that it is working for their instance?

N/A

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

N/A

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

N/A

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

N/A
### Dependencies

No

###### Does this feature depend on any specific services running in the cluster?

No

### Scalability

###### Will enabling / using this feature result in any new API calls?

No

###### Will enabling / using this feature result in introducing new API types?

No

###### Will enabling / using this feature result in any new calls to the cloud provider?

No

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

No

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No, we expect a reduction.

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No, we expect a reduction.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No
### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

N/A

###### What are other known failure modes?

N/A

###### What steps should be taken if SLOs are not being met to determine the problem?

## Implementation History

## Drawbacks

Maintaining around 500 lines of custom encoder code.

## Alternatives

Similar benefits could be achieved via the `WatchList` effort; however, we cannot depend on all users migrating to `WatchList`.

Waiting for `json/v2` to be promoted from experimental would reduce maintenance, but it comes with even more risk:
the new package introduces breaking changes, and testing showed that even when run in `v1` compatibility mode there may be problems.
```yaml
title: Streaming JSON Encoding for LIST Responses
kep-number: 5116
authors:
  - serathius
owning-sig: sig-api-machinery
status: implementable
creation-date: 2025-01-31
reviewers:
  - liggitt
approvers:
  - deads2k
stage: beta
latest-milestone: "v1.33"
milestone:
  beta: "v1.33"
feature-gates:
  - name: StreamingCollectionEncodingToJSON
    components:
      - kube-apiserver
  - name: StreamingCollectionEncodingToProto
    components:
      - kube-apiserver
```
