
Commit 4b4eb91

Adding generated API docs + basic API docs and diagram
1 parent af1b216 commit 4b4eb91

7 files changed: +303, -5 lines


Diff for: Makefile (+8)

@@ -162,6 +162,14 @@ live-docs:
 	docker build -t gaie/mkdocs hack/mkdocs/image
 	docker run --rm -it -p 3000:3000 -v ${PWD}:/docs gaie/mkdocs
 
+.PHONY: api-ref-docs
+api-ref-docs:
+	crd-ref-docs \
+		--source-path=${PWD}/api \
+		--config=crd-ref-docs.yaml \
+		--renderer=markdown \
+		--output-path=${PWD}/site-src/reference/spec.md
+
 ##@ Deployment
 
 ifndef ignore-not-found

Diff for: crd-ref-docs.yaml (+10)

@@ -0,0 +1,10 @@
+processor:
+  ignoreTypes:
+    - "(InferencePool|InferenceModel)List$"
+  # RE2 regular expressions describing type fields that should be excluded from the generated documentation.
+  ignoreFields:
+    - "TypeMeta$"
+
+render:
+  # Version of Kubernetes to use when generating links to Kubernetes API documentation.
+  kubernetesVersion: 1.31

Diff for: mkdocs.yml (+2, -1)

@@ -29,6 +29,7 @@ plugins:
   - mermaid2
 markdown_extensions:
   - admonition
+  - markdown.extensions.nl2br
   - meta
   - pymdownx.emoji:
       emoji_index: !!python/name:material.extensions.emoji.twemoji
@@ -59,10 +60,10 @@ nav:
     - Getting started: guides/index.md
     - Implementer's Guide: guides/implementers.md
   - Reference:
+    - API Reference: reference/spec.md
     - API Types:
       - InferencePool: api-types/inferencepool.md
      - InferenceModel: api-types/inferencemodel.md
-    - API specification: reference/spec.md
   - Enhancements:
     - Overview: gieps/overview.md
   - Contributing:

Diff for: site-src/api-types/inferencepool.md (+5, -1)

@@ -7,7 +7,11 @@
 
 ## Background
 
-TODO
+InferencePool represents a set of inference-focused Pods and, within the Gateway API resource model, fills the role a Kubernetes Service would normally play:
+
+<!-- Source: https://docs.google.com/presentation/d/11HEYCgFi-aya7FS91JvAfllHiIlvfgcp7qpi_Azjk4E/edit#slide=id.g292839eca6d_1_0 -->
+<img src="/images/inferencepool-vs-service.png" alt="Comparing InferencePool with Service" class="center" width="550" />
+
 
 ## Spec
 
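For orientation, a minimal InferencePool manifest could look like the sketch below. This is an illustrative example only — the object name, label, and port are hypothetical — but the `selector` and `targetPortNumber` fields match the generated reference added later in this commit.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferencePool
metadata:
  name: llama-pool              # hypothetical name
spec:
  # Label map selecting the model-server Pods, analogous to a Service selector.
  selector:
    app: llama-model-server     # hypothetical label
  # Port the model servers in the pool expect to receive traffic on (0-65535).
  targetPortNumber: 8000        # hypothetical port
```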

Diff for: site-src/images/inferencepool-vs-service.png

119 KB

Diff for: site-src/index.md (+16)

@@ -15,8 +15,24 @@ they are expected to manage:
 
 ### InferencePool
 
+InferencePool represents a set of Inference-focused Pods and an extension that
+will be used to route to them. Within the broader Gateway API resource model,
+this resource is considered a "backend". In practice, that means that you'd
+replace a Kubernetes Service with an InferencePool. This resource has some
+similarities to Service (a way to select Pods and specify a port), but has some
+unique capabilities. With InferenceModel, you can configure a routing extension
+as well as inference-specific routing optimizations. For more information on
+this resource, refer to our [InferencePool documentation](/api-types/inferencepool).
+
 ### InferenceModel
 
+An InferenceModel represents a model or adapter, and configuration associated
+with that model. This resource enables you to configure the relative criticality
+of a model, and allows you to seamlessly translate the requested model name to
+one or more backend model names. Multiple InferenceModels can be attached to an
+InferencePool. For more information on this resource, refer to our
+[InferenceModel documentation](/api-types/inferencemodel).
+
 ## Composable Layers
 
 This project aims to develop an ecosystem of implementations that are fully
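Read together, the two resources pair naturally: an InferenceModel is attached to an InferencePool through its `poolRef`. The sketch below is purely illustrative — the object, pool, and model names are hypothetical — while the field names and Criticality values come from the generated API reference in this commit.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
  name: chatbot                 # hypothetical object name
spec:
  modelName: chatbot            # name clients send in the request's "model" parameter
  criticality: Critical         # one of: Critical, Default, Sheddable
  poolRef:
    name: llama-pool            # attaches this model to the (hypothetical) InferencePool above
```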

Diff for: site-src/reference/spec.md (+262, -3)

@@ -1,5 +1,264 @@
-# API Specification
+# API Reference
+
+## Packages
+- [inference.networking.x-k8s.io/v1alpha1](#inferencenetworkingx-k8siov1alpha1)
+
+## inference.networking.x-k8s.io/v1alpha1
+
+Package v1alpha1 contains API Schema definitions for the gateway v1alpha1 API group
+
+### Resource Types
+- [InferenceModel](#inferencemodel)
+- [InferencePool](#inferencepool)
+
+#### Criticality
+
+_Underlying type:_ _string_
+
+Defines how important it is to serve the model compared to other models.
+
+_Validation:_
+- Enum: [Critical Default Sheddable]
+
+_Appears in:_
+- [InferenceModelSpec](#inferencemodelspec)
+
+| Field | Description |
+| --- | --- |
+| `Critical` | Most important. Requests to this band will be shed last.<br /> |
+| `Default` | More important than Sheddable, less important than Critical.<br />Requests in this band will be shed before critical traffic.<br />+kubebuilder:default=Default<br /> |
+| `Sheddable` | Least important. Requests to this band will be shed before all other bands.<br /> |
+
+#### InferenceModel
+
+InferenceModel is the Schema for the InferenceModels API
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `apiVersion` _string_ | `inference.networking.x-k8s.io/v1alpha1` | | |
+| `kind` _string_ | `InferenceModel` | | |
+| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
+| `spec` _[InferenceModelSpec](#inferencemodelspec)_ | | | |
+| `status` _[InferenceModelStatus](#inferencemodelstatus)_ | | | |
+
+#### InferenceModelSpec
+
+InferenceModelSpec represents a specific model use case. This resource is
+managed by the "Inference Workload Owner" persona.
+
+The Inference Workload Owner persona is: a team that trains, verifies, and
+leverages a large language model from a model frontend, drives the lifecycle
+and rollout of new versions of those models, and defines the specific
+performance and latency goals for the model. These workloads are
+expected to operate within an InferencePool sharing compute capacity with other
+InferenceModels, defined by the Inference Platform Admin.
+
+InferenceModel's modelName (not the ObjectMeta name) is unique for a given InferencePool,
+if the name is reused, an error will be shown on the status of a
+InferenceModel that attempted to reuse. The oldest InferenceModel, based on
+creation timestamp, will be selected to remain valid. In the event of a race
+condition, one will be selected at random.
+
+_Appears in:_
+- [InferenceModel](#inferencemodel)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `modelName` _string_ | The name of the model as the users set in the "model" parameter in the requests.<br />The name should be unique among the workloads that reference the same backend pool.<br />This is the parameter that will be used to match the request with. In the future, we may<br />allow to match on other request parameters. The other approach to support matching on<br />on other request parameters is to use a different ModelName per HTTPFilter.<br />Names can be reserved without implementing an actual model in the pool.<br />This can be done by specifying a target model and setting the weight to zero,<br />an error will be returned specifying that no valid target model is found. | | MaxLength: 253 <br /> |
+| `criticality` _[Criticality](#criticality)_ | Defines how important it is to serve the model compared to other models referencing the same pool. | Default | Enum: [Critical Default Sheddable] <br /> |
+| `targetModels` _[TargetModel](#targetmodel) array_ | Allow multiple versions of a model for traffic splitting.<br />If not specified, the target model name is defaulted to the modelName parameter.<br />modelName is often in reference to a LoRA adapter. | | MaxItems: 10 <br /> |
+| `poolRef` _[PoolObjectReference](#poolobjectreference)_ | Reference to the inference pool, the pool must exist in the same namespace. | | Required: \{\} <br /> |
+
+#### InferenceModelStatus
+
+InferenceModelStatus defines the observed state of InferenceModel
+
+_Appears in:_
+- [InferenceModel](#inferencemodel)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#condition-v1-meta) array_ | Conditions track the state of the InferencePool. | | |
+
+#### InferencePool
+
+InferencePool is the Schema for the Inferencepools API
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `apiVersion` _string_ | `inference.networking.x-k8s.io/v1alpha1` | | |
+| `kind` _string_ | `InferencePool` | | |
+| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
+| `spec` _[InferencePoolSpec](#inferencepoolspec)_ | | | |
+| `status` _[InferencePoolStatus](#inferencepoolstatus)_ | | | |
+
+#### InferencePoolSpec
+
+InferencePoolSpec defines the desired state of InferencePool
+
+_Appears in:_
+- [InferencePool](#inferencepool)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `selector` _object (keys:[LabelKey](#labelkey), values:[LabelValue](#labelvalue))_ | Selector uses a map of label to watch model server pods<br />that should be included in the InferencePool. ModelServers should not<br />be with any other Service or InferencePool, that behavior is not supported<br />and will result in sub-optimal utilization.<br />In some cases, implementations may translate this to a Service selector, so this matches the simple<br />map used for Service selectors instead of the full Kubernetes LabelSelector type. | | Required: \{\} <br /> |
+| `targetPortNumber` _integer_ | TargetPortNumber is the port number that the model servers within the pool expect<br />to receive traffic from.<br />This maps to the TargetPort in: https://pkg.go.dev/k8s.io/api/core/v1#ServicePort | | Maximum: 65535 <br />Minimum: 0 <br />Required: \{\} <br /> |
+
+#### InferencePoolStatus
+
+InferencePoolStatus defines the observed state of InferencePool
+
+_Appears in:_
+- [InferencePool](#inferencepool)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#condition-v1-meta) array_ | Conditions track the state of the InferencePool. | | |
+
+#### LabelKey
+
+_Underlying type:_ _string_
+
+Originally copied from: https://github.com/kubernetes-sigs/gateway-api/blob/99a3934c6bc1ce0874f3a4c5f20cafd8977ffcb4/apis/v1/shared_types.go#L694-L731
+Duplicated as to not take an unexpected dependency on gw's API.
+
+LabelKey is the key of a label. This is used for validation
+of maps. This matches the Kubernetes "qualified name" validation that is used for labels.
+
+Valid values include:
+
+* example
+* example.com
+* example.com/path
+* example.com/path.html
+
+Invalid values include:
+
+* example~ - "~" is an invalid character
+* example.com. - can not start or end with "."
+
+_Validation:_
+- MaxLength: 253
+- MinLength: 1
+- Pattern: `^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$`
+
+_Appears in:_
+- [InferencePoolSpec](#inferencepoolspec)
+
+#### LabelValue
+
+_Underlying type:_ _string_
+
+LabelValue is the value of a label. This is used for validation
+of maps. This matches the Kubernetes label validation rules:
+* must be 63 characters or less (can be empty),
+* unless empty, must begin and end with an alphanumeric character ([a-z0-9A-Z]),
+* could contain dashes (-), underscores (_), dots (.), and alphanumerics between.
+
+Valid values include:
+
+* MyValue
+* my.name
+* 123-my-value
+
+_Validation:_
+- MaxLength: 63
+- MinLength: 0
+- Pattern: `^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`
+
+_Appears in:_
+- [InferencePoolSpec](#inferencepoolspec)
+
+#### PoolObjectReference
+
+PoolObjectReference identifies an API object within the namespace of the
+referrer.
+
+_Appears in:_
+- [InferenceModelSpec](#inferencemodelspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `group` _string_ | Group is the group of the referent. | inference.networking.x-k8s.io | MaxLength: 253 <br />Pattern: `^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$` <br /> |
+| `kind` _string_ | Kind is kind of the referent. For example "InferencePool". | InferencePool | MaxLength: 63 <br />MinLength: 1 <br />Pattern: `^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$` <br /> |
+| `name` _string_ | Name is the name of the referent. | | MaxLength: 253 <br />MinLength: 1 <br />Required: \{\} <br /> |
+
+#### TargetModel
+
+TargetModel represents a deployed model or a LoRA adapter. The
+Name field is expected to match the name of the LoRA adapter
+(or base model) as it is registered within the model server. Inference
+Gateway assumes that the model exists on the model server and is the
+responsibility of the user to validate a correct match. Should a model fail
+to exist at request time, the error is processed by the Instance Gateway,
+and then emitted on the appropriate InferenceModel object.
+
+_Appears in:_
+- [InferenceModelSpec](#inferencemodelspec)
+
+| Field | Description | Default | Validation |
+| --- | --- | --- | --- |
+| `name` _string_ | The name of the adapter as expected by the ModelServer. | | MaxLength: 253 <br /> |
+| `weight` _integer_ | Weight is used to determine the proportion of traffic that should be<br />sent to this target model when multiple versions of the model are specified. | 1 | Maximum: 1e+06 <br />Minimum: 0 <br /> |
 
-This page contains the API field specification for Gateway API.
 
-REPLACE_WITH_GENERATED_CONTENT
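To show how the generated fields above fit together, here is a hedged, purely illustrative sketch of an InferenceModel that splits traffic across two target models. All object, pool, and model names are hypothetical; the field names, defaults, and Criticality values come from the reference above.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: InferenceModel
metadata:
  name: chatbot                  # hypothetical object name
spec:
  modelName: chatbot             # requested model name to match on
  criticality: Default           # Critical | Default | Sheddable
  poolRef:
    name: llama-pool             # hypothetical InferencePool in the same namespace
  targetModels:                  # traffic splitting across backend model / adapter names
    - name: chatbot-v1           # must match a model registered on the model server
      weight: 90
    - name: chatbot-v2-canary
      weight: 10
```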
