# API Reference

## Packages
- [inference.networking.x-k8s.io/v1alpha1](#inferencenetworkingx-k8siov1alpha1)

## inference.networking.x-k8s.io/v1alpha1

Package v1alpha1 contains API Schema definitions for the inference.networking.x-k8s.io v1alpha1 API group.

### Resource Types
- [InferenceModel](#inferencemodel)
- [InferencePool](#inferencepool)

#### Criticality

_Underlying type:_ _string_

Defines how important it is to serve the model compared to other models.

_Validation:_
- Enum: [Critical Default Sheddable]

_Appears in:_
- [InferenceModelSpec](#inferencemodelspec)

| Field | Description |
| --- | --- |
| `Critical` | Most important. Requests in this band will be shed last.<br /> |
| `Default` | More important than Sheddable, less important than Critical.<br />Requests in this band will be shed before Critical traffic.<br />This is the default value.<br /> |
| `Sheddable` | Least important. Requests in this band will be shed before all other bands.<br /> |

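The relative ordering of the three bands can be illustrated with a short Python sketch. It models only the shedding order stated in the table; the numeric ranks, the `shed_order` helper, and the rule that requests within a band keep their arrival order are assumptions for illustration, not a description of how any implementation actually sheds load.

```python
from enum import IntEnum


class Criticality(IntEnum):
    """Hypothetical ranking: a higher value is shed later."""
    SHEDDABLE = 0
    DEFAULT = 1
    CRITICAL = 2


def shed_order(requests: list[tuple[str, Criticality]]) -> list[str]:
    """Return request IDs in the order they would be shed under load.

    Sheddable requests come first and Critical requests last. sorted() is
    stable, so requests within the same band keep their arrival order
    (an assumption; the reference does not specify intra-band ordering).
    """
    return [req_id for req_id, _ in sorted(requests, key=lambda r: r[1])]
```

For example, with one request per band plus a second Sheddable one, both Sheddable requests are shed before the Default one, and the Critical request is shed last.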
#### InferenceModel

InferenceModel is the Schema for the InferenceModels API.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `inference.networking.x-k8s.io/v1alpha1` | | |
| `kind` _string_ | `InferenceModel` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _[InferenceModelSpec](#inferencemodelspec)_ | | | |
| `status` _[InferenceModelStatus](#inferencemodelstatus)_ | | | |

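As an illustration, the following Python sketch builds an InferenceModel object as a plain dictionary and prints it as JSON, a format that `kubectl apply -f` also accepts. The object names (`chatbot`, `my-pool`), the namespace, and the LoRA adapter names are hypothetical placeholders, and the helper `inference_model_manifest` is not part of any published client library.

```python
import json


def inference_model_manifest(name: str, namespace: str, model_name: str,
                             pool_name: str) -> dict:
    """Build a minimal InferenceModel object as a plain dict.

    All argument values used below are hypothetical examples.
    """
    return {
        "apiVersion": "inference.networking.x-k8s.io/v1alpha1",
        "kind": "InferenceModel",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Matched against the "model" parameter of incoming requests.
            "modelName": model_name,
            "criticality": "Critical",
            # Must name an InferencePool in the same namespace.
            "poolRef": {"name": pool_name},
            # Two hypothetical LoRA adapters splitting traffic 50/50.
            "targetModels": [
                {"name": f"{model_name}-lora-v1", "weight": 50},
                {"name": f"{model_name}-lora-v2", "weight": 50},
            ],
        },
    }


if __name__ == "__main__":
    manifest = inference_model_manifest("chatbot", "default", "chatbot", "my-pool")
    print(json.dumps(manifest, indent=2))
```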
#### InferenceModelSpec

InferenceModelSpec represents a specific model use case. This resource is
managed by the "Inference Workload Owner" persona.

The Inference Workload Owner persona is a team that trains, verifies, and
leverages a large language model from a model frontend, drives the lifecycle
and rollout of new versions of those models, and defines the specific
performance and latency goals for the model. These workloads are
expected to operate within an InferencePool, defined by the Inference Platform
Admin, sharing compute capacity with other InferenceModels.

An InferenceModel's modelName (not the ObjectMeta name) must be unique within a
given InferencePool. If the name is reused, an error is shown on the status of
the InferenceModel that attempted the reuse. The oldest InferenceModel, based on
creation timestamp, is selected to remain valid. In the event of a race
condition, one is selected at random.

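The conflict rule can be sketched in Python as "sort by creation timestamp, then keep the first InferenceModel seen for each pool and modelName". The `ModelEntry` type and `resolve_conflicts` helper are hypothetical. One deliberate deviation: equal timestamps are broken by object name so the result is reproducible, whereas the reference says a random InferenceModel is selected.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ModelEntry:
    """Hypothetical, trimmed-down view of an InferenceModel."""
    object_name: str   # metadata.name
    model_name: str    # spec.modelName
    pool_name: str     # spec.poolRef.name
    created: datetime  # metadata.creationTimestamp


def resolve_conflicts(entries: list[ModelEntry]) -> tuple[dict, list[str]]:
    """Keep the oldest InferenceModel per (pool, modelName).

    Returns the winners keyed by (pool, modelName) and the object names of
    the InferenceModels whose status would report a reuse error. Equal
    timestamps are broken by object name here instead of at random.
    """
    winners: dict[tuple[str, str], ModelEntry] = {}
    for entry in sorted(entries, key=lambda e: (e.created, e.object_name)):
        # setdefault keeps the first (oldest) entry for each key.
        winners.setdefault((entry.pool_name, entry.model_name), entry)
    rejected = [e.object_name for e in entries
                if winners[(e.pool_name, e.model_name)] is not e]
    return winners, rejected
```

Keying on the pool as well as the modelName reflects that uniqueness is only required among InferenceModels referencing the same InferencePool.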
_Appears in:_
- [InferenceModel](#inferencemodel)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `modelName` _string_ | The name of the model as users set it in the "model" parameter of their requests.<br />The name should be unique among the workloads that reference the same backend pool.<br />This is the parameter used to match requests. In the future, matching on other request parameters may be allowed.<br />Another approach to supporting matching on other request parameters is to use a different ModelName per HTTPFilter.<br />Names can be reserved without implementing an actual model in the pool.<br />This can be done by specifying a target model and setting its weight to zero;<br />an error will then be returned specifying that no valid target model is found. | | MaxLength: 253 <br /> |
| `criticality` _[Criticality](#criticality)_ | Defines how important it is to serve the model compared to other models referencing the same pool. | Default | Enum: [Critical Default Sheddable] <br /> |
| `targetModels` _[TargetModel](#targetmodel) array_ | Allows multiple versions of a model for traffic splitting.<br />If not specified, the target model name defaults to the modelName parameter.<br />modelName often refers to a LoRA adapter. | | MaxItems: 10 <br /> |
| `poolRef` _[PoolObjectReference](#poolobjectreference)_ | Reference to the InferencePool. The pool must exist in the same namespace. | | Required: \{\} <br /> |

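The defaults and limits in this table can be gathered into a small client-side check. The sketch below is hypothetical (`normalize_spec` is not a real API) and assumes two things the reference does not state: that an omitted or empty `targetModels` can be represented as a single target named after `modelName` with the TargetModel default weight of 1, and that checks run on plain dictionaries before submission. The API server's own validation remains authoritative.

```python
VALID_CRITICALITIES = {"Critical", "Default", "Sheddable"}


def normalize_spec(spec: dict) -> dict:
    """Apply the documented defaults and limits to an InferenceModelSpec dict.

    Raises ValueError on a violation. Hypothetical sketch only.
    """
    model_name = spec.get("modelName", "")
    if len(model_name) > 253:
        raise ValueError("modelName exceeds MaxLength 253")
    if "poolRef" not in spec:
        raise ValueError("poolRef is required")

    out = dict(spec)
    # criticality defaults to "Default" and must be one of the enum values.
    out.setdefault("criticality", "Default")
    if out["criticality"] not in VALID_CRITICALITIES:
        raise ValueError(f"invalid criticality {out['criticality']!r}")

    # Assumption: an omitted or empty targetModels list falls back to a
    # single target named after modelName, with the TargetModel default weight.
    targets = out.get("targetModels") or [{"name": model_name, "weight": 1}]
    if len(targets) > 10:
        raise ValueError("targetModels exceeds MaxItems 10")
    out["targetModels"] = targets
    return out
```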
#### InferenceModelStatus

InferenceModelStatus defines the observed state of InferenceModel.

_Appears in:_
- [InferenceModel](#inferencemodel)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#condition-v1-meta) array_ | Conditions track the state of the InferenceModel. | | |

#### InferencePool

InferencePool is the Schema for the InferencePools API.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `inference.networking.x-k8s.io/v1alpha1` | | |
| `kind` _string_ | `InferencePool` | | |
| `metadata` _[ObjectMeta](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#objectmeta-v1-meta)_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _[InferencePoolSpec](#inferencepoolspec)_ | | | |
| `status` _[InferencePoolStatus](#inferencepoolstatus)_ | | | |

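As an illustration, this Python sketch builds an InferencePool object as a plain dictionary and prints it as JSON. The pool name, the `app: vllm-llama` label, and port `8000` are hypothetical placeholders, `inference_pool_manifest` is not a real library function, and the range check mirrors the Minimum and Maximum listed for `targetPortNumber` in InferencePoolSpec.

```python
import json


def inference_pool_manifest(name: str, namespace: str,
                            selector: dict[str, str], port: int) -> dict:
    """Build a minimal InferencePool object as a plain dict (hypothetical values)."""
    if not 0 <= port <= 65535:
        raise ValueError("targetPortNumber must be within [0, 65535]")
    return {
        "apiVersion": "inference.networking.x-k8s.io/v1alpha1",
        "kind": "InferencePool",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Plain label map, like a Service selector (not a LabelSelector).
            "selector": dict(selector),
            # Port the model server containers listen on.
            "targetPortNumber": port,
        },
    }


if __name__ == "__main__":
    pool = inference_pool_manifest("my-pool", "default", {"app": "vllm-llama"}, 8000)
    print(json.dumps(pool, indent=2))
```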
#### InferencePoolSpec

InferencePoolSpec defines the desired state of InferencePool.

_Appears in:_
- [InferencePool](#inferencepool)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `selector` _object (keys:[LabelKey](#labelkey), values:[LabelValue](#labelvalue))_ | Selector uses a map of labels to watch the model server pods<br />that should be included in the InferencePool. Model servers should not<br />also be selected by any other Service or InferencePool; that behavior is not supported<br />and will result in suboptimal utilization.<br />In some cases, implementations may translate this to a Service selector, so this field uses the simple<br />map used for Service selectors instead of the full Kubernetes LabelSelector type. | | Required: \{\} <br /> |
| `targetPortNumber` _integer_ | TargetPortNumber is the port number on which the model servers within the pool expect<br />to receive traffic.<br />This maps to the TargetPort in: https://pkg.go.dev/k8s.io/api/core/v1#ServicePort | | Maximum: 65535 <br />Minimum: 0 <br />Required: \{\} <br /> |

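Because `selector` is a plain label map rather than a LabelSelector, it only supports equality matching: a pod belongs to the pool when it carries every key in the map with exactly the same value. The Python sketch below illustrates that rule only; `matches`, `pool_members`, and the pod names and labels are hypothetical, and the sketch says nothing about how an implementation actually watches pods.

```python
def matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Equality-based match, as with a Service selector: every selector
    key must be present on the pod with exactly the same value."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def pool_members(selector: dict[str, str],
                 pods: dict[str, dict[str, str]]) -> list[str]:
    """Return the sorted names of pods (name -> labels) the selector includes."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))
```

Extra labels on a pod do not prevent a match; only missing keys or differing values do.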
#### InferencePoolStatus

InferencePoolStatus defines the observed state of InferencePool.

_Appears in:_
- [InferencePool](#inferencepool)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `conditions` _[Condition](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.31/#condition-v1-meta) array_ | Conditions track the state of the InferencePool. | | |

#### LabelKey

_Underlying type:_ _string_

Originally copied from: https://github.com/kubernetes-sigs/gateway-api/blob/99a3934c6bc1ce0874f3a4c5f20cafd8977ffcb4/apis/v1/shared_types.go#L694-L731
Duplicated so as not to take an unexpected dependency on Gateway API.

LabelKey is the key of a label. This is used for validation
of maps. This matches the Kubernetes "qualified name" validation that is used for labels.

Valid values include:

* example
* example.com
* example.com/path
* example.com/path.html

Invalid values include:

* example~ - "~" is an invalid character
* example.com. - cannot start or end with "."

_Validation:_
- MaxLength: 253
- MinLength: 1
- Pattern: `^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$`

_Appears in:_
- [InferencePoolSpec](#inferencepoolspec)

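A LabelKey check can be sketched with Python's `re` module. One assumption is worth calling out: the pattern above shows `\\.`, which is read here as an escaped literal dot (`\.`). A literal backslash would reject documented valid values such as `example.com/path`, so the single-escape reading is the one consistent with the examples. `is_valid_label_key` is a hypothetical helper.

```python
import re

# The reference shows "\\." in the pattern; it is read here as an escaped
# literal dot ("\."), which is what the documented valid examples require.
LABEL_KEY_RE = re.compile(
    r"^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?"
    r"([A-Za-z0-9][-A-Za-z0-9_.]{0,61})?[A-Za-z0-9]$"
)


def is_valid_label_key(key: str) -> bool:
    """Check the LabelKey length bounds (1 to 253) and pattern."""
    return 1 <= len(key) <= 253 and LABEL_KEY_RE.fullmatch(key) is not None
```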
#### LabelValue

_Underlying type:_ _string_

LabelValue is the value of a label. This is used for validation
of maps. This matches the Kubernetes label validation rules:
* must be 63 characters or less (can be empty),
* unless empty, must begin and end with an alphanumeric character ([a-z0-9A-Z]),
* may contain dashes (-), underscores (_), dots (.), and alphanumerics between.

Valid values include:

* MyValue
* my.name
* 123-my-value

_Validation:_
- MaxLength: 63
- MinLength: 0
- Pattern: `^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`

_Appears in:_
- [InferencePoolSpec](#inferencepoolspec)

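The LabelValue rules can be checked with the listed pattern plus the 63-character limit. This is an illustrative Python sketch, and `is_valid_label_value` is a hypothetical helper; note that the empty string is accepted, matching the MinLength of 0.

```python
import re

LABEL_VALUE_RE = re.compile(r"^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$")


def is_valid_label_value(value: str) -> bool:
    """Check the LabelValue length bound (0 to 63) and pattern; "" is valid."""
    return len(value) <= 63 and LABEL_VALUE_RE.fullmatch(value) is not None
```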
#### PoolObjectReference

PoolObjectReference identifies an API object within the namespace of the
referrer.

_Appears in:_
- [InferenceModelSpec](#inferencemodelspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `group` _string_ | Group is the group of the referent. | inference.networking.x-k8s.io | MaxLength: 253 <br />Pattern: `^$\|^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$` <br /> |
| `kind` _string_ | Kind is the kind of the referent. For example, "InferencePool". | InferencePool | MaxLength: 63 <br />MinLength: 1 <br />Pattern: `^[a-zA-Z]([-a-zA-Z0-9]*[a-zA-Z0-9])?$` <br /> |
| `name` _string_ | Name is the name of the referent. | | MaxLength: 253 <br />MinLength: 1 <br />Required: \{\} <br /> |

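Since `group` and `kind` have defaults and there is no namespace field, resolving a `poolRef` amounts to filling in the defaults and using the referrer's namespace. The sketch below is hypothetical (`ResolvedRef` and `resolve_pool_ref` are illustrative names) and applies a default only when a field is absent.

```python
from dataclasses import dataclass

DEFAULT_GROUP = "inference.networking.x-k8s.io"
DEFAULT_KIND = "InferencePool"


@dataclass(frozen=True)
class ResolvedRef:
    group: str
    kind: str
    namespace: str
    name: str


def resolve_pool_ref(pool_ref: dict, referrer_namespace: str) -> ResolvedRef:
    """Fill in the documented defaults for a PoolObjectReference.

    The reference carries no namespace field, so the referent is always
    looked up in the referrer's own namespace.
    """
    name = pool_ref.get("name", "")
    if not 1 <= len(name) <= 253:
        raise ValueError("name is required and must be 1-253 characters")
    return ResolvedRef(
        group=pool_ref.get("group", DEFAULT_GROUP),
        kind=pool_ref.get("kind", DEFAULT_KIND),
        namespace=referrer_namespace,
        name=name,
    )
```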
#### TargetModel

TargetModel represents a deployed model or a LoRA adapter. The
Name field is expected to match the name of the LoRA adapter
(or base model) as it is registered within the model server. Inference
Gateway assumes that the model exists on the model server, and it is the
user's responsibility to validate a correct match. Should a model fail
to exist at request time, the error is processed by the Instance Gateway
and then emitted on the appropriate InferenceModel object.

_Appears in:_
- [InferenceModelSpec](#inferencemodelspec)

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | The name of the adapter as expected by the ModelServer. | | MaxLength: 253 <br /> |
| `weight` _integer_ | Weight is used to determine the proportion of traffic that should be<br />sent to this target model when multiple versions of the model are specified. | 1 | Maximum: 1e+06 <br />Minimum: 0 <br /> |

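The proportion described by `weight` can be sketched as weighted random selection. This is a hypothetical illustration: `pick_target` and the use of `random.choices` are assumptions, and an implementation may distribute traffic differently (for example, deterministically) while respecting the same proportions. The all-zero case mirrors the name-reservation behavior described for `modelName`.

```python
import random


def pick_target(targets: list[dict], rng: random.Random) -> str:
    """Choose a target model name with probability weight / sum(weights).

    A missing weight falls back to the documented default of 1. If every
    weight is zero (for example, a reserved name), no target is valid.
    """
    weights = [t.get("weight", 1) for t in targets]
    if sum(weights) == 0:
        raise LookupError("no valid target model found")
    return rng.choices([t["name"] for t in targets], weights=weights, k=1)[0]
```

With weights of 90 and 10, roughly nine in ten picks go to the first target over many requests.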