Commit fafbfec

Addressed comments

1 parent 675fd47 commit fafbfec

File tree: 4 files changed, +19 −19 lines changed


pkg/manifests/inferencemodel.yaml

−7

@@ -1,18 +1,11 @@
 apiVersion: inference.networking.x-k8s.io/v1alpha1
 kind: InferenceModel
 metadata:
-  labels:
-    app.kubernetes.io/name: api
-    app.kubernetes.io/managed-by: kustomize
   name: inferencemodel-sample
 spec:
   modelName: tweet-summary
   criticality: Critical
   poolRef:
-    # this is the default val:
-    group: inference.networking.x-k8s.io
-    # this is the default val:
-    kind: InferencePool
     name: vllm-llama2-7b-pool
   targetModels:
   - name: tweet-summary-1
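The dropped poolRef fields are safe to omit because, per the deleted comments, group and kind carry those values by default. A quick sanity-check sketch, assuming a cluster with the InferenceModel CRD installed and the manifest path as in this repo (if the CRD declares schema defaults, they are materialized on read-back):

```bash
# Apply the trimmed sample; poolRef.group and poolRef.kind are left to defaulting.
kubectl apply -f pkg/manifests/inferencemodel.yaml

# Read it back; the defaulted poolRef fields should come back populated.
kubectl get inferencemodel inferencemodel-sample -o yaml
```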

pkg/manifests/vllm/deployment.yaml

+1 −1

@@ -88,7 +88,7 @@ spec:
         env:
         - name: DYNAMIC_LORA_ROLLOUT_CONFIG
           value: "/config/configmap.yaml"
-        volumeMounts: # DO NOT USE subPath
+        volumeMounts: # DO NOT USE subPath, dynamic configmap updates don't work on subPaths
         - name: config-volume
           mountPath: /config
       restartPolicy: Always
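The expanded comment reflects a known kubelet behavior: ConfigMap volumes are refreshed through an atomic symlink swap, and a subPath mount binds to a single file that never sees the swap. A minimal sketch of the working pattern, with illustrative names (not copied from this deployment):

```yaml
# Sketch only: container and ConfigMap names here are illustrative.
spec:
  containers:
  - name: adapter-syncer
    volumeMounts:
    - name: config-volume
      mountPath: /config          # whole-volume mount: ConfigMap updates propagate
      # subPath: configmap.yaml   # a subPath mount would pin one file and
                                  # never receive dynamic updates
  volumes:
  - name: config-volume
    configMap:
      name: dynamic-lora-config   # hypothetical ConfigMap name
```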

site-src/guides/adapter-rollout.md

+17 −10

@@ -2,7 +2,7 @@
 
 The goal of this guide is to demonstrate how to rollout a new adapter version.
 
-## **Requirements**
+## **Prerequisites**
 
 Follow the steps in the [main guide](index.md)
 
@@ -52,20 +52,27 @@ Modify the InferenceModel to configure a canary rollout with traffic splitting.
 
 
 ```bash
-kubectl edit configmap tweet-summary
+kubectl edit inferencemodel tweet-summary
 ```
 
-Change the InferenceModel to match the following:
+Change the targetModels list in the InferenceModel to the following:
 
 
 ```yaml
-model:
-  name: tweet-summary
-  targetModels:
-    targetModelName: tweet-summary-1
-    weight: 90
-    targetModelName: tweet-summary-2
-    weight: 10
+apiVersion: inference.networking.x-k8s.io/v1alpha1
+kind: InferenceModel
+metadata:
+  name: inferencemodel-sample
+spec:
+  modelName: tweet-summary
+  criticality: Critical
+  poolRef:
+    name: vllm-llama2-7b-pool
+  targetModels:
+  - name: tweet-summary-1
+    weight: 90
+  - name: tweet-summary-2
+    weight: 10
 
 ```
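With these weights, roughly 90% of requests should resolve to tweet-summary-1 and 10% to tweet-summary-2. A rough observation sketch, assuming the quickstart's gateway name and port and that the completion response echoes the served adapter name (all of these are assumptions, not confirmed by this diff):

```bash
# Assumed gateway name and port from the quickstart; adjust for your setup.
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=8081

# Fire 50 requests and tally which adapter served each; expect ~45/5 for a 90/10 split.
for i in $(seq 1 50); do
  curl -s "http://${IP}:${PORT}/v1/completions" \
    -H 'Content-Type: application/json' \
    -d '{"model": "tweet-summary", "prompt": "tl;dr:", "max_tokens": 10}' |
    grep -o 'tweet-summary-[0-9]'
done | sort | uniq -c
```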

site-src/guides/index.md

+1 −1

@@ -2,7 +2,7 @@
 
 This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running!
 
-## **Requirements**
+## **Prerequisites**
 - Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
 - A cluster with:
   - Support for Services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running). For example, with Kind,
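One way to run the LoadBalancer check this hunk mentions, assuming Envoy Gateway was installed into its default namespace (the namespace is an assumption, not from this diff):

```bash
# List the Envoy proxy Services; a working setup shows TYPE=LoadBalancer
# with an EXTERNAL-IP assigned rather than stuck in <pending>.
kubectl get svc -n envoy-gateway-system
```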
