Commit cd2fc96

Integrate dynamic-lora-sidecar into main guide and add makefile, cloudbuild to build and publish lora-syncer image
Signed-off-by: Kunjan <[email protected]>
1 parent 6c22d92 commit cd2fc96

1 file changed

+68
-0
lines changed

site-src/guides/index.md

@@ -19,6 +19,74 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
```
**OPTIONALLY**: Enable dynamic loading of LoRA adapters.

[Deploy the sample vLLM deployment with dynamic LoRA adapter loading enabled and the LoRA syncer sidecar](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/tools/dynamic-lora-sidecar/deployment.yaml)

***Safely roll out the v2 adapter***

1. Update the LoRA ConfigMap so that it lists both the v1 and v2 adapters.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dynamic-lora-config
data:
  configmap.yaml: |
    vLLMLoRAConfig:
      ensureExist:
        models:
        - id: chatbot-v1
          source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1
        - id: chatbot-v2
          source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2
```

2. Configure a canary rollout with a traffic split using LLMService. In this example, 10% of the traffic for the `chatbot` model is sent to v2.

```yaml
model:
  name: chatbot
  targetModels:
  - targetModelName: chatbot-v1
    weight: 90
  - targetModelName: chatbot-v2
    weight: 10
```

3. Finish the rollout by sending 100% of the traffic to the new version.
```yaml
model:
  name: chatbot
  targetModels:
  - targetModelName: chatbot-v2
    weight: 100
```

4. Remove v1 from the dynamic LoRA ConfigMap.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dynamic-lora-config
data:
  configmap.yaml: |
    vLLMLoRAConfig:
      ensureExist:
        models:
        - id: chatbot-v2
          source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2
      ensureNotExist: # Explicitly unregisters the adapter from model servers
        models:
        - id: chatbot-v1
          source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1
```
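To make the weights in steps 2 and 3 concrete, here is a minimal Python sketch of weight-proportional target selection. It is illustrative only and is not the extension's routing code: `pick_target`, `simulate`, and the `TARGETS` list are hypothetical names, and the sketch assumes each weight is a relative share of the traffic for the model.

```python
import random
from collections import Counter

# Hypothetical in-memory copy of the targetModels list from step 2.
TARGETS = [
    {"targetModelName": "chatbot-v1", "weight": 90},
    {"targetModelName": "chatbot-v2", "weight": 10},
]


def pick_target(targets, rng=random):
    """Pick one target model name with probability proportional to its weight."""
    names = [t["targetModelName"] for t in targets]
    weights = [t["weight"] for t in targets]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(targets, requests=10_000, seed=0):
    """Count how many simulated requests land on each target."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(pick_target(targets, rng) for _ in range(requests))


if __name__ == "__main__":
    # With the step 2 weights, roughly one request in ten should reach v2.
    print(simulate(TARGETS))
```

With a single target at weight 100, as in step 3, every simulated request lands on `chatbot-v2`, which is why v1 can then be removed safely in step 4.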
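The ConfigMap edits in steps 1 and 4 describe a desired state for the adapters. The Python sketch below shows one way such a config could be turned into load and unload actions. It rests on an assumption about the semantics (adapters under `ensureExist` are loaded if missing, adapters under `ensureNotExist` are unloaded if present) and is not the sidecar's actual implementation. `plan_adapter_changes`, `STEP4_CONFIG`, and the `loaded` set are hypothetical.

```python
def plan_adapter_changes(config, loaded):
    """Return (to_load, to_unload) adapter ids for a desired-state config.

    config: dict shaped like the parsed configmap.yaml payload.
    loaded: set of adapter ids currently registered on the model server.
    """
    lora = config.get("vLLMLoRAConfig", {})
    wanted = [m["id"] for m in lora.get("ensureExist", {}).get("models", [])]
    unwanted = [m["id"] for m in lora.get("ensureNotExist", {}).get("models", [])]

    # Assumption: an id listed in both sections is treated as a config error.
    conflict = set(wanted) & set(unwanted)
    if conflict:
        raise ValueError(f"adapters listed in both sections: {sorted(conflict)}")

    to_load = [a for a in wanted if a not in loaded]
    to_unload = [a for a in unwanted if a in loaded]
    return to_load, to_unload


# The step 4 payload, written out as a Python dict for illustration.
STEP4_CONFIG = {
    "vLLMLoRAConfig": {
        "ensureExist": {
            "models": [
                {"id": "chatbot-v2", "source": "gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2"},
            ]
        },
        "ensureNotExist": {
            "models": [
                {"id": "chatbot-v1", "source": "gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1"},
            ]
        },
    }
}

if __name__ == "__main__":
    # A server that still has both adapters loaded only needs v1 unloaded.
    print(plan_adapter_changes(STEP4_CONFIG, {"chatbot-v1", "chatbot-v2"}))
```

Treating the config as desired state keeps the plan idempotent: once v1 is gone and v2 is loaded, running it again yields no actions.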
1. **Install the Inference Extension CRDs:**
```sh
