1 file changed, +68 -0 lines changed
@@ -19,6 +19,74 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
```
+ **OPTIONALLY**: Enable dynamic loading of LoRA adapters.
+
+ [Deploy the sample vLLM deployment with dynamic LoRA adapter loading enabled and the LoRA syncer sidecar](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/tools/dynamic-lora-sidecar/deployment.yaml)
+
+ ***Safely roll out the v2 adapter***
+
+ 1. Update the LoRA ConfigMap so that both adapter versions are loaded.
+
+ ```yaml
+
+ apiVersion: v1
+ kind: ConfigMap
+ metadata:
+   name: dynamic-lora-config
+ data:
+   configmap.yaml: |
+     vLLMLoRAConfig:
+       ensureExist:
+         models:
+         - id: chatbot-v1
+           source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1
+         - id: chatbot-v2
+           source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2
+ ```
+
+ 2. Configure a canary rollout with a traffic split using LLMService. In this example, 10% of traffic to the chatbot model is sent to v2.
+
+ ```yaml
+ model:
+   name: chatbot
+   targetModels:
+   - targetModelName: chatbot-v1
+     weight: 90
+   - targetModelName: chatbot-v2
+     weight: 10
+ ```
+
+ 3. Finish the rollout by sending 100% of traffic to the new version.
+ ```yaml
+ model:
+   name: chatbot
+   targetModels:
+   - targetModelName: chatbot-v2
+     weight: 100
+ ```
+
+ 4. Remove v1 from the dynamic LoRA ConfigMap.
+ ```yaml
+ apiVersion: v1
+ kind: ConfigMap
+ metadata:
+   name: dynamic-lora-config
+ data:
+   configmap.yaml: |
+     vLLMLoRAConfig:
+       ensureExist:
+         models:
+         - id: chatbot-v2
+           source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2
+       ensureNotExist: # Explicitly unregisters the adapter from model servers
+         models:
+         - id: chatbot-v1
+           source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1
+ ```
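
The intended effect of the `ensureExist` and `ensureNotExist` lists in steps 1 and 4 can be illustrated with a small Python sketch. This is not the sidecar's actual implementation: `plan_adapter_changes` and the in-memory `loaded` set are hypothetical names, and the sketch only assumes that every adapter under `ensureExist` should end up loaded and every adapter under `ensureNotExist` should end up unloaded.

```python
def plan_adapter_changes(loaded, ensure_exist, ensure_not_exist):
    """Return (to_load, to_unload) adapter IDs for one reconcile pass.

    Hypothetical helper: `loaded` is the set of adapter IDs currently
    registered on a model server; the other two arguments mirror the
    `ensureExist.models` and `ensureNotExist.models` lists.
    """
    wanted = {m["id"] for m in ensure_exist}
    unwanted = {m["id"] for m in ensure_not_exist}

    # An adapter cannot be both required and forbidden.
    conflict = wanted & unwanted
    if conflict:
        raise ValueError(f"adapters listed in both sections: {sorted(conflict)}")

    to_load = sorted(wanted - loaded)      # required but not yet loaded
    to_unload = sorted(unwanted & loaded)  # forbidden but still loaded
    return to_load, to_unload


# State after step 1: both adapters are loaded on the model server.
loaded = {"chatbot-v1", "chatbot-v2"}

# ConfigMap contents from step 4.
ensure_exist = [
    {"id": "chatbot-v2", "source": "gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2"},
]
ensure_not_exist = [
    {"id": "chatbot-v1", "source": "gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1"},
]

to_load, to_unload = plan_adapter_changes(loaded, ensure_exist, ensure_not_exist)
print(to_load, to_unload)  # [] ['chatbot-v1']
```

Under these assumptions, applying the step 4 ConfigMap leaves `chatbot-v2` untouched and unloads only `chatbot-v1`, which is why step 3 moves all traffic to v2 first.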
+
+
+
+
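
To make the canary split in steps 2 and 3 concrete, here is a minimal Python sketch of weighted target selection. How the extension actually picks a target model is not shown in this change; the sketch assumes that `weight` values are relative proportions, and `pick_target_model` is a hypothetical name used only for illustration.

```python
import random


def pick_target_model(target_models, rng=random):
    """Pick one target model name with probability proportional to its weight."""
    names = [t["targetModelName"] for t in target_models]
    weights = [t["weight"] for t in target_models]
    # random.choices draws according to the relative weights.
    return rng.choices(names, weights=weights, k=1)[0]


# Step 2: canary split, 90% to v1 and 10% to v2.
canary = [
    {"targetModelName": "chatbot-v1", "weight": 90},
    {"targetModelName": "chatbot-v2", "weight": 10},
]

rng = random.Random(0)  # seeded so repeated runs give the same counts
counts = {"chatbot-v1": 0, "chatbot-v2": 0}
for _ in range(10_000):
    counts[pick_target_model(canary, rng)] += 1
print(counts)  # roughly 9,000 requests to v1 and 1,000 to v2

# Step 3: with a single target at weight 100, every request goes to v2.
final = [{"targetModelName": "chatbot-v2", "weight": 100}]
print(pick_target_model(final, rng))  # chatbot-v2
```

Because the weights are treated as proportions here, `90`/`10` and `9`/`1` would behave identically in this sketch; whether the real API requires weights to sum to 100 is not stated in this change.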
1. **Install the Inference Extension CRDs:**

```sh