File tree: 5 files changed, +16 -39 lines changed

tools/dynamic-lora-sidecar

```diff
@@ -19,7 +19,7 @@ This project is [alpha (0.2 release)](https://github.com/kubernetes-sigs/gateway
 
 ## Getting Started
 
-Follow our [Getting Started Guide](./pkg/README.md) to get the inference-extension up and running on your cluster!
+Follow our [Getting Started Guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/) to get the inference-extension up and running on your cluster!
 
 See our website at https://gateway-api-inference-extension.sigs.k8s.io/ for detailed API documentation on leveraging our Kubernetes-native declarative APIs
```
```diff
@@ -1,7 +1,7 @@
 apiVersion: inference.networking.x-k8s.io/v1alpha2
 kind: InferenceModel
 metadata:
-  name: tweet-summarizer
+  name: food-review
 spec:
   modelName: food-review
   criticality: Standard
```
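Once the InferenceModel above is applied, clients address the adapter by its `spec.modelName` through the gateway's OpenAI-compatible completions endpoint. A minimal sketch of building such a request; the gateway address is a placeholder and the helper name is illustrative, not part of the project's API:

```python
import json

# Hypothetical gateway address; substitute your inference gateway's
# IP/port (e.g. from `kubectl get gateway`).
GATEWAY_URL = "http://<gateway-ip>:80/v1/completions"

def build_completion_request(model_name: str, prompt: str) -> dict:
    """Build an OpenAI-compatible completion payload; `model` must match
    the InferenceModel's `spec.modelName` (here: food-review)."""
    return {"model": model_name, "prompt": prompt, "max_tokens": 100}

payload = build_completion_request("food-review", "Write a review of a taco stand.")
print(json.dumps(payload))
# Send this body to GATEWAY_URL with e.g. urllib.request or curl.
```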
```diff
@@ -235,12 +235,12 @@ spec:
         emptyDir: {}
       - name: config-volume
         configMap:
-          name: vllm-llama3.1-8b-adapters
+          name: vllm-llama3-8b-adapters
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
-  name: vllm-llama3.1-8b-adapters
+  name: vllm-llama3-8b-adapters
 data:
   configmap.yaml: |
     vLLMLoRAConfig:
```
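The ConfigMap above is mounted into the model-server pod as a volume; when the ConfigMap is edited, kubelet eventually refreshes the mounted file, which is what makes live reconfiguration without a pod restart possible. A minimal change-detection sketch using mtime polling (the actual sidecar's watch mechanism may differ):

```python
import os
import tempfile

def config_changed(path: str, last_mtime: float) -> tuple[bool, float]:
    """Return (changed, new_mtime) for a mounted config file."""
    mtime = os.stat(path).st_mtime
    return mtime != last_mtime, mtime

# Demo against a throwaway file standing in for the mounted ConfigMap.
with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
    f.write("vLLMLoRAConfig: {}\n")
    cfg_path = f.name

changed, seen = config_changed(cfg_path, last_mtime=0.0)      # first poll: changed
changed_again, _ = config_changed(cfg_path, last_mtime=seen)  # no edit since: unchanged
os.unlink(cfg_path)
```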
````diff
@@ -37,9 +37,9 @@ Change the ConfigMap to match the following (note the new entry under models):
     ensureExist:
       models:
       - id: food-review-1
-        source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
+        source: Kawon/llama3.1-food-finetune_v14_r8
       - id: food-review-2
-        source: mahimairaja/tweet-summarization-llama-2-finetuned
+        source: Kawon/llama3.1-food-finetune_v14_r8
 ```
 
 The new adapter version is applied to the model servers live, without requiring a restart.
````
````diff
@@ -121,11 +121,11 @@ Unload the older versions from the servers by updating the LoRA syncer ConfigMap
     ensureExist:
       models:
       - id: food-review-2
-        source: mahimairaja/tweet-summarization-llama-2-finetuned
+        source: Kawon/llama3.1-food-finetune_v14_r8
     ensureNotExist:
       models:
       - id: food-review-1
-        source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
+        source: Kawon/llama3.1-food-finetune_v14_r8
 ```
 
 With this, all requests should be served by the new adapter version.
````
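The `ensureExist` / `ensureNotExist` rollout above can be viewed as a pure reconciliation step: diff the desired config against the set of adapters currently loaded on a server, then load the missing ones and unload the forbidden ones (the real sidecar also issues the corresponding calls against the model server, e.g. vLLM's dynamic LoRA load/unload endpoints). A minimal sketch with illustrative names, not the sidecar's actual API:

```python
def reconcile(config: dict, loaded: set) -> tuple:
    """Given a vLLMLoRAConfig-style dict, return (adapters to load,
    adapter ids to unload) relative to the currently loaded set."""
    ensure = config.get("ensureExist", {}).get("models", [])
    forbid = config.get("ensureNotExist", {}).get("models", [])
    to_load = [m for m in ensure if m["id"] not in loaded]
    to_unload = [m["id"] for m in forbid if m["id"] in loaded]
    return to_load, to_unload

# The rollout state from the ConfigMap above: keep v2, remove v1.
cfg = {
    "ensureExist": {"models": [
        {"id": "food-review-2", "source": "Kawon/llama3.1-food-finetune_v14_r8"}]},
    "ensureNotExist": {"models": [
        {"id": "food-review-1", "source": "Kawon/llama3.1-food-finetune_v14_r8"}]},
}
loads, unloads = reconcile(cfg, loaded={"food-review-1"})
```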
````diff
@@ -77,50 +77,27 @@ The sidecar supports the following command-line arguments:
 
 ## Example Configuration
 
-Here's an example of using the `defaultBaseModel` field to avoid repetition in your configuration:
+In this example, both adapters will use `meta-llama/Llama-3.1-8B-Instruct` as their base model:
 
 ```yaml
 apiVersion: v1
 kind: ConfigMap
 metadata:
-  name: vllm-llama2-7b-adapters
+  name: vllm-llama3-8b-adapters
 data:
   configmap.yaml: |
     vLLMLoRAConfig:
-      name: vllm-llama2-7b
+      name: vllm-llama3-8b
       port: 8000
-      defaultBaseModel: meta-llama/Llama-2-7b-hf
+      defaultBaseModel: meta-llama/Llama-3.1-8B-Instruct
       ensureExist:
         models:
-        - id: tweet-summary-1
-          source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
-        - id: tweet-summary-2
-          source: mahimairaja/tweet-summarization-llama-2-finetuned
+        - id: food-review-1
+          source: Kawon/llama3.1-food-finetune_v14_r8
+        - id: food-review-2
+          source: Kawon/llama3.1-food-finetune_v14_r8
 ```
 
-In this example, both adapters will use `meta-llama/Llama-2-7b-hf` as their base model without needing to specify it for each adapter individually.
-
-You can still override the default base model for specific adapters when needed:
-
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: vllm-mixed-adapters
-data:
-  configmap.yaml: |
-    vLLMLoRAConfig:
-      name: vllm-mixed
-      port: 8000
-      defaultBaseModel: meta-llama/Llama-2-7b-hf
-      ensureExist:
-        models:
-        - id: tweet-summary-1
-          source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
-        - id: code-assistant
-          source: huggingface/code-assistant-lora
-          base-model: meta-llama/Llama-2-13b-hf # Override for this specific adapter
-```
-
 ## Example Deployment
 
 The [deployment.yaml](deployment.yaml) file shows an example of deploying the sidecar with custom parameters:
````
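As the Example Configuration section shows, `defaultBaseModel` supplies the base model for every adapter in the config, while a per-adapter `base-model` entry (seen in the example the diff removes) overrides it. A minimal sketch of that precedence rule, with illustrative names and placeholder model identifiers:

```python
DEFAULT_BASE = "meta-llama/Llama-3.1-8B-Instruct"

def resolve_base_model(adapter: dict, default_base: str) -> str:
    """Per-adapter `base-model` wins over the config-level default."""
    return adapter.get("base-model", default_base)

# Adapter without an override falls back to defaultBaseModel.
plain = {"id": "food-review-1", "source": "Kawon/llama3.1-food-finetune_v14_r8"}
# Hypothetical adapter with an explicit override.
override = {"id": "special", "source": "example/adapter", "base-model": "example/other-base"}

print(resolve_base_model(plain, DEFAULT_BASE))
print(resolve_base_model(override, DEFAULT_BASE))
```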