Commit 4ff391b

Various fixes to docs and example manifests names (#613)

1 parent 79fedb5

File tree

5 files changed (+16 -39 lines):

- README.md
- config/manifests/inferencemodel.yaml
- config/manifests/vllm/gpu-deployment.yaml
- site-src/guides/adapter-rollout.md
- tools/dynamic-lora-sidecar/README.md


README.md

+1 -1

````diff
@@ -19,7 +19,7 @@ This project is [alpha (0.2 release)](https://github.com/kubernetes-sigs/gateway
 
 ## Getting Started
 
-Follow our [Getting Started Guide](./pkg/README.md) to get the inference-extension up and running on your cluster!
+Follow our [Getting Started Guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/) to get the inference-extension up and running on your cluster!
 
 See our website at https://gateway-api-inference-extension.sigs.k8s.io/ for detailed API documentation on leveraging our Kubernetes-native declarative APIs
 
````

config/manifests/inferencemodel.yaml

+1 -1

````diff
@@ -1,7 +1,7 @@
 apiVersion: inference.networking.x-k8s.io/v1alpha2
 kind: InferenceModel
 metadata:
-  name: tweet-summarizer
+  name: food-review
 spec:
   modelName: food-review
   criticality: Standard
````
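
For reference, the renamed example manifest now reads as below; note that `metadata.name` matches `spec.modelName`. This sketch shows only the fields visible in the hunk — the full file may define more (such as a pool reference), which this diff does not touch:

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: food-review        # renamed from tweet-summarizer to match the model it configures
spec:
  modelName: food-review   # the model name requests are matched against
  criticality: Standard
```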

config/manifests/vllm/gpu-deployment.yaml

+2 -2

````diff
@@ -235,12 +235,12 @@ spec:
         emptyDir: {}
       - name: config-volume
         configMap:
-          name: vllm-llama3.1-8b-adapters
+          name: vllm-llama3-8b-adapters
 ---
 apiVersion: v1
 kind: ConfigMap
 metadata:
-  name: vllm-llama3.1-8b-adapters
+  name: vllm-llama3-8b-adapters
 data:
   configmap.yaml: |
     vLLMLoRAConfig:
````
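
Both occurrences have to change in lockstep: the Deployment consumes the ConfigMap by name, so the volume's `configMap.name` must equal the ConfigMap's `metadata.name`. A minimal sketch of that invariant, with the surrounding Deployment fields elided:

```yaml
# Pod spec side: the volume references the ConfigMap by name...
volumes:
- name: config-volume
  configMap:
    name: vllm-llama3-8b-adapters   # must match metadata.name below
---
# ...and the ConfigMap side: the object being referenced.
apiVersion: v1
kind: ConfigMap
metadata:
  name: vllm-llama3-8b-adapters
```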

site-src/guides/adapter-rollout.md

+4 -4

````diff
@@ -37,9 +37,9 @@ Change the ConfigMap to match the following (note the new entry under models):
       ensureExist:
         models:
         - id: food-review-1
-          source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
+          source: Kawon/llama3.1-food-finetune_v14_r8
         - id: food-review-2
-          source: mahimairaja/tweet-summarization-llama-2-finetuned
+          source: Kawon/llama3.1-food-finetune_v14_r8
 ```
 
 The new adapter version is applied to the model servers live, without requiring a restart.
@@ -121,11 +121,11 @@ Unload the older versions from the servers by updating the LoRA syncer ConfigMap
       ensureExist:
         models:
         - id: food-review-2
-          source: mahimairaja/tweet-summarization-llama-2-finetuned
+          source: Kawon/llama3.1-food-finetune_v14_r8
       ensureNotExist:
         models:
         - id: food-review-1
-          source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
+          source: Kawon/llama3.1-food-finetune_v14_r8
 ```
 
 With this, all requests should be served by the new adapter version.
````
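
Taken together, the final state of the syncer ConfigMap after the rollout would look roughly like the sketch below. The `ensureExist`/`ensureNotExist` entries come from the hunk above; the surrounding `vLLMLoRAConfig` fields and the ConfigMap name are assumptions carried over from the other examples in this commit:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: vllm-llama3-8b-adapters        # assumed; matches gpu-deployment.yaml above
data:
  configmap.yaml: |
    vLLMLoRAConfig:
      name: vllm-llama3-8b             # assumed from the sidecar README example
      port: 8000
      defaultBaseModel: meta-llama/Llama-3.1-8B-Instruct
      ensureExist:
        models:
        - id: food-review-2            # keep only the new adapter version
          source: Kawon/llama3.1-food-finetune_v14_r8
      ensureNotExist:
        models:
        - id: food-review-1            # unload the superseded version
          source: Kawon/llama3.1-food-finetune_v14_r8
```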

tools/dynamic-lora-sidecar/README.md

+8 -31

````diff
@@ -77,50 +77,27 @@ The sidecar supports the following command-line arguments:
 
 ## Example Configuration
 
-Here's an example of using the `defaultBaseModel` field to avoid repetition in your configuration:
+In this example, both adapters will use `meta-llama/Llama-3.1-8B-Instruct` as their base model:
 
 ```yaml
 apiVersion: v1
 kind: ConfigMap
 metadata:
-  name: vllm-llama2-7b-adapters
+  name: vllm-llama3-8b-adapters
 data:
   configmap.yaml: |
     vLLMLoRAConfig:
-      name: vllm-llama2-7b
+      name: vllm-llama3-8b
       port: 8000
-      defaultBaseModel: meta-llama/Llama-2-7b-hf
+      defaultBaseModel: meta-llama/Llama-3.1-8B-Instruct
       ensureExist:
         models:
-        - id: tweet-summary-1
-          source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
-        - id: tweet-summary-2
-          source: mahimairaja/tweet-summarization-llama-2-finetuned
+        - id: food-review-1
+          source: Kawon/llama3.1-food-finetune_v14_r8
+        - id: food-review-2
+          source: Kawon/llama3.1-food-finetune_v14_r8
 ```
 
-In this example, both adapters will use `meta-llama/Llama-2-7b-hf` as their base model without needing to specify it for each adapter individually.
-
-You can still override the default base model for specific adapters when needed:
-
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: vllm-mixed-adapters
-data:
-  configmap.yaml: |
-    vLLMLoRAConfig:
-      name: vllm-mixed
-      port: 8000
-      defaultBaseModel: meta-llama/Llama-2-7b-hf
-      ensureExist:
-        models:
-        - id: tweet-summary-1
-          source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
-        - id: code-assistant
-          source: huggingface/code-assistant-lora
-          base-model: meta-llama/Llama-2-13b-hf # Override for this specific adapter
-```
 ## Example Deployment
 
 The [deployment.yaml](deployment.yaml) file shows an example of deploying the sidecar with custom parameters:
````
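
The sidecar reads that ConfigMap from a mounted volume. A minimal sketch of the wiring, with the container name, image reference, and mount path all assumed for illustration (the real values live in deployment.yaml):

```yaml
# Hypothetical pod spec excerpt: mount the adapters ConfigMap where the
# sidecar can watch it. Names and paths below are assumptions, not the
# values from deployment.yaml.
spec:
  containers:
  - name: lora-adapter-syncer        # assumed container name
    image: <sidecar-image>           # placeholder image reference
    volumeMounts:
    - name: config-volume
      mountPath: /config             # assumed mount path
  volumes:
  - name: config-volume
    configMap:
      name: vllm-llama3-8b-adapters  # the ConfigMap from the example above
```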
