File tree: 4 files changed (+7 −3 lines changed)
````diff
@@ -12,7 +12,7 @@
       - GIT_TAG=$_GIT_TAG
       - EXTRA_TAG=$_PULL_BASE_REF
       - DOCKER_BUILDX_CMD=/buildx-entrypoint
-  - name: lora-adapter-syncer
+  - name: gcr.io/k8s-testimages/gcb-docker-gcloud:v20220830-45cbff55bc
     entrypoint: make
     args:
       - syncer-image-push
````
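Read together with its context lines, the updated build step would look roughly like the sketch below; the enclosing `steps:` key and the exact indentation are assumptions, since the diff shows only a fragment:

```yaml
# Sketch of the Cloud Build step after this change; the surrounding
# `steps:` layout and indentation are assumptions, not shown in the diff.
steps:
  - name: gcr.io/k8s-testimages/gcb-docker-gcloud:v20220830-45cbff55bc
    entrypoint: make
    args:
      - syncer-image-push
      - GIT_TAG=$_GIT_TAG
      - EXTRA_TAG=$_PULL_BASE_REF
      - DOCKER_BUILDX_CMD=/buildx-entrypoint
```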
````diff
@@ -56,6 +56,7 @@
   - Guides:
     - User Guides:
       - Getting started: guides/index.md
+      - Adapter Rollout: guides/adapter-rollout.md
       - Implementer's Guide: guides/implementers.md
   - Reference:
     - API Reference: reference/spec.md
````
````diff
@@ -1,3 +1,3 @@
-# Getting started with Gateway API Inference Extension with Dynamic lora updates on vllm
+# Adapter Rollout
 
 The goal of this guide is to get a single InferencePool running with vLLM and demonstrate use of dynamic lora updating!
@@ -42,6 +42,8 @@ Rest of the steps are same as [general setup](https://github.com/kubernetes-sigs
         - base-model: meta-llama/Llama-2-7b-hf
           id: tweet-summary-2
           source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
+    ```
+
 2. Configure a canary rollout with traffic split using LLMService. In this example, 40% of traffic for tweet-summary model will be sent to the ***tweet-summary-2*** adapter.
 
     ```yaml
````
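The `yaml` block opened at the end of step 2 is truncated in this diff. Purely as an illustration of the split the text describes, a 60/40 canary might look like the sketch below; the `LLMService` field names (`targetModels`, `weight`) and the `apiVersion` are assumptions, not taken from the PR:

```yaml
# Hypothetical LLMService traffic split; kind, apiVersion, and field
# names are assumptions for illustration, not copied from the PR.
apiVersion: inference.networking.x-k8s.io/v1alpha1
kind: LLMService
metadata:
  name: tweet-summary
spec:
  targetModels:
    - name: tweet-summary-1
      weight: 60
    - name: tweet-summary-2   # 40% of traffic goes to the new adapter
      weight: 40
```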
````diff
@@ -68,6 +68,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml
    ```
    > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
+
 1. **OPTIONALLY**: Apply Traffic Policy
 
    For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
@@ -89,4 +90,4 @@
    "max_tokens": 100,
    "temperature": 0
    }'
-    ```
+   ```
````
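The request body whose tail appears in the last hunk can be sanity-checked offline before sending it; a minimal Python sketch, where the `model` and `prompt` values are placeholders rather than values from the PR:

```python
import json

# Hypothetical completion-request payload; only "max_tokens" and
# "temperature" are visible in the diff above, the rest are placeholders.
payload = {
    "model": "tweet-summary",
    "prompt": "example prompt",
    "max_tokens": 100,
    "temperature": 0,
}

body = json.dumps(payload)
print(body)  # this JSON string is what the request would carry as its body
```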