
Commit b94fd56

Fixes to the adapter rollouts guide
1 parent 88c20f1

File tree

4 files changed (+7 lines, -3 lines)


cloudbuild.yaml

+1, -1

@@ -12,7 +12,7 @@ steps:
   - GIT_TAG=$_GIT_TAG
   - EXTRA_TAG=$_PULL_BASE_REF
   - DOCKER_BUILDX_CMD=/buildx-entrypoint
-- name: lora-adapter-syncer
+- name: gcr.io/k8s-testimages/gcb-docker-gcloud:v20220830-45cbff55bc
  entrypoint: make
  args:
  - syncer-image-push

mkdocs.yml

+1, -0
@@ -56,6 +56,7 @@ nav:
   - Guides:
     - User Guides:
       - Getting started: guides/index.md
+      - Adapter Rollout: guides/adapter-rollout.md
       - Implementer's Guide: guides/implementers.md
   - Reference:
     - API Reference: reference/spec.md

site-src/guides/dynamic-lora.md → site-src/guides/adapter-rollout.md (renamed)

+3, -1

@@ -1,4 +1,4 @@
-# Getting started with Gateway API Inference Extension with Dynamic lora updates on vllm
+# Adapter Rollout
 
 The goal of this guide is to get a single InferencePool running with vLLM and demonstrate use of dynamic lora updating!
 
@@ -42,6 +42,8 @@ Rest of the steps are same as [general setup](https://github.com/kubernetes-sigs
       - base-model: meta-llama/Llama-2-7b-hf
         id: tweet-summary-2
         source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
+   ```
+
 2. Configure a canary rollout with traffic split using LLMService. In this example, 40% of traffic for tweet-summary model will be sent to the ***tweet-summary-2*** adapter .
 
    ```yaml
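The canary rollout in the diff above splits tweet-summary traffic 60/40 between the existing adapter and ***tweet-summary-2***. As a rough illustration of what a per-request weighted split means (not the extension's actual routing code; only the adapter names are taken from the guide), a minimal Python sketch:

```python
import random

# Weights mirroring the guide's canary split: 40% of tweet-summary
# traffic goes to the new tweet-summary-2 adapter. (Illustrative only;
# in the real system the split is configured declaratively via LLMService.)
ADAPTER_WEIGHTS = {
    "tweet-summary": 60,
    "tweet-summary-2": 40,
}

def pick_adapter(weights, rng):
    """Pick an adapter name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Over many requests the observed split converges to the configured weights.
rng = random.Random(0)
counts = {name: 0 for name in ADAPTER_WEIGHTS}
for _ in range(10_000):
    counts[pick_adapter(ADAPTER_WEIGHTS, rng)] += 1
print(counts)
```

Over a long run the counts land near the configured 60/40 ratio, which is what the declarative traffic split achieves per request at the gateway.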

site-src/guides/index.md

+2, -1

@@ -68,6 +68,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml
    ```
    > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
+
 1. **OPTIONALLY**: Apply Traffic Policy
 
    For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
@@ -89,4 +90,4 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    "max_tokens": 100,
    "temperature": 0
    }'
-```
+   ```
