# Getting started with Gateway API Inference Extension with dynamic LoRA updates on vLLM

The goal of this guide is to get a single InferencePool running with vLLM and to demonstrate dynamic LoRA adapter updates.

### Requirements

- Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
- A cluster with:
  - Support for Services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running). For example, with Kind,
    you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer).
  - 3 GPUs to run the sample model server. Adjust the number of replicas in `./manifests/vllm/deployment.yaml` as needed.

### Steps

1. **Deploy the sample vLLM model server with dynamic LoRA updates enabled and the dynamic LoRA syncer sidecar**

   [Redeploy the vLLM deployment with the dynamic LoRA adapter enabled, plus the LoRA syncer sidecar and ConfigMap](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/manifests/vllm/dynamic-lora-sidecar/deployment.yaml)

The rest of the steps are the same as in the [general setup](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/site-src/guides/index.md).

### Safely roll out the v2 adapter

1. Update the LoRA syncer ConfigMap to make the new adapter version available on the model servers.
2. Configure a canary rollout with a traffic split using LLMService. In this example, 40% of the traffic for the `tweet-summary` model is sent to the ***tweet-summary-2*** adapter.

   ```yaml
   model:
     name: tweet-summary
     targetModels:
     - targetModelName: tweet-summary-0
       weight: 20
     - targetModelName: tweet-summary-1
       weight: 40
     - targetModelName: tweet-summary-2
       weight: 40
   ```

3. Finish the rollout by sending 100% of the traffic to the new version.
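
As a rough sketch of that final state, the fragment below reuses the `model` / `targetModels` layout from the canary example. It assumes that the older adapters can simply be dropped from the list once the rollout is complete; check the schema against the API version you have deployed before applying it.

```yaml
model:
  name: tweet-summary
  targetModels:
  # Assumption: only the new adapter remains and receives all traffic.
  - targetModelName: tweet-summary-2
    weight: 100
```

Keeping the older adapters listed with a weight of 0 may be an alternative if you want a quick rollback path, but whether a zero weight is accepted is not confirmed by this guide.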