generated from kubernetes/kubernetes-template-project
Dynamic lora load/unload sidecar #31
Merged: k8s-ci-robot merged 34 commits into kubernetes-sigs:main from coolkp:dynamic-lora-sidecar on Nov 18, 2024
Changes from 11 commits
Commits (34):
14e4b10 Dynamic lora load/unload sidecar (coolkp)
bcfee4a Formatting (coolkp)
cb45fe2 Resolve README comments (coolkp)
62da988 Address comments on sidecar, store updates in memory, rename base field (coolkp)
56cffc2 Address comments in example deployment (coolkp)
5cbaeef Address comments in example deployment (coolkp)
5a03f98 base model is optional (coolkp)
1af2df4 Check health of server before querying (coolkp)
5b51182 Check health of server before querying (coolkp)
cc1e686 Docstrings (coolkp)
926a71c Mock health check in tests (coolkp)
cb3c9b2 Refactor configmap, switch to watchfiles to detect symbolic link targ… (coolkp)
3140610 Refactor configmap, switch to watchfiles to detect symbolic link targ… (coolkp)
65cea88 Modify unittests (coolkp)
8012ea3 Change example host and port to be explicit (coolkp)
ba00b85 Change example sidecar name (coolkp)
c8d9c10 Add warning about using subPath (coolkp)
828348d Add screenshots (coolkp)
ec40820 Add screenshots (coolkp)
1aba325 Add testing results (coolkp)
b30051a Add testing results (coolkp)
c5d2527 Add config validation (coolkp)
d0d01e1 Add config documentation (coolkp)
b4867b6 Add config documentation (coolkp)
e60b434 Add config validation (coolkp)
bea4068 Add config validation (coolkp)
100f636 Make reconciling non blocking (coolkp)
c24ff35 Move under tools (coolkp)
472b545 Move under tools (coolkp)
5354a47 Document usage of sidecar, available by default from 1.29 (coolkp)
bc2ce32 Document usage of sidecar, available by default from 1.29 (coolkp)
f82d8b2 Document usage of sidecar, available by default from 1.29 (coolkp)
e01ec51 Update tools/dynamic-lora-sidecar/README.md (coolkp)
28779e7 Update tools/dynamic-lora-sidecar/README.md (coolkp)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
sidecar/__pycache__/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
|
||
FROM python:3.10-slim-buster | ||
|
||
WORKDIR /dynamic-lora-reconciler | ||
|
||
RUN python3 -m venv /opt/venv | ||
|
||
ENV PATH="/opt/venv/bin:$PATH" | ||
|
||
RUN pip install --upgrade pip | ||
COPY requirements.txt . | ||
RUN pip install --no-cache-dir -r requirements.txt | ||
|
||
COPY sidecar/sidecar.py . | ||
|
||
CMD ["python", "sidecar.py"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
# Dynamic LoRA Adapter Sidecar for vLLM | ||
|
||
This directory contains an example ConfigMap of LoRA adapter configurations and the script for a sidecar container that dynamically manages LoRA adapters on a vLLM server running in the same Kubernetes pod, by reconciling the server's loaded adapters against that ConfigMap. | ||
|
||
## Overview | ||
|
||
The sidecar continuously monitors a ConfigMap mounted as a YAML configuration file. This file defines the desired state of the LoRA adapters. Each adapter entry includes: | ||
|
||
- **Adapter ID (`id`):** Unique identifier for the adapter. | ||
- **Source (`source`):** Path to the adapter's source files. | ||
- **Base Model (`base-model`):** The base model to which the adapter should be applied. | ||
- **toRemove:** (Optional) Indicates whether the adapter should be unloaded. | ||
|
||
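As an illustration of the fields listed above, here is a minimal sketch of how one adapter entry might be checked before it is acted on. The function name `validate_adapter` is hypothetical, and treating `id` and `source` as required while `base-model` and `toRemove` are optional is an assumption drawn from the field list and the example configs, not a description of what `sidecar.py` does.

```python
from typing import Any

# Assumption: every adapter needs an id and a source; the other keys are optional.
REQUIRED_KEYS = ("id", "source")


def validate_adapter(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one adapter entry; empty means it looks valid."""
    problems = []
    for key in REQUIRED_KEYS:
        value = entry.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty required key: {key}")
    # toRemove is optional, but when present it should be a real boolean.
    if "toRemove" in entry and not isinstance(entry["toRemove"], bool):
        problems.append("toRemove must be a boolean")
    return problems
```

Returning a list of messages rather than raising lets a caller report every problem in an entry at once, for example in a per-adapter status field.
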
The sidecar uses the vLLM server's API to load or unload adapters based on the configuration. It also periodically reconciles the registered adapters on the vLLM server with the desired state defined in the ConfigMap, ensuring consistency. | ||
|
||
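The reconciliation described above can be sketched as a pure function that compares the desired adapters with the set of adapter IDs the server currently reports. This is a hedged sketch, not the sidecar's actual code: `plan_reconcile`, `load_payload`, and `unload_payload` are hypothetical names, and the payload keys assume vLLM's runtime LoRA endpoints (`/v1/load_lora_adapter` and `/v1/unload_lora_adapter`, enabled through `VLLM_ALLOW_RUNTIME_LORA_UPDATING`), which take `lora_name` and `lora_path`.

```python
from typing import Any


def plan_reconcile(
    desired: list[dict[str, Any]], registered: set[str]
) -> tuple[list[dict[str, Any]], list[str]]:
    """Split the desired config into adapters to load and adapter IDs to unload."""
    to_load: list[dict[str, Any]] = []
    to_unload: list[str] = []
    for adapter in desired:
        adapter_id = adapter["id"]
        if adapter.get("toRemove", False):
            # Only unload adapters the server actually has registered.
            if adapter_id in registered:
                to_unload.append(adapter_id)
        elif adapter_id not in registered:
            to_load.append(adapter)
    return to_load, to_unload


def load_payload(adapter: dict[str, Any]) -> dict[str, str]:
    """Request body assumed for POST /v1/load_lora_adapter."""
    return {"lora_name": adapter["id"], "lora_path": adapter["source"]}


def unload_payload(adapter_id: str) -> dict[str, str]:
    """Request body assumed for POST /v1/unload_lora_adapter."""
    return {"lora_name": adapter_id}
```

Keeping the planning step free of network calls makes it easy to unit test; the HTTP requests can then be issued from the two returned lists.
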
## Features | ||
|
||
- **Dynamic Loading and Unloading:** Load and unload LoRA adapters without restarting the vLLM server. | ||
- **Continuous Reconciliation:** Ensures the vLLM server's state matches the desired configuration. | ||
- **ConfigMap Integration:** Leverages Kubernetes ConfigMaps for easy configuration management. | ||
- **Easy Deployment:** Provides a sample deployment YAML for quick setup. | ||
|
||
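Continuous reconciliation depends on noticing when the mounted configuration changes. Kubernetes updates a ConfigMap volume by swapping a symbolic link, so one simple approach is to hash the file contents on each poll instead of relying on file events. The sketch below uses only the standard library and polling; it is a swapped-in technique for illustration, not the file-watching library the sidecar uses, and `ConfigWatcher` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def config_digest(path: str) -> str:
    """Hash the file contents; reading follows symlinks to the current target."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


class ConfigWatcher:
    """Remembers the last seen digest and reports whether the file changed."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._last_digest = None

    def changed(self) -> bool:
        digest = config_digest(self.path)
        if digest == self._last_digest:
            return False
        self._last_digest = digest
        return True
```

A caller would invoke `changed()` on a timer and trigger a reconcile whenever it returns `True`; the first call always returns `True`, which gives an initial reconcile at startup.
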
## Repository Contents | ||
|
||
- **`sidecar.py`:** Python script for the sidecar container. | ||
- **`Dockerfile`:** Dockerfile to build the sidecar image. | ||
- **`configmap.yaml`:** Example ConfigMap YAML file. | ||
- **`deployment.yaml`:** Example Kubernetes deployment YAML. | ||
|
||
## Usage | ||
|
||
1. **Build the Docker image:** | ||
   ```bash | ||
   docker build -t <your-image-name> . | ||
   ``` | ||
2. **Create a ConfigMap:** | ||
   ```bash | ||
   kubectl create configmap name-of-your-configmap --from-file=your-file.yaml | ||
   ``` | ||
3. **Mount the ConfigMap and configure the sidecar in your pod.** See the [example deployment][deployment]. | ||
|
||
[deployment]: deployment.yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
apiVersion: apps/v1 | ||
kind: Deployment | ||
metadata: | ||
  name: llama-deployment | ||
spec: | ||
  replicas: 1 | ||
  selector: | ||
    matchLabels: | ||
      app: llama-server | ||
  template: | ||
    metadata: | ||
      labels: | ||
        app: llama-server | ||
        ai.gke.io/model: LLaMA2_7B | ||
        ai.gke.io/inference-server: vllm | ||
        examples.ai.gke.io/source: model-garden | ||
    spec: | ||
      shareProcessNamespace: true | ||
      containers: | ||
      - name: inference-server | ||
        image: vllm/vllm-openai:v0.6.3.post1 | ||
        resources: | ||
          requests: | ||
            cpu: 5 | ||
            memory: 20Gi | ||
            ephemeral-storage: 40Gi | ||
            nvidia.com/gpu: 1 | ||
          limits: | ||
            cpu: 5 | ||
            memory: 20Gi | ||
            ephemeral-storage: 40Gi | ||
            nvidia.com/gpu: 1 | ||
        command: ["/bin/sh", "-c"] | ||
        args: | ||
        # sh -c runs only its first argument, so the whole command is one string. | ||
        - >- | ||
          vllm serve meta-llama/Llama-2-7b-hf | ||
          --host=0.0.0.0 | ||
          --port=8000 | ||
          --tensor-parallel-size=1 | ||
          --swap-space=16 | ||
          --gpu-memory-utilization=0.95 | ||
          --max-model-len=2048 | ||
          --max-num-batched-tokens=4096 | ||
          --disable-log-stats | ||
          --enable-lora | ||
          --max-loras=5 | ||
        env: | ||
        - name: DEPLOY_SOURCE | ||
          value: UI_NATIVE_MODEL | ||
        - name: MODEL_ID | ||
          value: "Llama2-7B" | ||
        - name: AIP_STORAGE_URI | ||
          value: "gs://vertex-model-garden-public-us/llama2/llama2-7b-hf" | ||
        - name: VLLM_ALLOW_RUNTIME_LORA_UPDATING | ||
          value: "true" | ||
        - name: HF_TOKEN | ||
          valueFrom: | ||
            secretKeyRef: | ||
              name: hf-token # The name of your Kubernetes Secret | ||
              key: HF_TOKEN # The specific key within the Secret | ||
        - name: DYNAMIC_LORA_ROLLOUT_CONFIG | ||
          value: "/config/configmap.yaml" | ||
        volumeMounts: | ||
        - mountPath: /dev/shm | ||
          name: dshm | ||
      initContainers: | ||
      - name: configmap-reader-1 | ||
        image: us-docker.pkg.dev/kunjanp-gke-dev-2/lora-sidecar/sidecar:latest | ||
        restartPolicy: Always | ||
        env: | ||
        - name: DYNAMIC_LORA_ROLLOUT_CONFIG | ||
          value: "/config/configmap.yaml" | ||
        volumeMounts: | ||
        # Mount the whole volume (no subPath): subPath mounts do not receive ConfigMap updates. | ||
        - name: config-volume | ||
          mountPath: /config | ||
      volumes: | ||
      - name: dshm | ||
        emptyDir: | ||
          medium: Memory | ||
      - name: config-volume | ||
        configMap: | ||
          name: dynamic-lora-config | ||
      nodeSelector: | ||
        cloud.google.com/gke-accelerator: nvidia-l4 | ||
|
||
--- | ||
apiVersion: v1 | ||
kind: Service | ||
metadata: | ||
  name: llama-service | ||
spec: | ||
  selector: | ||
    app: llama-server | ||
  type: ClusterIP | ||
  ports: | ||
  - protocol: TCP | ||
    port: 8000 | ||
    targetPort: 8000 | ||
|
||
--- | ||
|
||
apiVersion: v1 | ||
kind: ConfigMap | ||
metadata: | ||
  name: dynamic-lora-config | ||
data: | ||
  configmap.yaml: | | ||
    vLLMLoRAConfig: | ||
      host: localhost | ||
      models: | ||
      - base-model: meta-llama/Llama-2-7b-hf | ||
        id: sql-lora-v1 | ||
        source: yard1/llama-2-7b-sql-lora-test | ||
      name: sql-lora |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
aiohttp==3.10.10 | ||
pyyaml==6.0.2 | ||
requests==2.32.3 | ||
watchdog==5.0.3 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
vLLMLoRAConfig: | ||
  host: localhost | ||
  models: | ||
  - base-model: meta-llama/Llama-2-7b-hf | ||
    id: sql-lora-v1 | ||
    source: yard1/llama-2-7b-sql-lora-test | ||
    status: | ||
      errors: | ||
      - '' | ||
      operation: load | ||
      timestamp: 2024-10-23 15:43:07 UTC+0000 | ||
    toRemove: false | ||
  - base-model: meta-llama/Llama-2-7b-hf | ||
    id: sql-lora-v2 | ||
    source: yard1/llama-2-7b-sql-lora-test | ||
    status: | ||
      errors: | ||
      - already unloaded | ||
      operation: unload | ||
      timestamp: 2024-10-23 15:43:07 UTC+0000 | ||
    toRemove: true | ||
  name: sql-loras-llama | ||
  port: '8000' | ||