Integrate dynamic-lora-sidecar into main guide and add makefile, cloudbuild to build and publish lora-syncer image

coolkp · coolkp · commit cd2fc961326b · 2025-02-10T18:07:15.000-08:00
Signed-off-by: Kunjan <kunjanp@google.com>
diff --git a/site-src/guides/index.md b/site-src/guides/index.md
@@ -19,6 +19,74 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
    ```
+   **OPTIONALLY**: Enable dynamic loading of LoRA adapters.
+   
+     [Deploy the sample vLLM deployment with dynamic LoRA adapter loading enabled and the LoRA syncer sidecar](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/tools/dynamic-lora-sidecar/deployment.yaml)
+         
+    ***Safely roll out the v2 adapter***
+    
+     1. Update the LoRA ConfigMap so that it lists both the v1 and v2 adapters.
+
+        ```yaml
+        apiVersion: v1
+        kind: ConfigMap
+        metadata:
+          name: dynamic-lora-config
+        data:
+          configmap.yaml: |
+            vLLMLoRAConfig:
+              ensureExist:
+                models:
+                - id: chatbot-v1
+                  source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1
+                - id: chatbot-v2
+                  source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2
+        ```
+
+
+     2. Configure a canary rollout with a traffic split using LLMService. In this example, 10% of the traffic to the chatbot model is sent to v2.
+
+        ```yaml
+        model:
+          name: chatbot
+          targetModels:
+          - targetModelName: chatbot-v1
+            weight: 90
+          - targetModelName: chatbot-v2
+            weight: 10
+        ```
+            
+     3. Finish the rollout by sending 100% of the traffic to the new version.
+        ```yaml
+        model:
+          name: chatbot
+          targetModels:
+          - targetModelName: chatbot-v2
+            weight: 100
+        ```
+         
+     4. Remove v1 from the dynamic LoRA ConfigMap.
+        ```yaml
+        apiVersion: v1
+        kind: ConfigMap
+        metadata:
+          name: dynamic-lora-config
+        data:
+          configmap.yaml: |
+            vLLMLoRAConfig:
+              ensureExist:
+                models:
+                - id: chatbot-v2
+                  source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v2
+              ensureNotExist: # Explicitly unregisters the adapter from model servers
+                models:
+                - id: chatbot-v1
+                  source: gs://[TEAM-A-MODELS-BUCKET]/chatbot-v1
+        ```
+
+
+
+
 1. **Install the Inference Extension CRDs:**
 
    ```sh