Update kubectl apply commands to reference manifest files directly by URL and pin EPP to a specific SHA (#233)

kfswain · web-flow · commit ff9cbe7b5c55 · 2025-01-27T15:45:28.000-08:00
diff --git a/pkg/README.md b/pkg/README.md
@@ -15,35 +15,35 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
    ```bash
    kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
-   kubectl apply -f ./manifests/vllm/vllm-lora-deployment.yaml
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/vllm-lora-deployment.yaml
    ```
 
 1. **Install the CRDs into the cluster:**
 
    ```sh
-   kubectl apply -f config/crd/bases
+   kubectl apply -k https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd
    ```
 
 1. **Deploy InferenceModel and InferencePool**
 
    Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
    ```bash
-   kubectl apply -f ./manifests/inferencepool-with-model.yaml
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencepool-with-model.yaml
    ```
 
 1. **Update Envoy Gateway Config to enable Patch Policy**
 
    Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map by running:
    ```bash
-   kubectl apply -f ./manifests/gateway/enable_patch_policy.yaml
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/enable_patch_policy.yaml
    kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
    ```
    Additionally, to enable the admin interface, uncomment the admin lines in the manifest and run these commands again.
 
 1. **Deploy Gateway**
 
    ```bash
-   kubectl apply -f ./manifests/gateway/gateway.yaml
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml
    ```
    > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy is very useful.***
    
@@ -53,14 +53,14 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 1. **Deploy Ext-Proc**
 
    ```bash
-   kubectl apply -f ./manifests/ext_proc.yaml
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml
    ```
 
 1. **Deploy Envoy Gateway Custom Policies**
 
    ```bash
-   kubectl apply -f ./manifests/gateway/extension_policy.yaml
-   kubectl apply -f ./manifests/gateway/patch_policy.yaml
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml
    ```
    > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
 
@@ -69,7 +69,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
 
    ```bash
-   kubectl apply -f ./manifests/gateway/traffic_policy.yaml
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml
    ```
 
 1. **Try it out**
diff --git a/pkg/manifests/ext_proc.yaml b/pkg/manifests/ext_proc.yaml
@@ -60,7 +60,7 @@ spec:
       containers:
       - name: inference-gateway-ext-proc
         # TODO(https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/34) Update the image and args.
-        image: us-central1-docker.pkg.dev/k8s-staging-images/llm-instance-gateway/epp:main
+        image: us-central1-docker.pkg.dev/k8s-staging-images/llm-instance-gateway/epp@sha256:1ef39ee79c55db6436f9d4cb14cc957dea647f57b8a7ff0363f60b0711695103
         args:
         - -poolName
         - "vllm-llama2-7b-pool"
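
The `ext_proc.yaml` change above swaps the moving `:main` tag for an `@sha256:` digest, so the deployment keeps pulling the same image even when `main` is rebuilt. A minimal sketch of a check that tells the two forms apart is shown here. The helper name `is_digest_pinned` is hypothetical and not part of this repository, and the check only inspects the reference string; it does not contact the registry.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): succeeds when an image
# reference ends in a sha256 digest, i.e. @sha256:<64 lowercase hex chars>.
is_digest_pinned() {
  printf '%s' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# The two image references from the diff above.
old="us-central1-docker.pkg.dev/k8s-staging-images/llm-instance-gateway/epp:main"
new="us-central1-docker.pkg.dev/k8s-staging-images/llm-instance-gateway/epp@sha256:1ef39ee79c55db6436f9d4cb14cc957dea647f57b8a7ff0363f60b0711695103"

# Report whether each reference is pinned by digest.
for ref in "$old" "$new"; do
  if is_digest_pinned "$ref"; then
    echo "pinned:     $ref"
  else
    echo "not pinned: $ref"
  fi
done
```

A check along these lines could run in CI against the manifests to catch a tag-based image creeping back in, though that wiring is an assumption and not something this PR adds.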