From 36cd051975952328c438d2c6e92035f4d83cdde3 Mon Sep 17 00:00:00 2001 From: Kellen Swain Date: Mon, 10 Feb 2025 19:09:59 +0000 Subject: [PATCH 1/4] Link to v0.1.0 getting started guide --- site-src/guides/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site-src/guides/index.md b/site-src/guides/index.md index 92f6412a..d5e943d4 100644 --- a/site-src/guides/index.md +++ b/site-src/guides/index.md @@ -1,3 +1,3 @@ # Getting started with Gateway API Inference Extension -TODO \ No newline at end of file +To get started using our project follow this guide [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/release-v0.1.0/pkg/README.md)! \ No newline at end of file From e9224d8ec319b4a4a7e3a1ec55feb4fd571c1f06 Mon Sep 17 00:00:00 2001 From: Kellen Swain Date: Mon, 10 Feb 2025 19:24:45 +0000 Subject: [PATCH 2/4] Moving getting started guide to the site --- pkg/README.md | 95 +--------------------------------------- site-src/guides/index.md | 95 +++++++++++++++++++++++++++++++++++++++- 2 files changed, 95 insertions(+), 95 deletions(-) diff --git a/pkg/README.md b/pkg/README.md index 04ebfde2..b53ef777 100644 --- a/pkg/README.md +++ b/pkg/README.md @@ -1,96 +1,3 @@ ## Quickstart -This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running! - -### Requirements - - Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher - - A cluster with: - - Support for Services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running). For example, with Kind, - you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer). - - 3 GPUs to run the sample model server. Adjust the number of replicas in `./manifests/vllm/deployment.yaml` as needed. - -### Steps - -1. 
**Deploy Sample Model Server** - - Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model. - Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway. - ```bash - kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2 - kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml - ``` - -1. **Install the Inference Extension CRDs:** - - ```sh - kubectl apply -k https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd - ``` - -1. **Deploy InferenceModel** - - Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1` - [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server. - ```bash - kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml - ``` - -1. **Update Envoy Gateway Config to enable Patch Policy** - - Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run: - ```bash - kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/enable_patch_policy.yaml - kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system - ``` - Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again. - -1. 
**Deploy Gateway** - - ```bash - kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml - ``` - > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.*** - - Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status: - ```bash - $ kubectl get gateway inference-gateway - NAME CLASS ADDRESS PROGRAMMED AGE - inference-gateway inference-gateway True 22s - ``` - -1. **Deploy the Inference Extension and InferencePool** - - ```bash - kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml - ``` - -1. **Deploy Envoy Gateway Custom Policies** - - ```bash - kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml - kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml - ``` - > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further. - -1. **OPTIONALLY**: Apply Traffic Policy - - For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors. - - ```bash - kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml - ``` - -1. **Try it out** - - Wait until the gateway is ready. 
- - ```bash - IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}') - PORT=8081 - - curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{ - "model": "tweet-summary", - "prompt": "Write as if you were a critic: San Francisco", - "max_tokens": 100, - "temperature": 0 - }' - ``` \ No newline at end of file +Please refer to our Getting started guide here: https://gateway-api-inference-extension.sigs.k8s.io/guides/ \ No newline at end of file diff --git a/site-src/guides/index.md b/site-src/guides/index.md index d5e943d4..8a175fdc 100644 --- a/site-src/guides/index.md +++ b/site-src/guides/index.md @@ -1,3 +1,96 @@ # Getting started with Gateway API Inference Extension -To get started using our project follow this guide [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/release-v0.1.0/pkg/README.md)! \ No newline at end of file +This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running! + +### Requirements + - Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher + - A cluster with: + - Support for Services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running). For example, with Kind, + you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer). + - 3 GPUs to run the sample model server. Adjust the number of replicas in `./manifests/vllm/deployment.yaml` as needed. + +### Steps + +1. **Deploy Sample Model Server** + + Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model. + Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway. 
+ ```bash + kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2 + kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml + ``` + +1. **Install the Inference Extension CRDs:** + + ```sh + kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.1.0/manifests.yaml + ``` + +1. **Deploy InferenceModel** + + Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1` + [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server. + ```bash + kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml + ``` + +1. **Update Envoy Gateway Config to enable Patch Policy** + + Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run: + ```bash + kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/enable_patch_policy.yaml + kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system + ``` + Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again. + +1. **Deploy Gateway** + + ```bash + kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml + ``` + > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. 
Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.*** + + Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status: + ```bash + $ kubectl get gateway inference-gateway + NAME CLASS ADDRESS PROGRAMMED AGE + inference-gateway inference-gateway True 22s + ``` + +1. **Deploy the Inference Extension and InferencePool** + + ```bash + kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml + ``` + +1. **Deploy Envoy Gateway Custom Policies** + + ```bash + kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml + kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml + ``` + > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further. + +1. **OPTIONALLY**: Apply Traffic Policy + + For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors. + + ```bash + kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml + ``` + +1. **Try it out** + + Wait until the gateway is ready. 
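The step above says to wait until the gateway is ready but leaves the check manual. As a hedged convenience sketch (not part of this project's manifests — `wait_for_gateway` is a hypothetical helper), the Gateway's `Programmed` condition can be polled like so:

```bash
# Hypothetical helper (not part of the quickstart manifests): poll the
# Gateway's Programmed condition until it reports True, or fail after a
# timeout. Assumes `kubectl` is on PATH and pointed at the right cluster.
wait_for_gateway() {
  local name=$1 timeout=${2:-300} waited=0
  until [ "$(kubectl get "gateway/${name}" \
      -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}')" = "True" ]; do
    sleep 5
    waited=$((waited + 5))
    if [ "${waited}" -ge "${timeout}" ]; then
      echo "gateway ${name} not Programmed after ${timeout}s" >&2
      return 1
    fi
  done
}
```

For example, `wait_for_gateway inference-gateway` would block until the gateway used in this guide reports `Programmed=True`, before the next step reads its address.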
+ + ```bash + IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}') + PORT=8081 + + curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{ + "model": "tweet-summary", + "prompt": "Write as if you were a critic: San Francisco", + "max_tokens": 100, + "temperature": 0 + }' + ``` \ No newline at end of file From f4271cc542b2a9b6c25ad6243c32a4775639e7c5 Mon Sep 17 00:00:00 2001 From: Kellen Swain Date: Mon, 10 Feb 2025 19:27:46 +0000 Subject: [PATCH 3/4] site doesnt support markdown syntax for ordered lists, making explicit --- site-src/guides/index.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/site-src/guides/index.md b/site-src/guides/index.md index 8a175fdc..580993f5 100644 --- a/site-src/guides/index.md +++ b/site-src/guides/index.md @@ -20,13 +20,13 @@ This quickstart guide is intended for engineers familiar with k8s and model serv kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml ``` -1. **Install the Inference Extension CRDs:** +2. **Install the Inference Extension CRDs:** ```sh kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.1.0/manifests.yaml ``` -1. **Deploy InferenceModel** +3. **Deploy InferenceModel** Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1` [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server. @@ -34,7 +34,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml ``` -1. **Update Envoy Gateway Config to enable Patch Policy** +4. 
**Update Envoy Gateway Config to enable Patch Policy** Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run: ```bash @@ -43,7 +43,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv ``` Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again. -1. **Deploy Gateway** +5. **Deploy Gateway** ```bash kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml @@ -57,13 +57,13 @@ This quickstart guide is intended for engineers familiar with k8s and model serv inference-gateway inference-gateway True 22s ``` -1. **Deploy the Inference Extension and InferencePool** +6. **Deploy the Inference Extension and InferencePool** ```bash kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml ``` -1. **Deploy Envoy Gateway Custom Policies** +7. **Deploy Envoy Gateway Custom Policies** ```bash kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml @@ -71,7 +71,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv ``` > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further. -1. **OPTIONALLY**: Apply Traffic Policy +8. **OPTIONALLY**: Apply Traffic Policy For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors. @@ -79,7 +79,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml ``` -1. **Try it out** +9. 
**Try it out** Wait until the gateway is ready. From 015bbc8e2657c470db4a86d0ca149022487386a6 Mon Sep 17 00:00:00 2001 From: Kellen Swain Date: Mon, 10 Feb 2025 19:40:57 +0000 Subject: [PATCH 4/4] fiddling with mkdocs syntax --- site-src/guides/index.md | 26 +++++++++----------------- 1 file changed, 9 insertions(+), 17 deletions(-) diff --git a/site-src/guides/index.md b/site-src/guides/index.md index 580993f5..e4cbec6f 100644 --- a/site-src/guides/index.md +++ b/site-src/guides/index.md @@ -19,22 +19,19 @@ This quickstart guide is intended for engineers familiar with k8s and model serv kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml ``` - -2. **Install the Inference Extension CRDs:** +1. **Install the Inference Extension CRDs:** ```sh kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.1.0/manifests.yaml - ``` - -3. **Deploy InferenceModel** + +1. **Deploy InferenceModel** Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1` [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server. ```bash kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml ``` - -4. **Update Envoy Gateway Config to enable Patch Policy** +1. **Update Envoy Gateway Config to enable Patch Policy** Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. 
To do this, simply run: ```bash @@ -42,8 +39,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system ``` Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again. - -5. **Deploy Gateway** +1. **Deploy Gateway** ```bash kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml @@ -56,30 +52,26 @@ This quickstart guide is intended for engineers familiar with k8s and model serv NAME CLASS ADDRESS PROGRAMMED AGE inference-gateway inference-gateway True 22s ``` - -6. **Deploy the Inference Extension and InferencePool** +1. **Deploy the Inference Extension and InferencePool** ```bash kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml ``` - -7. **Deploy Envoy Gateway Custom Policies** +1. **Deploy Envoy Gateway Custom Policies** ```bash kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml ``` > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further. - -8. **OPTIONALLY**: Apply Traffic Policy +1. **OPTIONALLY**: Apply Traffic Policy For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors. ```bash kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml ``` - -9. **Try it out** +1. **Try it out** Wait until the gateway is ready.
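The quickstart's closing "Try it out" step sends a raw `curl` completion request. As a minimal sketch for scripted checks — assuming the `tweet-summary` model and the `IP`/`PORT` values from that step; `send_completion` is a hypothetical wrapper, not part of the project — the request's HTTP status can be captured like so:

```bash
# Hypothetical wrapper around the guide's curl request: prints only the
# HTTP status code so callers can assert on success.
send_completion() {
  local ip=$1 port=$2
  curl -s -o /dev/null -w '%{http_code}' "${ip}:${port}/v1/completions" \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "tweet-summary",
      "prompt": "Write as if you were a critic: San Francisco",
      "max_tokens": 100,
      "temperature": 0
    }'
}
```

A caller could then test `[ "$(send_completion ${IP} ${PORT})" = "200" ]` to confirm the gateway is routing completions end to end.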