# Adds Initial e2e Tests and Tooling #217
Merged

Changes from all commits (4 commits)
````diff
@@ -4,31 +4,34 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 ### Requirements
 - Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
-- A cluster that has built-in support for `ServiceType=LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running)
-  - For example, with Kind, you can follow these steps: https://kind.sigs.k8s.io/docs/user/loadbalancer
+- A cluster with:
+  - Support for Services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running). For example, with Kind,
+    you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer).
+  - 3 GPUs to run the sample model server. Adjust the number of replicas in `./manifests/vllm/deployment.yaml` as needed.
 
 ### Steps
 
-1. **Deploy Sample vLLM Application**
+1. **Deploy Sample Model Server**
 
-   Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
+   Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
+   Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
    ```bash
    kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/vllm-lora-deployment.yaml
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
    ```
 
-1. **Install the CRDs into the cluster:**
+1. **Install the Inference Extension CRDs:**
 
    ```sh
    kubectl apply -k https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd
    ```
 
-1. **Deploy InferenceModel and InferencePool**
+1. **Deploy InferenceModel**
 
-   Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
+   Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
+   [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
    ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencepool-with-model.yaml
+   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml
    ```
 
 1. **Update Envoy Gateway Config to enable Patch Policy**
````
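The tightened requirements are easy to sanity-check before starting. A minimal sketch, assuming Envoy Gateway was installed into its default `envoy-gateway-system` namespace and that GPU nodes advertise the `nvidia.com/gpu` extended resource (both assumptions, not stated in the diff):

```bash
# Envoy Gateway up and running (assumes the default install namespace).
kubectl get pods -n envoy-gateway-system

# Schedulable GPU capacity per node (assumes NVIDIA GPUs advertised
# as the nvidia.com/gpu extended resource).
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'

# The Inference Extension CRDs from the kubectl apply -k step above.
kubectl get crds | grep -i inference
```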
````diff
@@ -46,11 +49,15 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml
    ```
    > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.***
 
+   Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
+   ```bash
+   $ kubectl get gateway inference-gateway
+   NAME                CLASS               ADDRESS         PROGRAMMED   AGE
+   inference-gateway   inference-gateway   <MY_ADDRESS>    True         22s
+   ```
 
-1. **Deploy Ext-Proc**
+1. **Deploy the Inference Extension and InferencePool**
 
    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml
````

> **Review comment** on *Deploy the Inference Extension and InferencePool*: As I mentioned above, the InferencePool is now bundled with the ext-proc since the ext-proc is specifically configured for this pool.
**pkg/manifests/inferencepool-with-model.yaml → pkg/manifests/inferencemodel.yaml** (10 changes: 0 additions & 10 deletions)

**New file** (+38 lines):
# End-to-End Tests

This document provides instructions on how to run the end-to-end tests.

## Overview

The end-to-end tests validate Gateway API Inference Extension functionality. These tests are executed against a Kubernetes cluster and use the Ginkgo testing framework to ensure the extension behaves as expected.

## Prerequisites

- [Go](https://golang.org/doc/install) installed on your machine.
- [Make](https://www.gnu.org/software/make/manual/make.html) installed to run the end-to-end test target.
- A Hugging Face Hub token with access to the [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) model.
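A quick way to confirm the prerequisites; a minimal sketch, assuming the tests run against your current kubeconfig context (the document does not say how the cluster is selected):

```sh
go version             # Go toolchain is installed
make --version         # Make is installed
kubectl cluster-info   # assumption: tests target the current kubeconfig context
```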
## Running the End-to-End Tests

Follow these steps to run the end-to-end tests:

1. **Clone the Repository**: Clone the `gateway-api-inference-extension` repository:

   ```sh
   git clone https://github.com/kubernetes-sigs/gateway-api-inference-extension.git && cd gateway-api-inference-extension
   ```
1. **Export Your Hugging Face Hub Token**: The token is required to run the test model server:

   ```sh
   export HF_TOKEN=<MY_HF_TOKEN>
   ```

1. **Run the Tests**: Run the `test-e2e` target:

   ```sh
   make test-e2e
   ```
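Since the suite is built on Ginkgo, it can presumably also be driven through `go test` for more verbose output; a hypothetical invocation, assuming the e2e package lives under `./test/e2e` (path and flags are assumptions, not taken from the Makefile):

```sh
# Hypothetical direct run; prefer `make test-e2e` as the supported entry point.
go test ./test/e2e/ -v -ginkgo.v
```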
The test suite prints details for each step. Note that the `vllm-llama2-7b-pool` model server deployment may take several minutes to report an `Available=True` status due to the time required for bootstrapping.
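Rather than polling by hand, you can block until the model server is up; a sketch, assuming the deployment lands in the default namespace (adjust `-n` otherwise):

```sh
# Wait up to 15 minutes for the sample model server to become Available.
kubectl wait --for=condition=Available --timeout=15m deployment/vllm-llama2-7b-pool
```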
> **Review comment:** Note that the InferencePool CR is now part of the ext-proc deployment. The InferenceModel had to be split from the InferencePool since the e2e tests will create InferenceModels for each test case. The InferenceModel could have its own manifest, but IMHO it makes sense to bundle it with the ext-proc since the ext-proc is specifically configured for this InferencePool.