Commit 6b2fd93

Refactors e2e for manifest approach
Signed-off-by: Daneyon Hansen <[email protected]>
1 parent e8f20a2 commit 6b2fd93

17 files changed, +516 -1625 lines changed

api/v1alpha1/inferencemodel_types.go

-7
@@ -20,13 +20,6 @@ import (
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 )
 
-const (
-	// KindInferenceModel is the InferenceModel kind.
-	KindInferenceModel = "InferenceModel"
-	// ResourceInferenceModel is the name of the inferencemodels resource.
-	ResourceInferenceModel = "inferencemodels"
-)
-
 // InferenceModel is the Schema for the InferenceModels API.
 //
 // +kubebuilder:object:root=true

api/v1alpha1/inferencepool_types.go

-7
@@ -20,13 +20,6 @@ import (
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 )
 
-const (
-	// KindInferencePool is the InferencePool kind.
-	KindInferencePool = "InferencePool"
-	// ResourceInferencePool is the name of the inferencepools resource.
-	ResourceInferencePool = "inferencepools"
-)
-
 // InferencePool is the Schema for the InferencePools API.
 //
 // +kubebuilder:object:root=true

pkg/README.md

+14 -16
@@ -6,29 +6,30 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 - Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
 - A cluster that has built-in support for `ServiceType=LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running)
 - For example, with Kind, you can follow these steps: https://kind.sigs.k8s.io/docs/user/loadbalancer
+- 3 GPUs to run the vLLM deployment. Adjust the number of replicas as needed.
 
 ### Steps
 
-1. **Deploy Sample vLLM Application**
+1. **Install the Inference Extension CRDs:**
 
-Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
-Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
-```bash
-kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
-kubectl apply -f ./manifests/vllm/vllm-lora-deployment.yaml
+```sh
+kubectl apply -f config/crd/bases
 ```
 
-1. **Install the CRDs into the cluster:**
+1. **Deploy Sample vLLM Application**
+
+Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
 
-```sh
-kubectl apply -f config/crd/bases
+Replace `$HF_TOKEN` in `./manifests/vllm/deployment.yaml` with your Hugging Face secret and then deploy the sample vLLM deployment.
+```bash
+kubectl apply -f ./manifests/vllm/deployment.yaml
 ```
 
-1. **Deploy InferenceModel and InferencePool**
+1. **Deploy InferenceModel**
 
-Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
+Deploy a sample InferenceModel configuration based on the vLLM deployments mentioned above.
 ```bash
-kubectl apply -f ./manifests/inferencepool-with-model.yaml
+kubectl apply -f ./manifests/inferencemodel.yaml
 ```
 
 1. **Update Envoy Gateway Config to enable Patch Policy**
@@ -46,11 +47,8 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 kubectl apply -f ./manifests/gateway/gateway.yaml
 ```
 > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.***
-
-
-
 
-1. **Deploy Ext-Proc**
+1. **Deploy the Inference Extension and InferencePool**
 
 ```bash
 kubectl apply -f ./manifests/ext_proc.yaml
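
The first three steps of the revised README (CRDs, then the vLLM deployment with the token substituted, then the InferenceModel) could be scripted roughly as follows. This is only a sketch under assumptions: the helper names `render_manifest` and `apply_quickstart`, the `KUBECTL` override, and the use of `sed` to fill in the literal `$HF_TOKEN` placeholder are illustrative and not part of the commit. The manifest paths are the ones shown in the diff, and the later gateway steps are left out because the diff shows them only in part.

```bash
#!/bin/sh
# Hypothetical helpers for illustration; none of these names come from the commit.

# KUBECTL can be overridden (for example KUBECTL=echo) to get a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# render_manifest FILE TOKEN
# Prints FILE with every literal "$HF_TOKEN" replaced by TOKEN.
# Assumes TOKEN contains no "|", "&", or "\" characters, since those are
# special in a sed replacement that uses "|" as the delimiter.
render_manifest() {
  sed 's|\$HF_TOKEN|'"$2"'|g' "$1"
}

# apply_quickstart TOKEN
# Applies the manifests in the order the revised README lists them.
apply_quickstart() {
  token="$1"
  # Step 1: install the CRDs, as in the README's first step.
  "$KUBECTL" apply -f config/crd/bases || return 1
  # Step 2: apply the vLLM deployment from stdin with the token filled in,
  # so the file on disk keeps its placeholder.
  render_manifest ./manifests/vllm/deployment.yaml "$token" \
    | "$KUBECTL" apply -f - || return 1
  # Step 3: apply the sample InferenceModel.
  "$KUBECTL" apply -f ./manifests/inferencemodel.yaml || return 1
}
```

Running `apply_quickstart "$HF_TOKEN"` from the directory that contains `config/` and `manifests/` would perform the three applies in order. Piping the rendered manifest to `kubectl apply -f -` is a design choice of this sketch; the README itself asks for the file to be edited in place.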

pkg/crd/install.go

-106
This file was deleted.

pkg/crd/install_test.go

-130
This file was deleted.
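
The contents of the deleted `pkg/crd` files are not shown here, but the commit title and the README change suggest that CRD installation now goes through `kubectl apply -f config/crd/bases`. Under that assumption, a manifest-based install that also waits for the CRDs to become usable might look like the sketch below. The function name `install_crds`, the `KUBECTL` override, and the 60s default timeout are hypothetical and not taken from the commit.

```bash
#!/bin/sh
# Hypothetical helper for illustration; not part of the commit.

# KUBECTL can be overridden (for example KUBECTL=echo) to get a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# install_crds [DIR] [TIMEOUT]
# Applies the CRD manifests in DIR, then blocks until the API server
# reports the Established condition for them or TIMEOUT elapses.
install_crds() {
  crd_dir="${1:-config/crd/bases}"
  timeout="${2:-60s}"
  "$KUBECTL" apply -f "$crd_dir" || return 1
  "$KUBECTL" wait --for=condition=Established --timeout="$timeout" -f "$crd_dir"
}
```

Waiting for `Established` before creating custom resources avoids a race in which an InferenceModel is applied before the API server has started serving its kind.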
