
Commit 953235d

Refactors e2e for manifest approach

Signed-off-by: Daneyon Hansen <[email protected]>

1 parent: c008a95

17 files changed: +512 -1617 lines

api/v1alpha1/inferencemodel_types.go (-7)

@@ -20,13 +20,6 @@ import (
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 )
 
-const (
-	// KindInferenceModel is the InferenceModel kind.
-	KindInferenceModel = "InferenceModel"
-	// ResourceInferenceModel is the name of the inferencemodels resource.
-	ResourceInferenceModel = "inferencemodels"
-)
-
 // InferenceModel is the Schema for the InferenceModels API.
 //
 // +kubebuilder:object:root=true

api/v1alpha1/inferencepool_types.go (-7)

@@ -20,13 +20,6 @@ import (
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 )
 
-const (
-	// KindInferencePool is the InferencePool kind.
-	KindInferencePool = "InferencePool"
-	// ResourceInferencePool is the name of the inferencepools resource.
-	ResourceInferencePool = "inferencepools"
-)
-
 // InferencePool is the Schema for the InferencePools API.
 //
 // +kubebuilder:object:root=true
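
Both type files drop their exported kind/resource string constants. Callers that still need those strings can discover them from a live cluster instead; a quick check, assuming the project's CRDs are installed:

```bash
# The kind and resource names formerly exposed as Go constants can be
# listed straight from the API server once the CRDs are applied:
kubectl api-resources --api-group=inference.networking.x-k8s.io
```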

examples/poc/manifests/inferencepool-with-model.yaml renamed to examples/poc/manifests/inferencemodel.yaml (-10)

@@ -1,14 +1,4 @@
 apiVersion: inference.networking.x-k8s.io/v1alpha1
-kind: InferencePool
-metadata:
-  labels:
-  name: vllm-llama2-7b-pool
-spec:
-  targetPortNumber: 8000
-  selector:
-    app: vllm-llama2-7b-pool
----
-apiVersion: inference.networking.x-k8s.io/v1alpha1
 kind: InferenceModel
 metadata:
   labels:
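
With the InferencePool document split out of this manifest (the README change below applies it with the ext-proc step), the renamed file should define only the InferenceModel. A quick sanity check from the repo root:

```bash
# List the kinds declared in the renamed manifest; only the
# InferenceModel document is expected to remain:
grep '^kind:' examples/poc/manifests/inferencemodel.yaml
```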

examples/poc/manifests/vllm/vllm-lora-deployment.yaml (+9 -3)

@@ -1,4 +1,13 @@
 apiVersion: v1
+kind: Secret
+metadata:
+  name: hf-token
+  labels:
+    app: vllm
+stringData:
+  token: $HF_TOKEN
+---
+apiVersion: v1
 kind: Service
 metadata:
   name: vllm-llama2-7b-pool
@@ -10,14 +19,11 @@ spec:
       port: 8000
       targetPort: 8000
   type: ClusterIP
-
 ---
-
 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: vllm-llama2-7b-pool
-  namespace: default
 spec:
   replicas: 3
   selector:
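
The Secret now ships inside the manifest with a literal `$HF_TOKEN` placeholder, so the token has to be substituted before applying. One way to do that, assuming `HF_TOKEN` is exported in the current shell:

```bash
# Replace the placeholder with the real token and apply in one pass:
sed "s|\$HF_TOKEN|${HF_TOKEN}|" examples/poc/manifests/vllm/vllm-lora-deployment.yaml \
  | kubectl apply -f -
```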

pkg/README.md (+10 -8)

@@ -1,24 +1,26 @@
 ## Quickstart
 
 ### Requirements
-The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher.
+
+- The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher.
+- 3 GPUs are required to run the vLLM deployment. Adjust the number of replicas as needed.
 
 ### Steps
 
 1. **Deploy Sample vLLM Application**
 
-   Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
-   Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
+   Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
+
+   Replace `$HF_TOKEN` in `../examples/poc/manifests/vllm/vllm-lora-deployment.yaml` with your Hugging Face secret and then deploy the sample vLLM deployment.
    ```bash
-   kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
    kubectl apply -f ../examples/poc/manifests/vllm/vllm-lora-deployment.yaml
    ```
 
-1. **Deploy InferenceModel and InferencePool**
+1. **Deploy InferenceModel**
 
-   Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
+   Deploy a sample InferenceModel configuration based on the vLLM deployments mentioned above.
    ```bash
-   kubectl apply -f ../examples/poc/manifests/inferencepool-with-model.yaml
+   kubectl apply -f ../examples/poc/manifests/inferencemodel.yaml
    ```
 
 1. **Update Envoy Gateway Config to enable Patch Policy**
@@ -36,7 +38,7 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
    kubectl apply -f ./manifests/gateway.yaml
    ```
 
-1. **Deploy Ext-Proc**
+1. **Deploy Ext-Proc and InferencePool**
 
    ```bash
    kubectl apply -f ./manifests/ext_proc.yaml
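
After walking the updated quickstart, a short sanity check; the pod label is taken from the Service selector above, and the resource name assumes the project's CRDs are installed:

```bash
# Confirm the sample objects landed:
kubectl get inferencemodels
kubectl get pods -l app=vllm-llama2-7b-pool
```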

pkg/crd/install.go (-106)

This file was deleted.

pkg/crd/install_test.go (-130)

This file was deleted.
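
Both deletions remove the programmatic in-cluster CRD installer in favor of the manifest approach named in the commit message. A minimal sketch of the replacement flow, assuming kubebuilder's conventional `config/crd` layout (the actual path in this repo may differ):

```bash
# Apply the CRDs as static manifests instead of installing them from Go code:
kubectl apply -f config/crd/bases/
```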
