
Commit 836ef57

Move getting started guide to docs site (#308)
* Link to v0.1.0 getting started guide
* Moving getting started guide to the site
* Site doesn't support markdown syntax for ordered lists, making it explicit
* Fiddling with mkdocs syntax
1 parent 7149624 commit 836ef57

File tree

2 files changed: +87 −95 lines changed


pkg/README.md

+1 −94
@@ -1,96 +1,3 @@
## Quickstart

This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running!

### Requirements
- Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
- A cluster with:
  - Support for Services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running.) For example, with Kind, you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer).
  - 3 GPUs to run the sample model server. Adjust the number of replicas in `./manifests/vllm/deployment.yaml` as needed.

### Steps

1. **Deploy Sample Model Server**

    Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
    Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
    ```bash
    kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
    ```

1. **Install the Inference Extension CRDs:**

    ```sh
    kubectl apply -k https://github.com/kubernetes-sigs/gateway-api-inference-extension/config/crd
    ```

1. **Deploy InferenceModel**

    Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
    [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml
    ```

1. **Update Envoy Gateway Config to enable Patch Policy**

    Our custom LLM Gateway ext-proc is patched into the existing Envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/enable_patch_policy.yaml
    kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
    ```
    Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.

1. **Deploy Gateway**

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml
    ```
    > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy is very useful.***

    Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
    ```bash
    $ kubectl get gateway inference-gateway
    NAME                CLASS               ADDRESS         PROGRAMMED   AGE
    inference-gateway   inference-gateway   <MY_ADDRESS>    True         22s
    ```

1. **Deploy the Inference Extension and InferencePool**

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml
    ```

1. **Deploy Envoy Gateway Custom Policies**

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml
    ```
    > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.

1. **OPTIONALLY**: Apply Traffic Policy

    For high-traffic benchmarking, you can apply this manifest to avoid any defaults that can cause timeouts/errors.

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml
    ```

1. **Try it out**

    Wait until the gateway is ready.

    ```bash
    IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
    PORT=8081

    curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
    "model": "tweet-summary",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
    }'
    ```

Please refer to our Getting started guide here: https://gateway-api-inference-extension.sigs.k8s.io/guides/

site-src/guides/index.md

+86 −1
@@ -1,3 +1,88 @@
# Getting started with Gateway API Inference Extension

This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running!

### Requirements
- Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
- A cluster with:
  - Support for Services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running.) For example, with Kind, you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer); a minimal Kind sketch follows this list.
  - 3 GPUs to run the sample model server. Adjust the number of replicas in `./manifests/vllm/deployment.yaml` as needed.
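If you are testing on Kind, the following is a minimal sketch of how the `LoadBalancer` requirement can be met with cloud-provider-kind. It is not part of the original requirements; the cluster name is arbitrary and the install command is an assumption, so treat the Kind docs linked above as authoritative.

```bash
# Create a throwaway Kind cluster (assumes Kind and Go are installed locally).
kind create cluster --name inference-quickstart

# cloud-provider-kind assigns addresses to Services of type LoadBalancer on Kind clusters.
# See the Kind LoadBalancer docs linked above for the authoritative install/run steps.
go install sigs.k8s.io/cloud-provider-kind@latest
cloud-provider-kind &

# After installing Envoy Gateway, its LoadBalancer Service should receive an address here.
kubectl get svc -n envoy-gateway-system
```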
### Steps

1. **Deploy Sample Model Server**

    Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
    Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
    ```bash
    kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
    ```
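    Before moving on, it can help to confirm the model server pods are up. This check is not part of the original guide, and the `app=vllm` label below is hypothetical — check `./manifests/vllm/deployment.yaml` for the labels the deployment actually sets.

    ```bash
    # List the pods created by the sample deployment.
    kubectl get pods

    # Hypothetical readiness wait; substitute the real label from deployment.yaml.
    # Pulling Llama-2-7b can take a while, hence the generous timeout.
    kubectl wait --for=condition=Ready pod -l app=vllm --timeout=15m
    ```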
1. **Install the Inference Extension CRDs:**

    ```sh
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.1.0/manifests.yaml
    ```
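    As an optional sanity check (not part of the original steps), you can confirm that the CRDs were registered:

    ```bash
    # The InferencePool and InferenceModel CRDs should appear once the manifest is applied.
    kubectl get crd | grep -i inference
    kubectl api-resources | grep -i inference
    ```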
1. **Deploy InferenceModel**

    Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
    [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml
    ```
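    To see what was created, you can read the object back; the resource name below is an assumption — use whatever `kubectl api-resources` reported above if it differs.

    ```bash
    # Show the sample InferenceModel, including its LoRA adapter targets and pool reference.
    kubectl get inferencemodels -o yaml
    ```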
1. **Update Envoy Gateway Config to enable Patch Policy**

    Our custom LLM Gateway ext-proc is patched into the existing Envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/enable_patch_policy.yaml
    kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
    ```
    Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
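    If you want to confirm the feature flag actually took effect after the restart, you can inspect the Envoy Gateway ConfigMap. The ConfigMap name and the `extensionApis`/`enableEnvoyPatchPolicy` keys below are assumptions based on Envoy Gateway's documented configuration; verify them against `enable_patch_policy.yaml`.

    ```bash
    # Look for enableEnvoyPatchPolicy: true in the Envoy Gateway configuration.
    kubectl get configmap envoy-gateway-config -n envoy-gateway-system -o yaml | grep -A 2 extensionApis
    ```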
1. **Deploy Gateway**

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml
    ```
    > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy is very useful.***

    Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
    ```bash
    $ kubectl get gateway inference-gateway
    NAME                CLASS               ADDRESS         PROGRAMMED   AGE
    inference-gateway   inference-gateway   <MY_ADDRESS>    True         22s
    ```
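    If you would rather block until the Gateway is ready than poll by hand, `kubectl wait` can watch for the same condition; this is a convenience addition, not an original step.

    ```bash
    # Block until the Gateway reports Programmed=True (or the timeout expires).
    kubectl wait gateway/inference-gateway --for=condition=Programmed --timeout=120s
    ```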
1. **Deploy the Inference Extension and InferencePool**

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml
    ```
1. **Deploy Envoy Gateway Custom Policies**

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml
    ```
    > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
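    Assuming these manifests create the usual Envoy Gateway policy resources, their acceptance is reflected in status and can be checked as follows; adjust the resource kinds if the manifests differ.

    ```bash
    # Envoy Gateway records whether each policy was accepted/programmed in its status.
    kubectl get envoypatchpolicies -A
    kubectl get envoyextensionpolicies -A
    ```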
1. **OPTIONALLY**: Apply Traffic Policy

    For high-traffic benchmarking, you can apply this manifest to avoid any defaults that can cause timeouts/errors.

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml
    ```
1. **Try it out**

    Wait until the gateway is ready.

    ```bash
    IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
    PORT=8081

    curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
    "model": "tweet-summary",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
    }'
    ```
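    To exercise the extension's load balancing across the `tweet-summary-0` and `tweet-summary-1` adapters, you can send a handful of requests in a row; this loop is just a convenience wrapper around the same curl call above.

    ```bash
    # Fire a few completions at the gateway and print the start of each response.
    for i in $(seq 1 5); do
      curl -s ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
        "model": "tweet-summary",
        "prompt": "Write as if you were a critic: San Francisco",
        "max_tokens": 100,
        "temperature": 0
      }' | head -c 200; echo
    done
    ```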

0 commit comments
