
Commit 950e036

Uses tabs for quickstart model server options (#527)

Signed-off-by: Daneyon Hansen <[email protected]>

1 parent 7fbef9e · commit 950e036

2 files changed (+22, -19)

mkdocs.yml (+3)

```diff
@@ -44,6 +44,9 @@ markdown_extensions:
   - toc:
       permalink: true
   - tables
+  - pymdownx.superfences
+  - pymdownx.tabbed:
+      alternate_style: true
 nav:
   - Overview:
     - Introduction: index.md
```
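Context on the two new extensions: `pymdownx.tabbed` renders blocks introduced with `=== "Title"` markers as tabs, with `alternate_style: true` opting into its newer tab styling; the tab body must be indented underneath the marker. `pymdownx.superfences` is needed so fenced code blocks can nest inside that indented tab content. The site-src/guides/index.md diff below uses exactly this syntax.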

site-src/guides/index.md (+19, -19)

````diff
@@ -14,34 +14,34 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 
 ### Deploy Sample Model Server
 
-This quickstart guide contains two options for setting up model server:
+Two options are supported for running the model server:
 
 1. GPU-based model server.
    Requirements: a Hugging Face access token that grants access to the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf).
 
 1. CPU-based model server (not using GPUs).
    Requirements: a Hugging Face access token that grants access to the model [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct).
 
 Choose one of these options and follow the steps below. Please do not deploy both, as the deployments have the same name and will override each other.
 
-#### GPU-Based Model Server
-
-For this setup, you will need 3 GPUs to run the sample model server. Adjust the number of replicas in `./config/manifests/vllm/gpu-deployment.yaml` as needed.
-Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
-Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
-```bash
-kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml
-```
+=== "GPU-Based Model Server"
+
+    For this setup, you will need 3 GPUs to run the sample model server. Adjust the number of replicas in `./config/manifests/vllm/gpu-deployment.yaml` as needed.
+    Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
+    Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
+    ```bash
+    kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
+    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml
+    ```
 
-#### CPU-Based Model Server
+=== "CPU-Based Model Server"
 
-Create a Hugging Face secret to download the model [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct). Ensure that the token grants access to this model.
-Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
-```bash
-kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Qwen
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml
-```
+    Create a Hugging Face secret to download the model [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct). Ensure that the token grants access to this model.
+    Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
+    ```bash
+    kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Qwen
+    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml
+    ```
 
 ### Install the Inference Extension CRDs
 
````
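For convenience, the new GPU tab's commands as a single sequence. This is a sketch, not part of the committed docs: the `export` placeholder and the final pod check are illustrative additions, and the CPU tab is identical apart from the Qwen token and `cpu-deployment.yaml`.

```bash
# Hugging Face token with access to meta-llama/Llama-2-7b-hf.
# Placeholder value; substitute your own token.
export HF_TOKEN=<your-hugging-face-token>

# From the diff above: create the secret used to pull the model.
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN

# From the diff above: deploy the sample GPU-based vLLM model server.
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml

# Illustrative check, not in this commit: watch the vLLM pods become Ready.
kubectl get pods -w
```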