Docs: Uses tabs for quickstart model server options #527

Merged 1 commit on Mar 18, 2025
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -44,6 +44,9 @@ markdown_extensions:
- toc:
permalink: true
- tables
- pymdownx.superfences
- pymdownx.tabbed:
alternate_style: true
nav:
- Overview:
- Introduction: index.md
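The two new extensions are what enable the tabbed layout introduced in the guide below: `pymdownx.tabbed` with `alternate_style: true` renders lines of the form `=== "Tab title"` as content tabs, and `pymdownx.superfences` lets fenced code blocks nest inside the four-space-indented tab body. A rough sketch of the resulting markup (tab titles and the command are illustrative, not taken from the guide):

````markdown
=== "First option"

    Everything indented by four spaces belongs to this tab,
    including nested code fences:

    ```bash
    echo "command for the first option"
    ```

=== "Second option"

    Content shown when the second tab is selected.
````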
38 changes: 19 additions & 19 deletions site-src/guides/index.md
@@ -14,34 +14,34 @@ This quickstart guide is intended for engineers familiar with k8s and model serv

### Deploy Sample Model Server

This quickstart guide contains two options for setting up model server:
Two options are supported for running the model server:

1. GPU-based model server.
Requirements: a Hugging Face access token that grants access to the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf).

1. CPU-based model server (not using GPUs).
Requirements: a Hugging Face access token that grants access to the model [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct).

Choose one of these options and follow the steps below. Please do not deploy both, as the deployments have the same name and will override each other.
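Both options read the Hugging Face token from an `HF_TOKEN` environment variable, so it needs to be set before running the commands that follow; a minimal sketch (the token value is a placeholder):

```bash
# Export the Hugging Face access token so the `kubectl create secret`
# commands below can reference it as $HF_TOKEN.
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx   # placeholder, substitute your own token
```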

#### GPU-Based Model Server

For this setup, you will need 3 GPUs to run the sample model server. Adjust the number of replicas in `./config/manifests/vllm/gpu-deployment.yaml` as needed.
Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
```bash
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml
```
=== "GPU-Based Model Server"

#### CPU-Based Model Server
For this setup, you will need 3 GPUs to run the sample model server. Adjust the number of replicas in `./config/manifests/vllm/gpu-deployment.yaml` as needed.
Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
```bash
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml
```

Create a Hugging Face secret to download the model [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct). Ensure that the token grants access to this model.
Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
```bash
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Qwen
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml
```
=== "CPU-Based Model Server"

Create a Hugging Face secret to download the model [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct). Ensure that the token grants access to this model.
Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
```bash
kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Qwen
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/cpu-deployment.yaml
```
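Whichever option was deployed, a quick sanity check before moving on is to wait for the sample model server pods to become Ready; a minimal sketch (pod and deployment names depend on the manifest that was applied):

```bash
# Watch the vLLM pods come up; the model download can take several minutes.
kubectl get pods -w

# Or block until the rollout reports Available (assumes the model server
# is the only deployment currently rolling out in this namespace).
kubectl wait deployment --all --for=condition=Available --timeout=600s
```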

### Install the Inference Extension CRDs
