Enhancements to LLM Instance Gateway: Scheduling Logic and Documentation Updates (#78)
* squashed: modify filter for LoRA affinity
* update llm service and llm server pool yaml, readme
* remove unused method from metrics.go
* add flowchart image
* update size flowchart image
* remove image name
* update queueingThresholdLoRA to 50
* roll back manifest changes
* roll back manifest changes
* update filter and scheduler based on comments
* rename filters
* update filter names and comments
* fix readme
* fix comment
* modify flowchart
* add comment explaining when lowLoRACostPredicate can be useful
`pkg/README.md` (+15 −4)
````diff
@@ -7,7 +7,12 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
 
 1. **Deploy Sample vLLM Application**
 
-   A sample vLLM deployment with the proper protocol to work with LLM Instance Gateway can be found [here](https://github.com/kubernetes-sigs/llm-instance-gateway/blob/6f9869d6595d2d0f8e6febcbec0f348cb44a3012/examples/poc/manifests/samples/vllm-lora-deployment.yaml#L18).
+   A sample vLLM deployment with the proper protocol to work with LLM Instance Gateway can be found [here](https://github.com/kubernetes-sigs/llm-instance-gateway/tree/main/examples/poc/manifests/vllm/vllm-lora-deployment.yaml#L18).
+
+1. **Deploy LLM Service and LLMServerPool**
+
+   You can find a sample LLM service and LLMServerPool configuration, based on the vLLM deployments mentioned above, [here](https://github.com/kubernetes-sigs/llm-instance-gateway/tree/main/examples/poc/manifests/llmservice.yaml).
+
 
 1. **Update Envoy Gateway Config to enable Patch Policy**
@@ -32,14 +37,13 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
    kubectl apply -f ./manifests/ext_proc.yaml
    kubectl apply -f ./manifests/patch_policy.yaml
    ```
-   **NOTE**: Ensure the `instance-gateway-ext-proc` deployment is updated with the pod names and internal IP addresses of the vLLM replicas. This step is crucial for the correct routing of requests based on headers. This won't be needed once we make ext proc dynamically read the pods.
 
 1. **Try it out**
 
    Wait until the gateway is ready.
 
    ```bash
-   IP=$(kubectl get gateway/llm-gateway -o jsonpath='{.status.addresses[0].value}')
+   IP=$(kubectl get gateway/instance-gateway -o jsonpath='{.status.addresses[0].value}')
@@ -48,4 +52,11 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
    "max_tokens": 100,
    "temperature": 0
    }'
-   ```
+   ```
+
+
+## Scheduling Package in Ext Proc
+
+The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.
````
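A filter chain of that shape might look like the following sketch. This is illustrative Go only, not the gateway's actual scheduling code: the `Pod` fields, the `loRAAffinity` and `leastQueuing` filters, and the fall-back-on-empty behavior are all assumptions made for the example.

```go
package main

import "fmt"

// Pod holds per-replica metrics a filter can inspect.
// These fields are illustrative, not the gateway's real types.
type Pod struct {
	Name        string
	QueueLength int
	ActiveLoRAs map[string]bool
}

// A filter narrows the candidate pod set for a request.
type filter func(model string, pods []Pod) []Pod

// loRAAffinity keeps pods that already have the requested adapter loaded.
func loRAAffinity(model string, pods []Pod) []Pod {
	var out []Pod
	for _, p := range pods {
		if p.ActiveLoRAs[model] {
			out = append(out, p)
		}
	}
	return out
}

// leastQueuing keeps the pods with the shortest request queue.
func leastQueuing(model string, pods []Pod) []Pod {
	min := pods[0].QueueLength
	for _, p := range pods[1:] {
		if p.QueueLength < min {
			min = p.QueueLength
		}
	}
	var out []Pod
	for _, p := range pods {
		if p.QueueLength == min {
			out = append(out, p)
		}
	}
	return out
}

// schedule applies filters in order; a filter that would eliminate
// every candidate is skipped so the chain always returns a pod
// (assumes the initial pod list is non-empty).
func schedule(model string, pods []Pod, filters []filter) Pod {
	candidates := pods
	for _, f := range filters {
		if next := f(model, candidates); len(next) > 0 {
			candidates = next
		}
	}
	return candidates[0]
}

func main() {
	pods := []Pod{
		{Name: "vllm-0", QueueLength: 4, ActiveLoRAs: map[string]bool{"sql-lora": true}},
		{Name: "vllm-1", QueueLength: 1, ActiveLoRAs: map[string]bool{}},
		{Name: "vllm-2", QueueLength: 2, ActiveLoRAs: map[string]bool{"sql-lora": true}},
	}
	best := schedule("sql-lora", pods, []filter{loRAAffinity, leastQueuing})
	fmt.Println(best.Name) // vllm-2: has the adapter loaded and the shorter queue
}
```

Ordering the affinity filter before the queue-depth filter expresses a priority: adapter locality is considered first, and queue length only breaks ties among pods that already have the adapter.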