
Commit 0f5f38e

Polish the epp README.md file
1 parent 2577f63 commit 0f5f38e


2 files changed (+16, -5 lines)


pkg/epp/README.md (+16)
@@ -0,0 +1,16 @@
# The Endpoint Picker (EPP)

This package hosts the reference implementation of the Endpoint Picker (EPP). It implements the [extension protocol](../../docs/proposals/003-endpoint-picker-protocol), which defines how a proxy can call out to an extension to request endpoint hints. As currently implemented, an EPP instance handles a single `InferencePool`, so each `InferencePool` requires its own dedicated EPP deployment.
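As a minimal sketch of the callout model, assuming a greatly simplified request shape, one EPP instance bound to a single `InferencePool` might answer a proxy callout as shown here. `Picker`, `CalloutRequest`, the header name `x-example-endpoint-hint`, and the always-pick-the-first-endpoint policy are all hypothetical. The real message and header formats are defined in the extension protocol proposal.

```go
package main

import "fmt"

// hintHeader is a hypothetical placeholder; the real header name and response
// format are defined by the endpoint picker protocol, not by this sketch.
const hintHeader = "x-example-endpoint-hint"

// CalloutRequest is a hypothetical, simplified view of what the proxy sends.
type CalloutRequest struct {
	Model string // model name taken from the request body
}

// Picker (hypothetical) serves exactly one InferencePool, mirroring the
// one-EPP-deployment-per-pool rule.
type Picker struct {
	Pool      string   // the single InferencePool this instance is assigned to
	Endpoints []string // ready Pod addresses ("ip:port") selected by that pool
}

// Handle answers one callout: it returns the headers the proxy should apply,
// or an error that the proxy receives instead of an endpoint hint.
func (p *Picker) Handle(req CalloutRequest) (map[string]string, error) {
	if len(p.Endpoints) == 0 {
		return nil, fmt.Errorf("pool %q has no ready endpoints", p.Pool)
	}
	// Placeholder policy: always the first endpoint. The real EPP runs its
	// scheduling algorithm at this point.
	return map[string]string{hintHeader: p.Endpoints[0]}, nil
}

func main() {
	p := &Picker{Pool: "my-pool", Endpoints: []string{"10.0.0.7:8000", "10.0.0.8:8000"}}
	headers, err := p.Handle(CalloutRequest{Model: "chatbot"})
	fmt.Println(headers, err) // map[x-example-endpoint-hint:10.0.0.7:8000] <nil>
}
```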

The EPP does the following:

- Picks the endpoint that the load balancer (LB) should route the request to, choosing from the set of ready Pods selected by the assigned `InferencePool`. The EPP processes a request only when its model name matches an `InferenceModel` that references the assigned `InferencePool`; unmatched requests result in an error being returned to the proxy. The algorithm for picking the endpoint is described below.

- Splits traffic between adapter versions. An `InferenceModel` can define a traffic split across adapters in the same `InferencePool`, which enables rollouts of new adapter versions.

- Emits metrics for observability. The EPP reports `InferenceModel`-level metrics, further broken down by target model; the details are in the GKE Inference Gateway Observability Proposal.

## The scheduling algorithm

The scheduling package implements request scheduling algorithms that load-balance requests across the backend Pods of an inference gateway. The scheduler is designed to use resources efficiently while keeping latency low and prioritizing critical requests. It applies a series of filters, based on Pod metrics and heuristics, to select the best Pod for a given request. The following flowchart summarizes the current scheduling algorithm.

### Flowchart

<img src="../docs/schedular-flowchart.png" alt="Scheduling Algorithm" width="400" />
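As a rough illustration of a filter chain, this Go sketch narrows the candidate Pods stage by stage and sheds non-critical requests when every Pod is saturated. It is not a transcription of the flowchart. The `PodMetrics` fields, the thresholds `maxQueue` and `maxKVCache`, the order of the filters, and the fall-back-to-all-candidates behavior of `keep` are all assumptions chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical saturation thresholds chosen for this example only.
const (
	maxQueue   = 5
	maxKVCache = 0.8
)

// PodMetrics is a hypothetical snapshot of the per-Pod metrics a scheduler could use.
type PodMetrics struct {
	Name           string
	QueueLen       int             // requests waiting on the model server
	KVCacheUsage   float64         // fraction of KV cache in use, 0..1
	ActiveAdapters map[string]bool // LoRA adapters currently loaded
}

// filter narrows the candidate Pods for one request.
type filter func([]PodMetrics) []PodMetrics

// keep retains Pods matching pred; if none match, it returns its input unchanged.
func keep(pred func(PodMetrics) bool) filter {
	return func(pods []PodMetrics) []PodMetrics {
		var out []PodMetrics
		for _, p := range pods {
			if pred(p) {
				out = append(out, p)
			}
		}
		if len(out) == 0 {
			return pods
		}
		return out
	}
}

// lowest retains every Pod that ties for the minimum score.
func lowest(score func(PodMetrics) float64) filter {
	return func(pods []PodMetrics) []PodMetrics {
		var out []PodMetrics
		for _, p := range pods {
			switch {
			case len(out) == 0 || score(p) < score(out[0]):
				out = []PodMetrics{p}
			case score(p) == score(out[0]):
				out = append(out, p)
			}
		}
		return out
	}
}

// anyHasCapacity reports whether some Pod is below both saturation thresholds.
func anyHasCapacity(pods []PodMetrics) bool {
	for _, p := range pods {
		if p.QueueLen < maxQueue && p.KVCacheUsage < maxKVCache {
			return true
		}
	}
	return false
}

// schedule returns the chosen Pod's name; sheddable requests are rejected when all Pods are saturated.
func schedule(pods []PodMetrics, adapter string, critical bool) (string, error) {
	if len(pods) == 0 {
		return "", errors.New("no ready pods")
	}
	if !critical && !anyHasCapacity(pods) {
		return "", errors.New("sheddable request dropped: all pods saturated")
	}
	chain := []filter{
		keep(func(p PodMetrics) bool { return p.QueueLen < maxQueue }),    // short queues
		keep(func(p PodMetrics) bool { return p.ActiveAdapters[adapter] }), // adapter affinity
		lowest(func(p PodMetrics) float64 { return float64(p.QueueLen) }),
		lowest(func(p PodMetrics) float64 { return p.KVCacheUsage }),
	}
	for _, f := range chain {
		pods = f(pods)
	}
	return pods[0].Name, nil
}

func main() {
	lora := map[string]bool{"lora-x": true}
	pods := []PodMetrics{
		{Name: "a", QueueLen: 2, KVCacheUsage: 0.5, ActiveAdapters: lora},
		{Name: "b", QueueLen: 1, KVCacheUsage: 0.9},
		{Name: "c", QueueLen: 8, KVCacheUsage: 0.1, ActiveAdapters: lora},
		{Name: "d", QueueLen: 2, KVCacheUsage: 0.3, ActiveAdapters: lora},
	}
	fmt.Println(schedule(pods, "lora-x", true))  // d <nil>
	fmt.Println(schedule(pods, "lora-y", false)) // b <nil>
}
```

Composing small filters this way keeps each heuristic independently testable and makes the chain easy to reorder. With the sample Pods, a critical request for `lora-x` lands on `d`: it has the adapter loaded, ties for the shortest queue with `a`, and has the lower KV-cache use.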

pkg/scheduling.md (-5)
This file was deleted.
