Commit fa4a77d

Addressed comments
1 parent 9f73d52 commit fa4a77d

1 file changed, +3 -4 lines changed

pkg/epp/README.md

@@ -1,18 +1,17 @@
 # The EndPoint Picker (EPP)
-This package provides the reference implementation for the Endpoint Picker (EPP). It implements the [extension protocol](../../docs/proposals/003-endpoint-picker-protocol), enabling a proxy or gateway to request endpoint hints from an extension. As it is implemented now, an EPP instance handles a single `InferencePool` (and so for each `InferencePool`, one must create a dedicated EPP deployment).
+This package provides the reference implementation for the Endpoint Picker (EPP). It implements the [extension protocol](../../docs/proposals/003-endpoint-picker-protocol), enabling a proxy or gateway to request endpoint hints from an extension. An EPP instance handles a single `InferencePool` (and so for each `InferencePool`, one must create a dedicated EPP deployment).
 
 
 The Endpoint Picker performs the following core functions:
 
 - Endpoint Selection
   - The EPP determines the appropriate Pod endpoint for the load balancer (LB) to route requests.
-  - It selects from the pool of ready Pods designated by the assigned InferencePool.
+  - It selects from the pool of ready Pods designated by the assigned InferencePool's [Selector](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/7e3cd457cdcd01339b65861c8e472cf27e6b6e80/api/v1alpha1/inferencepool_types.go#L53) field.
   - Endpoint selection is contingent on the request's ModelName matching an `InferenceModel` that references the `InferencePool`.
   - Requests with unmatched ModelName values trigger an error response to the proxy.
-  - The endpoint selection algorithm is detailed below.
 - Traffic Splitting and ModelName Rewriting
   - The EPP facilitates controlled rollouts of new adapter versions by implementing traffic splitting between adapters within the same `InferencePool`, as defined by the `InferenceModel`.
-  - EPP rewrites the model name in the request to the target model name as defined on the `InferenceModel` object.
+  - EPP rewrites the model name in the request to the [target model name](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/7e3cd457cdcd01339b65861c8e472cf27e6b6e80/api/v1alpha1/inferencemodel_types.go#L161) as defined on the `InferenceModel` object.
 - Observability
   - The EPP generates metrics to enhance observability.
   - It reports InferenceModel-level metrics, further broken down by target model.
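
The traffic splitting and ModelName rewriting described in the README text above can be sketched roughly as follows. This is a minimal illustration, not the EPP's actual code: the `TargetModel` and `InferenceModel` structs, the integer `Weight` field, the `pickTarget` and `RewriteModelName` helpers, and the fallback of leaving the name unchanged when no weights are set are all assumptions made for the example. Only the behaviors "rewrite to a target model name" and "unmatched ModelName yields an error" come from the README.

```go
package main

import (
	"errors"
	"math/rand"
)

// TargetModel is a hypothetical, simplified stand-in for one weighted
// rollout target (for example, an adapter version). Not the real API type.
type TargetModel struct {
	Name   string
	Weight int // assumed non-negative
}

// InferenceModel is a hypothetical, simplified stand-in that maps a
// client-facing ModelName to its weighted targets.
type InferenceModel struct {
	ModelName string
	Targets   []TargetModel
}

// ErrModelNotFound mirrors the README statement that an unmatched
// ModelName results in an error response to the proxy.
var ErrModelNotFound = errors.New("no InferenceModel matches the requested model name")

// DefaultDraw returns a pseudo-random int in [0, n).
var DefaultDraw = rand.Intn

// pickTarget maps r, expected to be in [0, sum of weights), onto the
// target whose cumulative weight range contains r.
func pickTarget(targets []TargetModel, r int) string {
	for _, t := range targets {
		if r < t.Weight {
			return t.Name
		}
		r -= t.Weight
	}
	return ""
}

// RewriteModelName looks up the requested model name and returns the
// target model name chosen by a weighted draw.
func RewriteModelName(models map[string]InferenceModel, requested string, draw func(n int) int) (string, error) {
	m, ok := models[requested]
	if !ok {
		return "", ErrModelNotFound
	}
	total := 0
	for _, t := range m.Targets {
		total += t.Weight
	}
	if total <= 0 {
		// Assumption for this sketch: with no weighted targets,
		// the model name is passed through unchanged.
		return requested, nil
	}
	return pickTarget(m.Targets, draw(total)), nil
}
```

With targets `chat-v1` (weight 90) and `chat-v2` (weight 10), calling `RewriteModelName(models, "chat", DefaultDraw)` would be expected to return `chat-v2` for roughly one request in ten, which is the kind of controlled rollout the README describes. Passing the draw function as a parameter keeps the selection deterministic in tests.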
