Add README.md file to the epp pkg #386
[APPROVALNOTIFIER] This PR is APPROVED.
This pull request has been approved by: ahg-g
The full list of commands accepted by this bot can be found here. The pull request process is described here.
✅ Deploy Preview for gateway-api-inference-extension ready!
pkg/epp/README.md
Outdated
@@ -0,0 +1,25 @@
# The EndPoint Picker (EPP)
This package provides the reference implementation for the Endpoint Picker (EPP). It implements the [extension protocol](../../docs/proposals/003-endpoint-picker-protocol), enabling a proxy or gateway to request endpoint hints from an extension. As it is implemented now, an EPP instance handles a single `InferencePool` (and so for each `InferencePool`, one must create a dedicated EPP deployment).
Nit: Consider dropping 'As it is implemented now'. My thinking is that we can just update this when/if we make the EPP multi-tenant.
Makes sense, done.
pkg/epp/README.md
Outdated
- Endpoint Selection
- The EPP determines the appropriate Pod endpoint for the load balancer (LB) to route requests.
- It selects from the pool of ready Pods designated by the assigned InferencePool.
- It selects from the pool of ready Pods designated by the assigned InferencePool.
- It selects from the pool of ready Pods designated by the assigned InferencePool's [Selector](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/7e3cd457cdcd01339b65861c8e472cf27e6b6e80/api/v1alpha1/inferencepool_types.go#L53) field.
This is also a nit
done.
pkg/epp/README.md
Outdated
- It selects from the pool of ready Pods designated by the assigned InferencePool.
- Endpoint selection is contingent on the request's ModelName matching an `InferenceModel` that references the `InferencePool`.
- Requests with unmatched ModelName values trigger an error response to the proxy.
- The endpoint selection algorithm is detailed below.
Nit: consider removing this line; I think the headers are prominent enough to draw attention.
done
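For readers following this thread, the endpoint-selection rules quoted in the hunk above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the EPP's actual code: the `Pod` type, the `knownModels` map, the two error values, and `pickEndpoint` are hypothetical names, and the "first ready Pod" policy is a placeholder for the real selection algorithm, which the README describes separately.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical, simplified stand-in for a Pod matched by the
// InferencePool's selector.
type Pod struct {
	Name  string
	Ready bool
}

// Hypothetical error values; the real EPP reports errors to the proxy
// through the extension protocol.
var (
	ErrUnknownModel = errors.New("no InferenceModel matches the requested ModelName")
	ErrNoReadyPods  = errors.New("no ready Pods in the InferencePool")
)

// pickEndpoint rejects requests whose ModelName is not in knownModels
// (assumed to hold the names of InferenceModels that reference the pool),
// then returns a ready Pod. Returning the first ready Pod is a placeholder
// policy, not the EPP's scheduling algorithm.
func pickEndpoint(modelName string, knownModels map[string]bool, pods []Pod) (Pod, error) {
	if !knownModels[modelName] {
		return Pod{}, ErrUnknownModel
	}
	for _, p := range pods {
		if p.Ready {
			return p, nil
		}
	}
	return Pod{}, ErrNoReadyPods
}

func main() {
	models := map[string]bool{"chat-model": true}
	pods := []Pod{{Name: "pod-a", Ready: false}, {Name: "pod-b", Ready: true}}

	p, err := pickEndpoint("chat-model", models, pods)
	fmt.Println(p.Name, err)

	_, err = pickEndpoint("unknown-model", models, pods)
	fmt.Println(err)
}
```

The point of the sketch is the ordering: the ModelName check happens before any Pod is considered, so an unmatched name yields an error regardless of pool health.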
- Requests with unmatched ModelName values trigger an error response to the proxy.
- The endpoint selection algorithm is detailed below.
- Traffic Splitting and ModelName Rewriting
- The EPP facilitates controlled rollouts of new adapter versions by implementing traffic splitting between adapters within the same `InferencePool`, as defined by the `InferenceModel`.
More nits: Linking to the InfModel's targetmodel field could be useful
done
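As a rough illustration of the traffic-splitting bullet discussed in this thread, the sketch below picks a target adapter by weight and returns its name as the rewritten ModelName. It assumes each target carries a name and an integer weight; the `TargetModel` struct here is a simplified, hypothetical stand-in rather than the project's API type, and `pickTarget`, `totalWeight`, and `rewriteModelName` are invented for this example.

```go
package main

import (
	"fmt"
	"math/rand"
)

// TargetModel is a hypothetical, simplified weighted target (for example,
// one adapter version). It is not the project's API type.
type TargetModel struct {
	Name   string
	Weight int
}

// totalWeight sums the positive weights; non-positive weights are ignored.
func totalWeight(targets []TargetModel) int {
	total := 0
	for _, t := range targets {
		if t.Weight > 0 {
			total += t.Weight
		}
	}
	return total
}

// pickTarget maps draw, expected to lie in [0, totalWeight), onto the target
// whose cumulative weight range contains it. It reports false when draw is
// outside that range.
func pickTarget(targets []TargetModel, draw int) (string, bool) {
	for _, t := range targets {
		if t.Weight <= 0 {
			continue
		}
		if draw < t.Weight {
			return t.Name, true
		}
		draw -= t.Weight
	}
	return "", false
}

// rewriteModelName returns the name the request's ModelName would be
// rewritten to, chosen at random in proportion to the weights.
func rewriteModelName(targets []TargetModel, rng *rand.Rand) (string, bool) {
	total := totalWeight(targets)
	if total == 0 {
		return "", false
	}
	return pickTarget(targets, rng.Intn(total))
}

func main() {
	targets := []TargetModel{
		{Name: "adapter-v1", Weight: 90},
		{Name: "adapter-v2", Weight: 10},
	}
	rng := rand.New(rand.NewSource(1))
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		name, _ := rewriteModelName(targets, rng)
		counts[name]++
	}
	fmt.Println(counts)
}
```

With weights of 90 and 10, roughly nine in ten requests would be rewritten to `adapter-v1`; shifting the weights over time is what makes a gradual rollout possible.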
This all looks great! Left some comments, but these are true nits, feel free to disregard. This is already a major improvement from what we had before. Thanks!
/lgtm
Thanks for the detailed review, addressed all comments
/lgtm Thanks!
/hold cancel
* Polish the epp README.md file
* Addressed comments
No description provided.