Proposal update for the API names and latency objective #91
Conversation
… objective in the proposal.
/hold just so it doesn't get merged by accident
Thanks @ahg-g! Mostly LGTM, just a few nits
/approve
@@ -28,7 +28,7 @@

## Summary

This proposal presents 2 new CRD objects to express the needs of the LLM Instance Gateway. **LLMServerPool** and **LLMService** (names up for debate). The LLMServerPool is the logical grouping of compute, owned by the Inference Platform Admin persona. While the LLMService defines the serving objectives of a specific model or LoRA adapter, and is owned by the LLM Service Owner.
This proposal presents 2 new CRD objects to express the needs of the LLM Instance Gateway. **InferencePool** and **InferenceModel**. The InferencePool is the logical grouping of compute, owned by the Inference Platform Admin persona. While the InferenceModel defines the serving objectives of a specific model or LoRA adapter, and is owned by the Inference Workload Owner.
Should we still use "LLM Instance Gateway"? And it seems we didn't define "LLM Instance Gateway"
updated to match the new name.
updated throughout the proposal
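For context on the rename, here is a minimal, hypothetical Go sketch of how the two CRDs described in the summary could be shaped; the field names (Selector, ModelName, Criticality, PoolRef) are illustrative assumptions, not the API the proposal actually adopts.

```go
// Illustrative sketch only: field names and types below are assumptions
// for discussion, not the API adopted in the proposal.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// InferencePool is the logical grouping of compute, owned by the
// Inference Platform Admin persona.
type InferencePool struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec InferencePoolSpec `json:"spec,omitempty"`
}

type InferencePoolSpec struct {
	// Selector chooses the serving Pods that make up this pool (assumed field).
	Selector map[string]string `json:"selector,omitempty"`
}

// InferenceModel defines the serving objectives of a specific model or
// LoRA adapter, owned by the Inference Workload Owner.
type InferenceModel struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec InferenceModelSpec `json:"spec,omitempty"`
}

type InferenceModelSpec struct {
	// ModelName is the name clients address in requests (assumed field).
	ModelName string `json:"modelName,omitempty"`
	// Criticality stands in for the latency objective this PR removes;
	// its concrete values are assumptions here.
	Criticality string `json:"criticality,omitempty"`
	// PoolRef names the InferencePool that serves this model (assumed field).
	PoolRef string `json:"poolRef,omitempty"`
}
```

The split keeps admin-owned compute (the pool) and workload-owned serving objectives (the model) in separate objects, matching the two personas described in the summary.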
- Enforce any common set of adapters or base models are available on the Pods
- Manage Deployments of Pods within the Pool
- Manage Pod lifecycle of pods within the pool

Additionally, any Pod that seeks to join a LLMServerPool would need to support a protocol, defined by LLM Instance Gateway, to ensure the Pool has adequate information to intelligently route requests.
Additionally, any Pod that seeks to join an InferencePool would need to support a protocol, defined by this project, to ensure the Pool has adequate information to intelligently route requests.
Let's link the protocol doc here? https://docs.google.com/document/d/18VRJ2ufZmAwBZ2jArfvGjQGaWtsQtAP6_yF2Xn6zcms/edit?tab=t.0#heading=h.sw2xdf66jh6
We should spec this protocol and have it in this repo. Can you submit a proposal based on the doc?
sg, will do
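Until that protocol proposal lands, here is a purely hypothetical Go sketch of the kind of per-Pod signal such a protocol might require serving Pods to expose; every field name below is an assumption, since the protocol currently exists only in the linked doc and is not yet specced in this repo.

```go
// Hypothetical sketch of per-Pod signals a routing layer could use;
// this is not the actual protocol, which is not yet specced in this repo.
package protocol

// PodMetrics is an assumed snapshot a serving Pod might report so the
// gateway can route intelligently across an InferencePool.
type PodMetrics struct {
	// Requests queued on this Pod and not yet running.
	WaitingQueueSize int `json:"waitingQueueSize"`
	// Fraction of the KV cache currently in use, in [0, 1].
	KVCacheUtilization float64 `json:"kvCacheUtilization"`
	// LoRA adapters currently loaded and servable on this Pod.
	ActiveLoRAAdapters []string `json:"activeLoraAdapters"`
}
```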
Update the diagram
After discussing with @kfswain, we decided to remove them.
@@ -336,8 +318,8 @@ Our original idea was to define all LLMService config at the Kubernetes Gateway

- Reasonable defaults (how do we behave in the absence of user-specified values in optional fields)
The rest of the FAQs are pretty confusing or outdated (the SLO part) to me. Perhaps we can delete them, or add some more detail/example to explain them better?
removed the open questions for now
Thanks
A LLMService allows the LLM Service Owner to define:
An InferenceModel allows the Inference Workload Owner to define:
- Which LoRA adapter(s) to consume
done
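As a rough illustration of how "which LoRA adapter(s) to consume" could surface on the InferenceModel spec, here is an assumed Go fragment; TargetModels and its weight semantics are hypothetical, not the adopted API.

```go
// Assumed shape for expressing which LoRA adapter(s) an InferenceModel
// consumes; illustrative only, not the adopted API.
package v1alpha1

type InferenceModelSpec struct {
	// ModelName that clients address in requests (assumed field).
	ModelName string `json:"modelName,omitempty"`
	// TargetModels lists the LoRA adapter(s) or base model that may serve
	// requests for ModelName, optionally weighted for traffic splitting.
	TargetModels []TargetModel `json:"targetModels,omitempty"`
}

type TargetModel struct {
	// Name of the LoRA adapter or base model as known to the serving Pods.
	Name string `json:"name"`
	// Weight for splitting traffic across targets (assumed semantics).
	Weight int `json:"weight,omitempty"`
}
```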
Thanks for updating this! /lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, kfswain, liu-cong

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/label tide/merge-method-squash

Thanks all for the great feedback!
…igs#91)
* Update the API names and add criticality parameter instead of latency objective in the proposal.
* Addressed comments
* Addressing comments round 2
This addresses #69 and #68
As per our discussion in the latest community meeting https://docs.google.com/document/d/1frfPE5L1sI3737rdQV04IcDGeOcGJj2ItjMg6z2SRH0/edit?tab=t.0#bookmark=id.wwaaqtvvu5of