Proposal update for the API names and latency objective #91
Conversation
… objective in the proposal.
/hold just so it doesn't get merged by accident
Thanks @ahg-g! Mostly LGTM, just a few nits
/approve
@@ -28,7 +28,7 @@

## Summary

This proposal presents 2 new CRD objects to express the needs of the LLM Instance Gateway. **LLMServerPool** and **LLMService** (names up for debate). The LLMServerPool is the logical grouping of compute, owned by the Inference Platform Admin persona. While the LLMService defines the serving objectives of a specific model or LoRA adapter, and is owned by the LLM Service Owner.
This proposal presents 2 new CRD objects to express the needs of the LLM Instance Gateway. **InferencePool** and **InferenceModel**. The InferencePool is the logical grouping of compute, owned by the Inference Platform Admin persona. While the InferenceModel defines the serving objectives of a specific model or LoRA adapter, and is owned by the Inference Workload Owner.
Should we still use "LLM Instance Gateway"? And it seems we didn't define "LLM Instance Gateway"
updated to match the new name.
updated throughout the proposal
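For context on the rename, here is a minimal, hypothetical Go sketch of how the two CRDs described in the summary could be shaped; the field names (Selector, ModelName, Criticality, PoolRef) are illustrative assumptions, not the API the proposal actually adopts.

```go
// Illustrative sketch only: field names and types below are assumptions
// for discussion, not the API adopted in the proposal.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// InferencePool is the logical grouping of compute, owned by the
// Inference Platform Admin persona.
type InferencePool struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec InferencePoolSpec `json:"spec,omitempty"`
}

type InferencePoolSpec struct {
	// Selector chooses the serving Pods that make up this pool (assumed field).
	Selector map[string]string `json:"selector,omitempty"`
}

// InferenceModel defines the serving objectives of a specific model or
// LoRA adapter, owned by the Inference Workload Owner.
type InferenceModel struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec InferenceModelSpec `json:"spec,omitempty"`
}

type InferenceModelSpec struct {
	// ModelName is the name clients address in requests (assumed field).
	ModelName string `json:"modelName,omitempty"`
	// Criticality stands in for the latency objective this PR removes;
	// its concrete values are assumptions here.
	Criticality string `json:"criticality,omitempty"`
	// PoolRef names the InferencePool that serves this model (assumed field).
	PoolRef string `json:"poolRef,omitempty"`
}
```

The split keeps admin-owned compute (the pool) and workload-owned serving objectives (the model) in separate objects, matching the two personas described in the summary.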
- Enforce any common set of adapters or base models are available on the Pods
- Manage Deployments of Pods within the Pool
- Manage Pod lifecycle of pods within the pool

Additionally, any Pod that seeks to join a LLMServerPool would need to support a protocol, defined by LLM Instance Gateway, to ensure the Pool has adequate information to intelligently route requests.
Additionally, any Pod that seeks to join an InferencePool would need to support a protocol, defined by this project, to ensure the Pool has adequate information to intelligently route requests.
Let's link the protocol doc here? https://docs.google.com/document/d/18VRJ2ufZmAwBZ2jArfvGjQGaWtsQtAP6_yF2Xn6zcms/edit?tab=t.0#heading=h.sw2xdf66jh6
We should spec this protocol and have it in this repo. Can you submit a proposal based on the doc?
sg, will do
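Until that protocol proposal lands, here is a purely hypothetical Go sketch of the kind of per-Pod signal such a protocol might require serving Pods to expose; every field name below is an assumption, since the protocol currently exists only in the linked doc and is not yet specced in this repo.

```go
// Hypothetical sketch of per-Pod signals a routing layer could use;
// this is not the actual protocol, which is not yet specced in this repo.
package protocol

// PodMetrics is an assumed snapshot a serving Pod might report so the
// gateway can route intelligently across an InferencePool.
type PodMetrics struct {
	// Requests queued on this Pod and not yet running.
	WaitingQueueSize int `json:"waitingQueueSize"`
	// Fraction of the KV cache currently in use, in [0, 1].
	KVCacheUtilization float64 `json:"kvCacheUtilization"`
	// LoRA adapters currently loaded and servable on this Pod.
	ActiveLoRAAdapters []string `json:"activeLoraAdapters"`
}
```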
Update the diagram
After discussing with @kfswain, we decided to remove them.
@@ -336,8 +318,8 @@ Our original idea was to define all LLMService config at the Kubernetes Gateway

- Reasonable defaults (how do we behave in the absence of user-specified values in optional fields)
The rest of the FAQs are pretty confusing or outdated (the SLO part) to me. Perhaps we can delete them, or add some more detail/example to explain them better?
removed the open questions for now
Thanks
A LLMService allows the LLM Service Owner to define:
An InferenceModel allows the Inference Workload Owner to define:
- Which LoRA adapter(s) to consume
done
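As a rough illustration of how "which LoRA adapter(s) to consume" could surface on the InferenceModel spec, here is an assumed Go fragment; TargetModels and its weight semantics are hypothetical, not the adopted API.

```go
// Assumed shape for expressing which LoRA adapter(s) an InferenceModel
// consumes; illustrative only, not the adopted API.
package v1alpha1

type InferenceModelSpec struct {
	// ModelName that clients address in requests (assumed field).
	ModelName string `json:"modelName,omitempty"`
	// TargetModels lists the LoRA adapter(s) or base model that may serve
	// requests for ModelName, optionally weighted for traffic splitting.
	TargetModels []TargetModel `json:"targetModels,omitempty"`
}

type TargetModel struct {
	// Name of the LoRA adapter or base model as known to the serving Pods.
	Name string `json:"name"`
	// Weight for splitting traffic across targets (assumed semantics).
	Weight int `json:"weight,omitempty"`
}
```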
Thanks for updating this! /lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, kfswain, liu-cong

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
/label tide/merge-method-squash

Thanks all for the great feedback!
…igs#91)
* Update the API names and add criticality parameter instead of latency objective in the proposal.
* Addressed comments
* Addressing comments round 2
This addresses #69 and #68
As per our discussion in the latest community meeting https://docs.google.com/document/d/1frfPE5L1sI3737rdQV04IcDGeOcGJj2ItjMg6z2SRH0/edit?tab=t.0#bookmark=id.wwaaqtvvu5of