Draft a revised README.md #374
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahg-g, smarterclayton. The full list of commands accepted by this bot can be found here. The pull request process is described here.
This is a great update, thanks @smarterclayton! /lgtm
Force-pushed from 7ddd14f to 0953ded
Force-pushed from ba1b2fc to c8c7f1f
Force-pushed from c8c7f1f to 7975dc4
README.md (Outdated)
This extension is intended to provide value to multiplexed LLM services on a shared pool of compute. See the [proposal](https://github.com/kubernetes-sigs/wg-serving/tree/main/proposals/012-llm-instance-gateway) for more info.
The inference gateway:
* Improves the tail latency and throughput of LLM completion requests against Kubernetes-hosted model servers using an extensible request scheduling algorithm that is aware of kv-cache utilization, request weight, and priority, avoiding evictions or queueing as load increases
what do we mean by request weight here?
Do we consider token size?
no, we don't; we consider the current kv-cache utilization and queue length at the model servers
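To make the discussion concrete, here is a toy sketch of scheduling on those two signals (current kv-cache utilization and queue length). This is not the project's actual scheduler; the `Endpoint` fields, the linear scoring, and the `queue_weight` value are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str
    kv_cache_utilization: float  # fraction of KV cache in use, 0.0 to 1.0
    queue_length: int            # requests currently queued at the model server

def score(ep: Endpoint, queue_weight: float = 0.1) -> float:
    # Lower is better: prefer servers with free KV cache and short queues.
    # The weight balancing the two signals is a made-up illustration.
    return ep.kv_cache_utilization + queue_weight * ep.queue_length

def pick_endpoint(endpoints: list[Endpoint]) -> Endpoint:
    # Route the request to the lowest-scoring (least loaded) server.
    return min(endpoints, key=score)

pods = [
    Endpoint("pod-a", kv_cache_utilization=0.9, queue_length=2),
    Endpoint("pod-b", kv_cache_utilization=0.3, queue_length=1),
    Endpoint("pod-c", kv_cache_utilization=0.4, queue_length=8),
]
print(pick_endpoint(pods).name)  # pod-b: low cache pressure and a short queue
```

Note that token size does not appear anywhere in the scoring, matching the answer above: only server-side load signals drive the decision.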
Clarify the point of the project, and use the vernacular of "inference gateway" vs "ai gateway" to more succinctly explain what the distinction is. Move the website up more prominently, and describe in more detail what the immediate requirements are. Create a stub roadmap section. Add a medium complexity architecture SVG to the readme
Force-pushed from 7975dc4 to fe96ba8
All changes are made, I think this is ok to merge barring any final review.
/hold cancel
/lgtm
I intended this to start a discussion, but I'd like to see the readme more immediately answer what the purpose of the project is. I agree the discussion around "inference gateway" has past history, but I also don't think "gateway-api-inference-extension" is landing when I talk about this with potential users, and I want to try some variations.
@ahg-g @danehans @kfswain @robscott