Skip to content

Introduce SaturationDetector component #808

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

LukeAVanDrie
Copy link
Contributor

@LukeAVanDrie LukeAVanDrie commented May 9, 2025

This commit adds a new SaturationDetector component responsible for determining if backend model servers are saturated. It bases its decision on observed metrics like queue depth and KV cache utilization, using configurable thresholds.

To be submitted before #805 (split this change out for easier review).
Related to: #674

It's initial implementation uses identical thresholds to the HasCapacity filter embedded in the default Scheduler filter chain for sheddable requests.

The detector is designed to be a self-contained unit that can be leveraged by other components for admission control and capacity assessment. This is the first step in a larger refactoring to externalize and centralize saturation detection logic.

Copy link

netlify bot commented May 9, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 33342b6
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/68227c3c9eba680008c23f9d
😎 Deploy Preview https://deploy-preview-808--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 9, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 9, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector-impl branch from f10f62f to 37e73c4 Compare May 9, 2025 03:26
@ahg-g
Copy link
Contributor

ahg-g commented May 9, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 9, 2025
Copy link
Collaborator

@kfswain kfswain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good! Just some smaller comments, thanks!

@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector-impl branch 2 times, most recently from be326b9 to 5217359 Compare May 12, 2025 20:15
@kfswain
Copy link
Collaborator

kfswain commented May 12, 2025

/retest

@kfswain
Copy link
Collaborator

kfswain commented May 12, 2025

/approve

Will wait for tests/ response to comments before LGTM. Thanks Luke!

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 12, 2025
@LukeAVanDrie
Copy link
Contributor Author

Will wait for tests/ response to comments before LGTM. Thanks Luke!

Test failure is from an unrelated flake. I sent out a fix in #824

@kfswain
Copy link
Collaborator

kfswain commented May 12, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 12, 2025
This commit adds a new `SaturationDetector` component responsible for
determining if backend model servers are saturated. It bases its
decision on observed metrics like queue depth and KV cache utilization,
using configurable thresholds.

The detector is designed to be a self-contained unit that can be
leveraged by other components for admission control and capacity
assessment. This is the first step in a larger refactoring to
externalize and centralize saturation detection logic.
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector-impl branch from 5217359 to 33342b6 Compare May 12, 2025 22:54
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 12, 2025
@LukeAVanDrie
Copy link
Contributor Author

New changes are detected. LGTM label has been removed.

I cannot keep up with @nirrozenbaum :)

Rebased on to the MetricsState change which should fix the failing tests.

@kfswain
Copy link
Collaborator

kfswain commented May 13, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 13, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, LukeAVanDrie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 3d99aa1 into kubernetes-sigs:main May 13, 2025
8 checks passed
@LukeAVanDrie LukeAVanDrie deleted the saturation-detector-impl branch May 13, 2025 01:08
nayihz pushed a commit to nayihz/gateway-api-inference-extension that referenced this pull request May 14, 2025
This commit adds a new `SaturationDetector` component responsible for
determining if backend model servers are saturated. It bases its
decision on observed metrics like queue depth and KV cache utilization,
using configurable thresholds.

The detector is designed to be a self-contained unit that can be
leveraged by other components for admission control and capacity
assessment. This is the first step in a larger refactoring to
externalize and centralize saturation detection logic.
kaushikmitr pushed a commit to kaushikmitr/llm-instance-gateway that referenced this pull request May 15, 2025
This commit adds a new `SaturationDetector` component responsible for
determining if backend model servers are saturated. It bases its
decision on observed metrics like queue depth and KV cache utilization,
using configurable thresholds.

The detector is designed to be a self-contained unit that can be
leveraged by other components for admission control and capacity
assessment. This is the first step in a larger refactoring to
externalize and centralize saturation detection logic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants