Refactor: Externalize Scheduler's saturation logic and criticality-based service differentiation #805


Open · wants to merge 1 commit into main

Conversation

@LukeAVanDrie LukeAVanDrie commented May 8, 2025

This commit refactors the request processing pipeline, externalizing saturation detection and criticality-based service differentiation from the Scheduler. These responsibilities are now primarily managed by the RequestControl.Director.

This change is a preparatory step for the introduction of a new Flow Controller component, which will eventually absorb these admission control duties.

Diff base is: #808 (split out for easier reviewing)
Related to: #674

Key changes include:

  • Introduced PreDispatch method to RequestControl.Director. It utilizes the SaturationDetector for admission control of non-critical requests and handles request criticality to determine if saturation checks are bypassed.
  • The saturation detection logic for dropping non-critical requests is intentionally preserved within the Director at this stage. This allows the option to bypass the future Flow Controller component during its maturation, ensuring the existing saturation and sheddable request behavior can be maintained as a fallback.
  • Updated main.go to instantiate the SaturationDetector, wiring it into the request handling flow.
  • Updated director_test.go to align with the new component responsibilities, adding additional coverage where necessary.

Missing from this PR:

  • Simplifying the Scheduler to focus solely on preference-based filtering and pod selection for requests that have already been admitted by the Director.
  • Removing the SheddableRequestFilter and the distinct critical/sheddable filter paths from the Scheduler's internal logic so that the Scheduler only applies a single, unified preference filter chain to all incoming requests.

I did not include the above in this PR due to high activity in those files; I will send a follow-up PR to address it. In the meantime, the saturation check happens twice: once in the Director and again, redundantly, in the Scheduler. This wastes compute but has no effect on behavior.

This refactoring leads to a cleaner architecture, making the Scheduler a more focused component and centralizing initial admission control logic, while paving the way for the future Flow Controller.

This is aligned with the direction in 0683-epp-architecture-proposal and is a no-op in terms of EPP behavior.


netlify bot commented May 8, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit fd52325
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/682768d9b48ae40008a40ba8
😎 Deploy Preview https://deploy-preview-805--gateway-api-inference-extension.netlify.app

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: LukeAVanDrie
Once this PR has been reviewed and has the lgtm label, please assign kfswain for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 8, 2025
@k8s-ci-robot k8s-ci-robot requested review from liu-cong and robscott May 8, 2025 20:26
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 8, 2025
@k8s-ci-robot

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 8, 2025
@ahg-g

ahg-g commented May 8, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 8, 2025
@LukeAVanDrie

LukeAVanDrie commented May 8, 2025

This change should be no-op. @liu-cong, I will leave it up to your discretion whether this needs proper regression testing.

@LukeAVanDrie

LukeAVanDrie commented May 8, 2025

I split out the addition of the saturation detector subdirectory into a separate PR (#808) to be submitted before this one. It is simply unused until this PR lands and wires it up.

@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from 112b943 to 48cc9a0 Compare May 8, 2025 20:51
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch 3 times, most recently from a3d9090 to 9d273fa Compare May 9, 2025 02:49
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from 9d273fa to 83486ac Compare May 9, 2025 03:26
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 10, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from 83486ac to 4a7de3f Compare May 13, 2025 02:11
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 13, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from 4a7de3f to 44a11af Compare May 16, 2025 00:53
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 16, 2025
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 16, 2025
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch 2 times, most recently from 1081d6a to 5f348a9 Compare May 16, 2025 01:25
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 16, 2025
This commit refactors the request processing pipeline, externalizing
saturation detection and criticality-based service differentiation
from the Scheduler. These responsibilities are now primarily managed by
the RequestControl.Director.

This change is a preparatory step for the introduction of a new
Flow Controller component, which will eventually absorb these admission
control duties.

Key changes include:

- Introduced `PreDispatch` method to `RequestControl.Director`. It
  utilizes the `SaturationDetector` for admission control of
  non-critical requests and handles request criticality to determine if
  saturation checks are bypassed.
- The saturation detection logic for dropping non-critical requests
  is intentionally preserved within the `Director` at this stage.
  This allows the option to bypass the future Flow Controller
  component during its maturation, ensuring the existing saturation
  and sheddable request behavior can be maintained as a fallback.
- Updated `main.go` to instantiate the `SaturationDetector`, wiring it
  into the request handling flow.
- Updated `director_test.go` to align with the new component
  responsibilities, adding additional coverage where necessary.

This refactoring leads to a cleaner architecture, making the `Scheduler`
a more focused component and centralizing initial admission control
logic while paving the way for the future Flow Controller.

This is aligned with the direction in `0683-epp-architecture-proposal`
and should be nearly no-op in terms of EPP behavior.
@LukeAVanDrie LukeAVanDrie force-pushed the saturation-detector branch from 5f348a9 to fd52325 Compare May 16, 2025 16:33
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 16, 2025
@@ -207,47 +211,62 @@ func run() error {
}
schedulerConfig := scheduling.NewSchedulerConfig(
[]plugins.PreSchedule{},
[]plugins.Filter{filter.NewSheddableCapacityFilter()},
[]plugins.Filter{},
Contributor Author

@liu-cong I can also do this in the next PR when I actually remove this from the scheduler. Right now, I have only removed it from scheduler v2, not the original decision tree filter. If we want to bundle that together in a single PR, I can revert this line for now.

@@ -351,12 +548,9 @@ func TestRandomWeightedDraw(t *testing.T) {
var seedVal int64 = 420
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
for range 10000 {
Contributor Author

This was always testing a deterministic seed, so this loop did nothing to verify statistical properties. Removed for now until someone wants to update the tests to actually make assertions for statistical properties on arbitrary seeds.

@@ -414,3 +608,40 @@ func TestGetRandomPod(t *testing.T) {
func pointer(v int32) *int32 {
return &v
}

func TestDirector_HandleResponse(t *testing.T) {
Contributor Author

New test coverage. We had 0 coverage on this method.

}

// mockScheduler is a configurable mock for the Scheduler interface.
type mockScheduler struct {
Contributor Author

I replaced real scheduler instances with a mock in these tests. Consequently, they are no longer "integration"-like tests. Wanted to call that out in case that is a concern. I think using a mock here is more appropriate though.

Contributor

This is very complex; do you really need all those fields, rather than just injecting a scheduling result and an error?

return reqCtx, errutil.Error{Code: errutil.Internal, Msg: "results must be greater than zero"}
// Currently only get a single result. Will refactor to pluggably implement
// the PostSchedule.
if len(results) == 0 || results[0] == nil || results[0].TargetPod == nil || results[0].TargetPod.GetPod() == nil {
Contributor

I am fine with sparse defensive code, but in general I would not recommend it. We cannot afford defensive coding everywhere. The scheduler should be implemented to throw an error if any of this happens.

func (d *Director) PreDispatch(ctx context.Context, reqCtx *handlers.RequestContext, reqCriticality v1alpha2.Criticality) error {
logger := log.FromContext(ctx)
logger.V(logutil.DEBUG).Info("Performing saturation check if request is non-critical.")
if d.saturationDetector == nil {
Contributor

Let's try to avoid these checks and always provide a non-nil detector.

Comment on lines +180 to +186
if reqCriticality != v1alpha2.Critical && d.saturationDetector.IsSaturated(ctx) {
logger.Info("System saturated, dropping non-critical request")
return errutil.Error{
Code: errutil.InferencePoolResourceExhausted,
Msg: "system saturated, non-critical request dropped",
}
}
Contributor

Suggested change
if reqCriticality != v1alpha2.Critical && d.saturationDetector.IsSaturated(ctx) {
logger.Info("System saturated, dropping non-critical request")
return errutil.Error{
Code: errutil.InferencePoolResourceExhausted,
Msg: "system saturated, non-critical request dropped",
}
}
if reqCriticality == v1alpha2.Critical {
return nil
}
if d.saturationDetector.IsSaturated(ctx) {
logger.Info("System saturated, dropping non-critical request")
return errutil.Error{
Code: errutil.InferencePoolResourceExhausted,
Msg: "system saturated, non-critical request dropped",
}
}


// Check saturation directly ONLY for non-critical requests.
if reqCriticality != v1alpha2.Critical && d.saturationDetector.IsSaturated(ctx) {
logger.Info("System saturated, dropping non-critical request")
Contributor

This is already logged by the caller when the error is returned.

Contributor Author

Generally, should I remove all instances of logging in director.go before an error is returned?

Contributor

Generally,

  1. Errors should be handled by the caller (log it, handle it, etc.) so no need to double log.
  2. Prefer the caller to log instead of the helper method where applicable. Some DEBUG/TRACE logs in helper methods are OK.
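A minimal illustration of this guideline (all names here are hypothetical): the helper returns the error without logging it, and only the caller logs, so each failure is reported exactly once.

```go
package main

import (
	"errors"
	"fmt"
)

var errSaturated = errors.New("system saturated, non-critical request dropped")

// admit is the helper: it returns the error and does NOT log it.
func admit(saturated bool) error {
	if saturated {
		return errSaturated
	}
	return nil
}

// handle is the caller: it "logs" (here, formats) the error exactly once.
func handle(saturated bool) string {
	if err := admit(saturated); err != nil {
		return fmt.Sprintf("request rejected: %v", err)
	}
	return "request admitted"
}

func main() {
	fmt.Println(handle(true))
	fmt.Println(handle(false))
}
```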

ctx := ctrl.SetupSignalHandler()
appDatastore := datastore.NewDatastore(ctx, pmf)
Contributor

why appDatastore and appScheduler?

Contributor Author

I did this to avoid collisions with the package names. This is not strictly necessary though.

Contributor

perhaps call them `ds` and `sched`

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
4 participants