Refactor: Externalize Scheduler's saturation logic and criticality-based service differentiation #805
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: LukeAVanDrie. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.

Hi @LukeAVanDrie. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
This change should be a no-op. @liu-cong, I will leave it up to your discretion whether this needs proper regression testing.
I split the addition of the saturation detector subdirectory into a separate PR (#808) to be submitted before this one. It stays unused until this PR lands and wires it up.
This commit refactors the request processing pipeline, externalizing saturation detection and criticality-based service differentiation from the Scheduler. These responsibilities are now primarily managed by the RequestControl.Director. This change is a preparatory step for the introduction of a new Flow Controller component, which will eventually absorb these admission control duties.

Key changes include:
- Introduced a `PreDispatch` method to `RequestControl.Director`. It utilizes the `SaturationDetector` for admission control of non-critical requests and handles request criticality to determine if saturation checks are bypassed.
- The saturation detection logic for dropping non-critical requests is intentionally preserved within the `Director` at this stage. This allows the option to bypass the future Flow Controller component during its maturation, ensuring the existing saturation and sheddable-request behavior can be maintained as a fallback.
- Updated `main.go` to instantiate the `SaturationDetector`, wiring it into the request handling flow.
- Updated `director_test.go` to align with the new component responsibilities, adding additional coverage where necessary.

This refactoring leads to a cleaner architecture, making the `Scheduler` a more focused component and centralizing initial admission control logic while paving the way for the future Flow Controller. It is aligned with the direction in `0683-epp-architecture-proposal` and should be nearly no-op in terms of EPP behavior.
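For orientation, the contract the `Director` relies on here is small: the only call made in this PR is `IsSaturated(ctx)`. Below is a minimal sketch of what such a detector could look like; the `thresholdDetector` implementation and its field names are illustrative assumptions, not the actual code from #808.

```go
package saturationdetector

import (
	"context"
	"sync/atomic"
)

// Detector reports whether the backend pool is currently saturated. The
// Director only needs this single method for its admission decision.
type Detector interface {
	IsSaturated(ctx context.Context) bool
}

// thresholdDetector is an illustrative implementation that reports saturation
// once an observed queue depth crosses a configured threshold.
type thresholdDetector struct {
	queueDepth     atomic.Int64 // updated elsewhere, e.g. by a metrics collector
	queueThreshold int64
}

func (d *thresholdDetector) IsSaturated(_ context.Context) bool {
	return d.queueDepth.Load() >= d.queueThreshold
}
```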
```diff
@@ -207,47 +211,62 @@ func run() error {
 	}
 	schedulerConfig := scheduling.NewSchedulerConfig(
 		[]plugins.PreSchedule{},
-		[]plugins.Filter{filter.NewSheddableCapacityFilter()},
+		[]plugins.Filter{},
```
@liu-cong I can also do this in the next PR when I actually remove this from the scheduler. Right now, I have only removed it from scheduler v2, not the original decision tree filter. If we want to bundle that together in a single PR, I can revert this line for now.
```diff
@@ -351,12 +548,9 @@ func TestRandomWeightedDraw(t *testing.T) {
 	var seedVal int64 = 420
 	for _, test := range tests {
 		t.Run(test.name, func(t *testing.T) {
-			for range 10000 {
```
This was always testing a deterministic seed, so this loop did nothing to verify statistical properties. Removed for now until someone wants to update the tests to actually make assertions for statistical properties on arbitrary seeds.
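For anyone who later wants to reintroduce statistical assertions over arbitrary seeds, a rough sketch is below. The `weightedDraw` helper and its signature are hypothetical stand-ins, not the function under test in this repo; a real version would exercise the repo's own weighted-draw helper instead.

```go
package scheduling_test

import (
	"math"
	"math/rand"
	"testing"
	"time"
)

// weightedDraw is a hypothetical stand-in for the helper under test: it picks
// a key with probability proportional to its weight.
func weightedDraw(weights map[string]int, r *rand.Rand) string {
	total := 0
	for _, w := range weights {
		total += w
	}
	pick := r.Intn(total)
	for k, w := range weights {
		if pick < w {
			return k
		}
		pick -= w
	}
	return ""
}

func TestWeightedDrawDistribution(t *testing.T) {
	weights := map[string]int{"a": 1, "b": 3} // "b" should win ~75% of draws.
	r := rand.New(rand.NewSource(time.Now().UnixNano()))

	const n = 100000
	counts := map[string]int{}
	for i := 0; i < n; i++ {
		counts[weightedDraw(weights, r)]++
	}

	gotRatio := float64(counts["b"]) / float64(n)
	if math.Abs(gotRatio-0.75) > 0.02 { // loose tolerance keeps the test stable
		t.Errorf("draw ratio for \"b\" = %.3f, want ~0.75", gotRatio)
	}
}
```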
```diff
@@ -414,3 +608,40 @@ func TestGetRandomPod(t *testing.T) {
 func pointer(v int32) *int32 {
 	return &v
 }
+
+func TestDirector_HandleResponse(t *testing.T) {
```
New test coverage. We had 0 coverage on this method.
```go
}

// mockScheduler is a configurable mock for the Scheduler interface.
type mockScheduler struct {
```
I replaced real scheduler instances with a mock in these tests. Consequently, they are no longer "integration"-like tests. Wanted to call that out in case that is a concern. I think using a mock here is more appropriate though.
This is very complex; do you really need all those fields rather than just injecting a scheduling result and an error?
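For what it's worth, a stripped-down mock along those lines might look like the sketch below; the type names here are placeholders rather than the repo's actual scheduling types:

```go
package requestcontrol_test

import "context"

// Placeholder types standing in for the real scheduling request/result types;
// the actual names and shapes in this repo differ.
type llmRequest struct{ Model string }
type schedulingResult struct{ TargetPodName string }

// mockScheduler simply returns whatever the test injects, which is usually all
// a Director unit test needs.
type mockScheduler struct {
	results []*schedulingResult
	err     error
}

func (m *mockScheduler) Schedule(_ context.Context, _ *llmRequest) ([]*schedulingResult, error) {
	return m.results, m.err
}
```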
```go
	return reqCtx, errutil.Error{Code: errutil.Internal, Msg: "results must be greater than zero"}
	// Currently only get a single result. Will refactor to pluggably implement
	// the PostSchedule.
	if len(results) == 0 || results[0] == nil || results[0].TargetPod == nil || results[0].TargetPod.GetPod() == nil {
```
I am fine with sparse defensive code, but in general I would not recommend it. We cannot afford defensive coding everywhere. The scheduler should be implemented to return an error if any of this happens.
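As an illustration of pushing that invariant into the scheduler itself, here is a sketch with placeholder types, not the repo's actual code:

```go
package scheduling

import (
	"errors"
	"fmt"
)

// Pod and Result are placeholders for the scheduler's real types.
type Pod struct{ Name string }

type Result struct{ TargetPod *Pod }

// validateResults enforces the invariant at the source: the scheduler returns
// an error itself, so callers such as the Director can trust the results and
// skip defensive nil checks.
func validateResults(results []*Result) error {
	if len(results) == 0 {
		return errors.New("scheduler produced no results")
	}
	for i, r := range results {
		if r == nil || r.TargetPod == nil {
			return fmt.Errorf("scheduler result %d has no target pod", i)
		}
	}
	return nil
}
```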
```go
func (d *Director) PreDispatch(ctx context.Context, reqCtx *handlers.RequestContext, reqCriticality v1alpha2.Criticality) error {
	logger := log.FromContext(ctx)
	logger.V(logutil.DEBUG).Info("Performing saturation check if request is non-critical.")
	if d.saturationDetector == nil {
```
Let's try to avoid these checks and always provide a non-nil detector.
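One way to guarantee a non-nil detector (a sketch; `noopDetector` is a made-up name, not something in this PR) is to wire in a no-op implementation wherever saturation checking is disabled:

```go
package requestcontrol

import "context"

// noopDetector never reports saturation. Injecting it instead of a nil
// SaturationDetector would let PreDispatch drop this nil check entirely.
type noopDetector struct{}

func (noopDetector) IsSaturated(_ context.Context) bool { return false }
```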
```go
	if reqCriticality != v1alpha2.Critical && d.saturationDetector.IsSaturated(ctx) {
		logger.Info("System saturated, dropping non-critical request")
		return errutil.Error{
			Code: errutil.InferencePoolResourceExhausted,
			Msg:  "system saturated, non-critical request dropped",
		}
	}
```
Suggested change:

```diff
-	if reqCriticality != v1alpha2.Critical && d.saturationDetector.IsSaturated(ctx) {
-		logger.Info("System saturated, dropping non-critical request")
-		return errutil.Error{
-			Code: errutil.InferencePoolResourceExhausted,
-			Msg:  "system saturated, non-critical request dropped",
-		}
-	}
+	if reqCriticality == v1alpha2.Critical {
+		return nil
+	}
+	if d.saturationDetector.IsSaturated(ctx) {
+		logger.Info("System saturated, dropping non-critical request")
+		return errutil.Error{
+			Code: errutil.InferencePoolResourceExhausted,
+			Msg:  "system saturated, non-critical request dropped",
+		}
+	}
```
```go
	// Check saturation directly ONLY for non-critical requests.
	if reqCriticality != v1alpha2.Critical && d.saturationDetector.IsSaturated(ctx) {
		logger.Info("System saturated, dropping non-critical request")
```
This is already logged by the caller when the error is returned.
Generally, should I remove all instances of logging in director.go before an error is returned?
Generally:
- Errors should be handled by the caller (log it, handle it, etc.), so there is no need to double log.
- Prefer logging in the caller instead of in the helper method where applicable. Some DEBUG/TRACE logs in helper methods are OK.
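A tiny illustration of that convention (generic names, not the repo's actual handler code): the helper only returns the error, and the caller is the one place that logs it.

```go
package example

import (
	"context"
	"errors"
	"log"
)

// admit only returns the error; it does not log it.
func admit(ctx context.Context, isSaturated func(context.Context) bool) error {
	if isSaturated(ctx) {
		return errors.New("system saturated, non-critical request dropped")
	}
	return nil
}

// handle is the caller and the single place the error is logged and handled.
func handle(ctx context.Context, isSaturated func(context.Context) bool) error {
	if err := admit(ctx, isSaturated); err != nil {
		log.Printf("request rejected: %v", err)
		return err
	}
	// ... dispatch the admitted request ...
	return nil
}
```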
```go
	ctx := ctrl.SetupSignalHandler()
	appDatastore := datastore.NewDatastore(ctx, pmf)
```
why appDatastore and appScheduler?
I did this to avoid collisions with the package names. This is not strictly necessary though.
Perhaps call them `ds` and `sched`.
This commit refactors the request processing pipeline, externalizing saturation detection and criticality-based service differentiation from the Scheduler. These responsibilities are now primarily managed by the RequestControl.Director.
This change is a preparatory step for the introduction of a new Flow Controller component, which will eventually absorb these admission control duties.
Diff base is: #808 (split out for easier reviewing)
Related to: #674
Key changes include:
- Introduced a `PreDispatch` method to `RequestControl.Director`. It utilizes the `SaturationDetector` for admission control of non-critical requests and handles request criticality to determine if saturation checks are bypassed.
- The saturation detection logic for dropping non-critical requests is intentionally preserved within the `Director` at this stage. This allows the option to bypass the future Flow Controller component during its maturation, ensuring the existing saturation and sheddable-request behavior can be maintained as a fallback.
- Updated `main.go` to instantiate the `SaturationDetector`, wiring it into the request handling flow.
- Updated `director_test.go` to align with the new component responsibilities, adding additional coverage where necessary.

Missing from this PR:
- Simplifying the `Scheduler` to focus solely on preference-based filtering and pod selection for requests that have already been admitted by the `Director`.
- Removing the `SheddableRequestFilter` and the distinct critical/sheddable filter paths from the `Scheduler`'s internal logic so that the `Scheduler` only applies a single, unified preference filter chain to all incoming requests.

I did not include the above in this PR due to high activity in those files. I will send a followup PR to address that. In the meantime, the saturation check happens twice: once in the Director and then a second, redundant time in the Scheduler. This is wasted compute, but it has no effect on behavior.

This refactoring leads to a cleaner architecture, making the `Scheduler` a more focused component and centralizing initial admission control logic, while paving the way for the future Flow Controller.

This is aligned with the direction in `0683-epp-architecture-proposal` and is a no-op in terms of EPP behavior.