Improve configs - SchedulerConfig
#16533
Conversation
Signed-off-by: Harry Mellor <[email protected]>
Failures are relevant, I'll look into it
```diff
 # Maximum number of sequences to be processed in a single iteration.
-max_num_seqs: int = 128
+max_num_seqs: int = None  # type: ignore
```
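The hunk above keeps the field typed as `int` but uses `None` as a sentinel default that is filled in later. A minimal, self-contained sketch of that pattern, assuming a dataclass-based config; the class name and fallback value here are illustrative, not vLLM's actual code:

```python
from dataclasses import dataclass


@dataclass
class SchedulerConfigSketch:
    """Illustrative stand-in for the real SchedulerConfig."""

    # Typed as int so downstream type checking sees a plain int, but
    # defaulted to a None sentinel; `# type: ignore` silences the mismatch.
    max_num_seqs: int = None  # type: ignore

    def __post_init__(self) -> None:
        # Resolve the sentinel so attribute access never sees None.
        if self.max_num_seqs is None:
            self.max_num_seqs = 128  # hypothetical fallback value


default_cfg = SchedulerConfigSketch()
explicit_cfg = SchedulerConfigSketch(max_num_seqs=64)
```

This is exactly the trade-off debated below: the instance is always an `int` after construction, but the annotation lies about the default.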
Since `None` is resolved inside `EngineArgs` or `ModelConfig`, I feel that we should not allow it to be `None` when initializing `SchedulerConfig`.
The default is `None` so that the default in `EngineArgs` is `None`.

We have the same problem with `max_num_batched_tokens` and `max_model_length`. They can't be optional in the config because maths is done on them in `__post_init__`, but they must have `None` defaults so that `EngineArgs` can have them as optional.

Where would you change behaviour so that the defaults here are not `None`?
Maybe we can have a `from_optional` method, just like the one for `SamplingParams`, to resolve the `None` values.
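For context, `SamplingParams.from_optional` resolves `None` arguments to concrete values before construction. A hedged sketch of what the same shape could look like for a scheduler-style config; the class name, fields, and defaults here are made up for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigSketch:
    # Attributes are plain ints; no None ever reaches the instance.
    max_num_seqs: int = 128
    max_num_batched_tokens: int = 2048

    @classmethod
    def from_optional(
        cls,
        max_num_seqs: Optional[int] = None,
        max_num_batched_tokens: Optional[int] = None,
    ) -> "ConfigSketch":
        # Map None -> concrete default before calling the constructor,
        # so the dataclass itself never holds an Optional field.
        return cls(
            max_num_seqs=128 if max_num_seqs is None else max_num_seqs,
            max_num_batched_tokens=(2048 if max_num_batched_tokens is None
                                    else max_num_batched_tokens),
        )


cfg = ConfigSketch.from_optional(max_num_seqs=None, max_num_batched_tokens=512)
```

The cost of this approach, as raised in the next reply, is that every call site must remember to go through `from_optional` instead of the plain constructor.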
Using `__post_init__` to overwrite `None` values leads to a bunch of issues with type checking, since downstream access of these attributes will unnecessarily need to consider the `None` case in order to avoid type checking errors.
> Maybe we can have a `from_optional` method just like the one for `SamplingParams` to resolve the `None` values

Would we then need to change everywhere `SchedulerConfig` is instantiated? And then remember to instantiate it that way?

> Using `__post_init__` to overwrite `None` values leads to a bunch of issues with type checking since downstream access of these attributes will unnecessarily need to consider the `None` case in order to avoid type checking errors.

I agree, but that's why in `SchedulerConfig` the type is `int` while in `EngineArgs` the type is `Optional[int]`, so that the type checking is happy.
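A minimal sketch of that split, assuming illustrative class and method names: the CLI-facing layer keeps `Optional[int]` and resolves the sentinel when it builds the config, so the config itself can be typed `int` throughout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigSketch:
    # Never Optional; type checkers see a plain int everywhere downstream.
    max_num_seqs: int


@dataclass
class EngineArgsSketch:
    # None here means "not supplied on the command line".
    max_num_seqs: Optional[int] = None

    def create_config(self) -> ConfigSketch:
        # The sentinel is resolved at this boundary, so code that consumes
        # the config never has to handle the None case.
        resolved = 128 if self.max_num_seqs is None else self.max_num_seqs
        return ConfigSketch(max_num_seqs=resolved)
```

The `# type: ignore` in the actual diff exists because the real `SchedulerConfig` also wants a `None` default on an `int`-typed field, which this two-layer split avoids.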
I think it's worth noting that, prior to this PR, this is how we handled `max_num_batched_tokens`.

I changed the other two here so that they would no longer have arbitrary defaults that could be inherited by `EngineArgs`.
As per offline discussion, this is the cleanest way given how the configs are set up right now. We can address this later.
Follows #16332 and #16422

Notable changes:

- Update `ParallelConfig.distributed_executor_backend` to use `Literal[...]` instead of `str`
- Rename `nullable_str` to `optional_str` (which sounds more Pythonic) and add other optional type methods
- `multi_step_stream_outputs` was `True` in `EngineArgs` and `False` in `SchedulerConfig`. I chose the value from `EngineArgs`