[BugFix] Increase timeout for startup failure test #17642
Conversation
Signed-off-by: Nick Hill <[email protected]>
@hmellor maybe this is related to the recent config refactor? Another possibility is that the overall import time of vLLM has increased significantly for whatever reason. The code related to retrieving model info has been here for many months already, so that by itself couldn't be the cause. This code exists to get around a CUDA re-initialization error, because importing the model files may initialize CUDA before the vLLM engine is started.
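A minimal sketch of the pattern being described (hypothetical helper names, not vLLM's actual implementation): the model module is imported inside a short-lived child process, so any CUDA initialization triggered by the import stays out of the parent process that later starts the engine:

```python
import multiprocessing as mp


def _inspect_model(queue):
    # Importing the model module here may initialize CUDA, but only
    # inside this short-lived child process, never in the parent.
    # (hypothetical stand-in for importing the real model class)
    info = {"is_multimodal_model": False}
    queue.put(info)


def get_model_info():
    # "spawn" gives a fresh interpreter with no inherited CUDA state.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    proc = ctx.Process(target=_inspect_model, args=(queue,))
    proc.start()
    info = queue.get()
    proc.join()
    return info


if __name__ == "__main__":
    print(get_model_info())
```

The trade-off is exactly the one discussed below: the parent stays clean, but every lookup pays the cost of spawning a fresh interpreter.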
Good as a short-term mitigation. Maybe we can create an issue to track this? Sounds like a good ramp-up task.
From this profile we can see that the actual retrieval of the model info only takes about 1s (bottom thread), but the overhead of creating the new process for …
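To illustrate the kind of overhead being discussed (this is a standalone measurement, not the vLLM profile itself; actual numbers vary by machine), one can time a bare spawn-based subprocess round trip, which pays for a full fresh interpreter start-up:

```python
import multiprocessing as mp
import time


def _noop(queue):
    # The child does no real work; all measured time is process
    # creation, interpreter start-up, and IPC overhead.
    queue.put("done")


def time_spawn_round_trip():
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    start = time.perf_counter()
    proc = ctx.Process(target=_noop, args=(queue,))
    proc.start()
    result = queue.get()
    proc.join()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    result, elapsed = time_spawn_round_trip()
    print(f"spawn round trip took {elapsed:.2f}s")
```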
Let's unblock CI first. We can solve the underlying problem later.
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Mu Huai <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Yuqi Zhang <[email protected]>
This test is now failing on main most of the time due to the timeout not being large enough.
I'm still confused why this started failing recently. The problem is that just creating `VllmConfig` takes 10 seconds before we even create the `LLM`, which is due to the model class being loaded in a separate process to be able to read some config from it: `vllm/vllm/model_executor/models/registry.py`, lines 324 to 330 in 2858830.
It would be good if we could find a way to retrieve the info in question (like `is_multimodal_model`) without this overhead; I assume it's adding to startup time in all cases, cumulatively contributing to CI times quite a bit, etc.
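One possible direction, sketched purely as an illustration (the function and its return shape are hypothetical, not a vLLM API): cache the result of the expensive per-architecture lookup so that repeated config creations for the same model skip the subprocess entirely:

```python
import functools


@functools.lru_cache(maxsize=None)
def cached_model_info(architecture: str) -> dict:
    # Hypothetical stand-in for the expensive subprocess-based
    # inspection; real code would spawn a process and import the
    # model class to read attributes like is_multimodal_model.
    return {"is_multimodal_model": architecture.endswith("VL")}


# The first call per architecture pays the full cost; subsequent
# calls for the same architecture are served from the cache.
info = cached_model_info("Qwen2VL")
```

This wouldn't help the very first startup, but it would avoid paying the spawn cost more than once per architecture within a process.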