[Feature]: Make auto load format handle bitsandbytes models #11867
Labels: feature request (New feature or request)

Comments
I feel that `_verify_quantization` has already done automatic detection.

Nope.

Test `self.quantization == "bitsandbytes"` before auto detection. sad
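The ordering suggested in the last comment can be sketched as follows. This is a minimal, hypothetical helper (not vLLM's actual `_verify_quantization` implementation): an explicitly passed `--quantization` value is honored first, and auto-detection from the model config is only the fallback.

```python
def resolve_quantization(user_quantization, hf_quant_method):
    """Pick the effective quantization method.

    Hypothetical sketch of the ordering discussed above:
    an explicit user-supplied setting wins; otherwise fall back
    to whatever method the model's HF config declares (may be None).
    """
    if user_quantization is not None:  # explicit flag takes precedence
        return user_quantization
    return hf_quant_method  # auto-detected fallback
```

With this ordering, `--quantization bitsandbytes` keeps working as before, while omitting it picks up the method declared in the checkpoint.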
🚀 The feature, motivation and pitch
Common bitsandbytes models like `unsloth/meta-llama-3.1-8b-bnb-4bit` require the user to pass the `--load-format bitsandbytes --quantization bitsandbytes` command-line arguments. I could be wrong, but I believe both of these could be auto-detected by vLLM. The default load format `auto` could select `bitsandbytes` if a bitsandbytes model is selected. AFAIK this detection should work:
Similarly, the `--quantization bitsandbytes` argument seems redundant, since the quantization is specified in the model config, but if the user omits it then this happens:

Alternatives
No response
Additional context
No response
Before submitting a new issue...