[Usage]: Failure to Init Qwen2.5-VL-7B-Instruct with inflight bnb quantization #12900
Comments
I will try to fix this ASAP.
@jeejeelee A relevant issue here as well: #12902
@jlia0 @MotorBottle Could you please verify if #12905 can resolve your issue?
Is
Sorry, I made a mistake with the deployment (missed the BnB quant flags). The deployment is still unsuccessful. New console log below:
Full precision runs well with the modified code, but quantization still does not.
Full precision model, yes.
Oh, I just remembered, we also need to modify something, see: #12604 (comment)
Could you specify what else needs to be modified on top of 0.7.2, besides the changes in #12905?
#12944 can resolve the remaining issue.
Confirmed working. Appreciated.
Your current environment
How would you like to use vllm
Hi, I'm trying to launch Qwen2.5-VL-7B-Instruct with in-flight BnB quantization, but it fails with an error.
I was able to run this model at full precision with Docker. Below is how I initialize the full-precision one:
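Something along these lines (a sketch rather than my exact command; the image tag, port mapping, cache mount, and context length here are illustrative):

```bash
# Full-precision launch (sketch; image tag, port, mount, and limits are illustrative)
docker run --gpus all \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:v0.7.2 \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --dtype float16 \
  --max-model-len 8192
```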
When I added `--quantization bitsandbytes --load-format bitsandbytes` to the docker command, the launch of the model with BnB 4-bit in-flight quantization failed; the full error log is included below. #12604 says this model is supported, and I wonder whether the dtype is the cause of the error (my 2080 Ti Turing GPUs only support float16, not bfloat16).
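For reference, the failing quantized launch is just the same command with the two flags added (same caveats as the sketch above; only the quantization flags and the dtype are the relevant parts):

```bash
# Quantized launch (sketch; identical to the full-precision command plus the two BnB flags)
docker run --gpus all \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:v0.7.2 \
  --model Qwen/Qwen2.5-VL-7B-Instruct \
  --dtype float16 \
  --max-model-len 8192 \
  --quantization bitsandbytes \
  --load-format bitsandbytes
```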
Before submitting a new issue...