GPT-Q supported? #274
Comments
I saw that this is already covered in this issue, so I am closing mine.
yukavio pushed a commit to yukavio/vllm that referenced this issue on Jul 3, 2024
With a context manager class, the `__exit__` method is not called when an exception is raised during the context manager’s `__enter__` method. This PR addresses that by manually calling that method if an exception is raised.
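The behavior described in that commit message can be reproduced with a small, self-contained sketch. The class and function names below (`Resource`, `NaiveResource`, `SafeResource`, `run`) are hypothetical and are not taken from the referenced commit; they only illustrate the general pattern of calling `__exit__` by hand when `__enter__` fails.

```python
class Resource:
    """Hypothetical context manager whose setup can fail partway through."""

    def __init__(self, fail_on_enter=False):
        self.fail_on_enter = fail_on_enter
        self.events = []  # records which hooks actually ran

    def _setup(self):
        self.events.append("enter")
        if self.fail_on_enter:
            raise RuntimeError("setup failed")

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False  # do not suppress the exception


class NaiveResource(Resource):
    def __enter__(self):
        # If _setup raises, the with statement never calls __exit__.
        self._setup()
        return self


class SafeResource(Resource):
    def __enter__(self):
        try:
            self._setup()
        except BaseException as e:
            # Python will not call __exit__ for us here, so call it manually
            # to release anything acquired before the failure, then re-raise.
            self.__exit__(type(e), e, e.__traceback__)
            raise
        return self


def run(cls):
    """Enter a failing resource and report which hooks were invoked."""
    r = cls(fail_on_enter=True)
    try:
        with r:
            pass
    except RuntimeError:
        pass
    return r.events
```

With these definitions, `run(NaiveResource)` records only the enter step, while `run(SafeResource)` records both enter and exit, which is the cleanup gap the commit message says it closes.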
mht-sharma pushed a commit to mht-sharma/vllm that referenced this issue on Dec 9, 2024
…ject#276) Co-authored-by: Aleksandr Malyshev <[email protected]> (cherry picked from commit 9a46e97)
billishyahao pushed a commit to billishyahao/vllm that referenced this issue on Dec 31, 2024
Co-authored-by: Aleksandr Malyshev <[email protected]>
billishyahao pushed a commit to billishyahao/vllm that referenced this issue on Dec 31, 2024
* corrected types for strides in triton FA (vllm-project#274) (vllm-project#276)
  Co-authored-by: Aleksandr Malyshev <[email protected]>
  (cherry picked from commit 9a46e97)
* fused_moe configs for MI325X
  New fused_moe configs for Mixtral-8x7B and Mixtral-8x22B with TP=1,2,4,8 for both FP8 and FP16 on the recently announced MI325X.
---------
Co-authored-by: Aleksandr Malyshev <[email protected]>
vLLM is wonderful work: it can optimize many models, including GPTBigCode models such as WizardCoder.
GPTQ 4-bit quantization can also be applied to WizardCoder.
https://github.com/vllm-project/vllm
Can vLLM support the GPTQ 4-bit format for GPTBigCode models? Or does anyone have a plan to add it? Please leave a comment.