
GPT-Q supported? #274

Closed
Xingxiangrui opened this issue Jun 27, 2023 · 1 comment

Comments

@Xingxiangrui

vLLM is wonderful work; it can optimize many models, including GPTBigCode models such as WizardCoder.

GPTQ 4-bit quantization can also be used with WizardCoder.
https://github.com/vllm-project/vllm

Can vLLM support the GPTQ 4-bit format of GPTBigCode, or does anyone have a plan for it? Please leave a comment.
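
For reference, here is a minimal sketch of what loading a 4-bit GPTQ checkpoint could look like with vLLM. This is an illustration only: the `quantization` argument and the checkpoint name below are assumptions, and GPTQ support was not available in vLLM when this issue was filed.

```python
# Hypothetical sketch: serving a GPTQ-quantized WizardCoder checkpoint with vLLM.
# The `quantization` flag and the model name are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/WizardCoder-15B-1.0-GPTQ",  # example 4-bit GPTQ checkpoint
    quantization="gptq",                        # assumed quantization option
)

params = SamplingParams(temperature=0.2, max_tokens=128)
outputs = llm.generate(["def fibonacci(n):"], params)
print(outputs[0].outputs[0].text)
```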

@Xingxiangrui
Author

#174

I saw this covered in that issue, so I am closing my issue.

yukavio pushed a commit to yukavio/vllm that referenced this issue Jul 3, 2024
With a context manager class, the `__exit__` method is not called when
an exception is raised inside the context manager's `__enter__` method.
This PR addresses that by manually calling `__exit__` if an exception
is raised during `__enter__`.
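
A minimal sketch of the behavior the commit describes (the resource and cleanup logic here are illustrative, not the actual vLLM code): if `__enter__` raises, Python never calls `__exit__`, so cleanup has to be triggered explicitly.

```python
import sys


class ManagedResource:
    """Illustrative context manager; the resource and cleanup are hypothetical."""

    def __enter__(self):
        self._acquire()
        try:
            self._initialize()  # may raise
        except Exception:
            # Python does NOT call __exit__ automatically when __enter__ raises,
            # so invoke it manually before re-raising to avoid leaking the resource.
            self.__exit__(*sys.exc_info())
            raise
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._release()
        return False  # do not suppress exceptions

    def _acquire(self):
        print("acquired")

    def _initialize(self):
        print("initialized")

    def _release(self):
        print("released")
```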
mht-sharma pushed a commit to mht-sharma/vllm that referenced this issue Dec 9, 2024
billishyahao pushed a commit to billishyahao/vllm that referenced this issue Dec 31, 2024
* corrected types for strides in triton FA (vllm-project#274) (vllm-project#276)

Co-authored-by: Aleksandr Malyshev <[email protected]>
(cherry picked from commit 9a46e97)

* fused_moe configs for MI325X

New fused_moe configs for Mixtral-8x7B and Mixtral-8x22B with
TP=1,2,4,8 for both FP8 and FP16 on the recently announced MI325X.

---------

Co-authored-by: Aleksandr Malyshev <[email protected]>
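
For context on the fused_moe entry above: these configs are per-device kernel tuning tables. A rough sketch of the shape such a file takes, assuming the batch-size-keyed JSON layout used by vLLM's fused_moe configs; the file name and all numbers below are illustrative placeholders, not the tuned MI325X values from this commit.

```python
import json

# Illustrative fused_moe tuning table: keys are batch sizes, values are Triton
# kernel launch parameters. The numbers are placeholders, not real tuned values.
example_config = {
    "1": {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 32, "BLOCK_SIZE_K": 64,
          "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 2},
    "64": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
           "GROUP_SIZE_M": 8, "num_warps": 8, "num_stages": 2},
}

# Hypothetical file name following the configs' naming pattern.
with open("E=8,N=14336,device_name=AMD_Instinct_MI325X.json", "w") as f:
    json.dump(example_config, f, indent=2)
```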