Issues search results · repo:Vahe1994/AQLM language:Python
107 results
Why does peak memory decrease? After quantization, the values are still restored to fp16 for the matrix computations during the forward pass, so why does peak memory decrease?
maoshanwen
- 2
- Opened 8 days ago
- #170
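A hedged intuition for the question in #170 (a sketch of dequantize-on-the-fly, not the repo's actual kernel): the weights stay compressed in GPU memory, and fp16 values are only materialized tile-by-tile inside the matmul, so the full fp16 weight matrix never exists at once. All shapes and names below are assumptions for illustration, not AQLM's real layout.

import torch

def tiled_dequant_matvec(codes, codebook, x, tile_rows=128):
    # Hypothetical layout: codes holds one integer codebook index per group
    # of 8 input weights; codebook maps an index to 8 fp16 values.
    #   codes:    [out_features, in_features // 8] integer indices
    #   codebook: [codebook_size, 8] fp16 entries
    #   x:        [in_features] fp16 activations
    out = torch.empty(codes.shape[0], dtype=x.dtype)
    x_groups = x.view(-1, 8)
    for start in range(0, codes.shape[0], tile_rows):
        tile = codebook[codes[start:start + tile_rows].long()]  # small fp16 scratch
        out[start:start + tile_rows] = torch.einsum("ogk,gk->o", tile, x_groups)
        # `tile` is freed before the next iteration, so peak memory holds the
        # compressed codes plus one tile, never the full fp16 weight matrix.
    return out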
Can you please provide the configurations for quantizing Gemma-2B? I only get a perplexity (PPL) of about 20 with the default configurations.
stale
shuangyichen
- 1
- Opened on Feb 24
- #168
https://github.com/werruww/run-Qwen2-72B-Instruct-on-16gb-vram/blob/main/succ_Qwen2_72B_AQLM.ipynb
stale
werruww
- 4
- Opened on Feb 16
- #166
https://github.com/werruww/run-Qwen2-72B-Instruct-on-16gb-vram/blob/main/suc_Qwen2_72B_Instruct_AQLM_PV_1bit_1x16%20(2).ipynb
It worked, but the results were disastrous and the answers were very bad and ...
stale
werruww
- 3
- Opened on Feb 16
- #165
Hello,
I am not sure whether you have a CPU function equivalent to the GPU kernel Code1x16MatVec. That would help in understanding the kernel. Thanks.
stale
jinz2014
- 2
- Opened on Jan 16
- #164
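A hedged sketch of what a CPU reference for a 1x16 codebook matvec could look like, in plain PyTorch; the shapes and the per-channel scale are assumptions, not the actual Code1x16MatVec contract:

import torch

def code1x16_matvec_cpu(codes, codebook, scales, x):
    # Assumed layout (an illustration only):
    #   codes:    [out_features, in_features // 8] integer indices, one per group of 8
    #   codebook: [2**16, 8] float entries; num_codebooks = 1, 16-bit codes
    #   scales:   [out_features] per-output-channel scales
    #   x:        [in_features] input vector
    weights = codebook[codes.long()].reshape(codes.shape[0], -1)  # decode to dense fp
    return (weights @ x) * scales  # plain matvec on the decoded weights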
I am referring to the checkpoint https://huggingface.co/ISTA-DASLab/Meta-Llama-3-8B-AQLM-PV-2Bit-1x16, which achieves 6.99 perplexity and is referred to as 2-bit quantization in the PV-tuning paper.
Llama3-8B has ...
stale
usamec
- 2
- Opened on Dec 29, 2024
- #163
@Vahe1994 @galqiwi @BlackSamorez @justheuristic
Hello, thank you for the awesome work and for actively engaging with the issues.
I have two major questions, which are as follows:
1. According to ...
stale
jusjinuk
- 4
- Opened on Dec 27, 2024
- #162
Hello! When I run the benchmark file matmul_benchmark.py, it produces an error at line 105:
matmul = CUDA_KERNEL.code1x16_matmat if args.nbits_per_codebook == 16 else CUDA_KERNEL.code2x8_matmat ...
KellyGong
- 4
- Opened on Dec 25, 2024
- #161
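For context, a hedged reading of the dispatch line quoted in #161 (the kernel attribute names come from the excerpt; the explicit guard is an assumption, not the benchmark's actual code): the kernel is picked from nbits_per_codebook, so any value other than 16 falls through to the 2x8 kernel and can fail if that kernel does not match the configuration.

def pick_kernel(cuda_kernel, nbits_per_codebook):
    # Dispatch as in the quoted line, with an explicit guard added for clarity.
    if nbits_per_codebook == 16:
        return cuda_kernel.code1x16_matmat  # 1 codebook, 16-bit codes
    if nbits_per_codebook == 8:
        return cuda_kernel.code2x8_matmat   # 2 codebooks, 8-bit codes
    raise ValueError(f"no kernel for nbits_per_codebook={nbits_per_codebook}")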
from transformers import pipeline
import os
# Set environment variable for PyTorch memory management
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
messages = [{"role": "user", "content": ...
stale
kim90000
- 5
- Opened on Dec 21, 2024
- #160
from transformers import pipeline
messages = [{"role": "user", "content": "Who are you?"}]
pipe = pipeline("text-generation", model="ISTA-DASLab/Qwen2-72B-AQLM-PV-1bit-1x16", trust_remote_code=True, device_map= ...
stale
werruww
- 5
- Opened on Dec 19, 2024
- #159
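The two snippets in #160 and #159 are truncated; a minimal runnable sketch along the same lines (the model id comes from the excerpts above; everything else, including the generation arguments and the need for the aqlm extras to be installed, is an assumption):

from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="ISTA-DASLab/Qwen2-72B-AQLM-PV-1bit-1x16",
    trust_remote_code=True,
    device_map="auto",  # spread the quantized layers across available devices
)
messages = [{"role": "user", "content": "Who are you?"}]
print(pipe(messages, max_new_tokens=64)[0]["generated_text"])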
