Issue search results · repo:Vahe1994/AQLM language:Python

107 results

Why does peak memory decrease? After quantization, during the forward computation, the values are still restored (to fp16) for matrix calculations. So why does peak memory decrease?
  • maoshanwen
  • 2
  • Opened 8 days ago
  • #170
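
The usual explanation is that the AQLM weights stay stored as compressed codes, and only a small slice (at most one layer's weight matrix, and far less inside the fused kernel) is dequantized to fp16 at any moment, so the full fp16 model never exists in memory at once. A rough back-of-the-envelope sketch, with purely illustrative sizes (the 7B parameter count, 2-bit rate, and layer shape below are assumptions, not measurements from this issue):

params = 7e9                       # assumed model size: ~7B parameters
fp16_model_gb = params * 2 / 1e9   # full fp16 weights would need ~14 GB

bits_per_weight = 2                # assumed AQLM rate (e.g. 16-bit codes over groups of 8)
aqlm_model_gb = params * bits_per_weight / 8 / 1e9   # stored codes: ~1.75 GB

# Only a transient fp16 buffer for the matrix currently being multiplied exists,
# not the whole model, so peak memory is roughly codes + one matrix in fp16.
largest_matrix = 4096 * 14336                         # assumed largest weight matrix
transient_fp16_gb = largest_matrix * 2 / 1e9          # ~0.12 GB

print(f"fp16 model: {fp16_model_gb:.1f} GB, "
      f"AQLM peak: ~{aqlm_model_gb + transient_fp16_gb:.2f} GB")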

Can you please provide the configurations for quantizing Gemma-2B? I only got a PPL of about 20 when using the default configurations.
stale
  • shuangyichen
  • 1
  • Opened on Feb 24
  • #168

https://github.com/werruww/run-Qwen2-72B-Instruct-on-16gb-vram/blob/main/succ_Qwen2_72B_AQLM.ipynb
stale
  • werruww
  • 4
  • Opened on Feb 16
  • #166

https://github.com/werruww/run-Qwen2-72B-Instruct-on-16gb-vram/blob/main/suc_Qwen2_72B_Instruct_AQLM_PV_1bit_1x16%20(2).ipynb It worked but the results were disastrous and the answers were very bad and ...
stale
  • werruww
  • 3
  • Opened on Feb 16
  • #165

Hello, I am not sure if you have a CPU function that is equivalent to the GPU kernel Code1x16MatVec. This would help understand the kernel. Thanks.
stale
  • jinz2014
  • 2
  • Opened on Jan 16
  • #164
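
A pure-NumPy reference that mirrors what a 1x16 mat-vec has to do (look each 16-bit code up in a codebook of weight groups, lay the groups out as a dense row, then take the dot product) may help when reading the CUDA kernel. The layout below (group size 8, per-output-channel scales) is an assumption about a typical 1x16 configuration, not a drop-in replica of Code1x16MatVec:

import numpy as np

def code1x16_matvec_reference(codes, codebook, scales, x):
    # codes:    (out_features, in_features // group) uint16 indices into the codebook
    # codebook: (2**16, group) weight groups
    # scales:   (out_features,) per-output-channel scales (assumed placement)
    # x:        (in_features,) input vector
    out_features, n_groups = codes.shape
    group = codebook.shape[1]
    weight = codebook[codes].reshape(out_features, n_groups * group)  # dequantize
    return (weight @ x) * scales

rng = np.random.default_rng(0)
codebook = rng.standard_normal((2**16, 8)).astype(np.float32)
codes = rng.integers(0, 2**16, size=(16, 4), dtype=np.uint16)
scales = rng.standard_normal(16).astype(np.float32)
x = rng.standard_normal(4 * 8).astype(np.float32)
print(code1x16_matvec_reference(codes, codebook, scales, x).shape)   # (16,)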

I am referring to checkpoint: https://huggingface.co/ISTA-DASLab/Meta-Llama-3-8B-AQLM-PV-2Bit-1x16, which gets 6.99 perplexity, referred to as 2-bit quantization in the PV tuning paper. Llama3-8B has ...
stale
  • usamec
  • 2
  • Opened on Dec 29, 2024
  • #163
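
As a side note, the "2-bit" in the checkpoint name is usually the nominal code rate: 16 code bits shared over a group of 8 weights is exactly 2 bits per weight, while the per-layer codebook and scales add overhead on top, which is why the effective bits-per-parameter of the whole model comes out higher. A small worked example (the group size, layer shape, and fp16 storage of codebook and scales are assumptions):

nbits_per_codebook = 16
in_group_size = 8                                   # assumed group size
code_bits = nbits_per_codebook / in_group_size      # 2.0 bits/weight from the codes

out_features, in_features = 4096, 4096              # assumed layer shape
codebook_bits = (2**16) * in_group_size * 16        # one fp16 codebook for the layer
scale_bits = out_features * 16                      # per-channel fp16 scales
overhead = (codebook_bits + scale_bits) / (out_features * in_features)

print(f"codes: {code_bits:.2f} b/w, overhead: {overhead:.3f} b/w, "
      f"total: {code_bits + overhead:.3f} b/w")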

@Vahe1994 @galqiwi @BlackSamorez @justheuristic Hello, thank you for the awesome work and for actively engaging in answering the issues. I have two major questions, which are as follows: 1. According to ...
stale
  • jusjinuk
  • 4
  • Opened on Dec 27, 2024
  • #162

Hello! When I run the benchmark file matmul_benchmark.py, it produces an error at line 105: matmul = CUDA_KERNEL.code1x16_matmat if args.nbits_per_codebook == 16 else CUDA_KERNEL.code2x8_matmat ...
  • KellyGong
  • 4
  • Opened on Dec 25, 2024
  • #161
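
For context, the quoted line only chooses which fused kernel the benchmark calls based on --nbits_per_codebook. A guarded variant of that dispatch (a sketch, not the code actually in matmul_benchmark.py) makes the supported layouts explicit:

if args.nbits_per_codebook == 16:
    matmul = CUDA_KERNEL.code1x16_matmat          # 1 codebook with 16-bit codes
elif args.nbits_per_codebook == 8:
    matmul = CUDA_KERNEL.code2x8_matmat           # 2 codebooks with 8-bit codes
else:
    raise ValueError(
        f"No fused kernel for nbits_per_codebook={args.nbits_per_codebook}; "
        "only the 1x16 and 2x8 layouts are compiled."
    )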

from transformers import pipeline import os # Set environment variable for PyTorch memory management os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True" messages = [ { "role": "user", "content ...
stale
  • kim90000
  • 5
  • Opened on Dec 21, 2024
  • #160

from transformers import pipeline messages = [ { "role": "user", "content": "Who are you?" }, ] pipe = pipeline( "text-generation", model="ISTA-DASLab/Qwen2-72B-AQLM-PV-1bit-1x16", trust_remote_code=True, device_map= ...
stale
  • werruww
  • 5
  • Opened on Dec 19, 2024
  • #159
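
Issues #159 and #160 contain essentially the same snippet with the quoting stripped by the search rendering; reassembled into runnable form (the device_map value and generation length are assumptions filling the truncated parts) it looks roughly like:

import os

# Ask PyTorch's CUDA allocator to use expandable segments to reduce fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]

pipe = pipeline(
    "text-generation",
    model="ISTA-DASLab/Qwen2-72B-AQLM-PV-1bit-1x16",
    trust_remote_code=True,
    device_map="auto",                 # assumed; the excerpt is cut off here
)
print(pipe(messages, max_new_tokens=64))   # assumed generation length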