I noticed this assert in the quantize_row_q4_1_impl function:
static_assert(QK4_1 == 32, "QK4_1 must be 32");
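For context, the Q4_1 block in ggml-common.h is laid out roughly like this (simplified; the real header wraps d and m in a union):

#define QK4_1 32
typedef struct {
    ggml_half d;             // scale (delta)
    ggml_half m;             // block minimum
    uint8_t   qs[QK4_1 / 2]; // 4-bit quants, two per byte
} block_q4_1;                // 2 + 2 + 16 = 20 bytes per 32 weights

Assuming ggml_half stays 2 bytes, this also roughly explains the size drop: 20 bytes per 32 weights (5.0 bits/weight) becomes 36 bytes per 64 weights (4.5 bits/weight), which is about the 1.9 GB to 1.7 GB ratio.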
Out of curiosity, I deleted this assert, changed QK4_1 to 64 in ggml-common.h, and quantized a 3B model as a test. Its size dropped from 1.9 GB to 1.7 GB.
But when I ran the model, it hit the assert below in ggml_compute_forward_soft_max_f32. So I'm curious why the softmax ends up computing -nan after I changed the value of QK4_1. My intuition was that the dequantized output might lose some accuracy, but the computation itself should still go through.
assert(!isnan(wp[i]));
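For what it's worth, I understand that once a single garbage value reaches the attention scores, softmax can't recover. A standalone sketch (not llama.cpp code) of how one inf logit, e.g. from a bad dequantized weight, poisons the whole row:

#include <math.h>
#include <stdio.h>

int main(void) {
    // one corrupted logit, e.g. produced by reading garbage quant data
    float w[4] = {1.0f, 2.0f, INFINITY, 0.5f};
    float maxv = w[0], sum = 0.0f;
    for (int i = 1; i < 4; i++) if (w[i] > maxv) maxv = w[i];
    for (int i = 0; i < 4; i++) { w[i] = expf(w[i] - maxv); sum += w[i]; } // inf - inf -> nan
    for (int i = 0; i < 4; i++) printf("%f ", w[i] / sum);                // nan nan nan nan
    printf("\n");
    return 0;
}

So the nan in wp[i] is presumably only detected at the softmax; it must be produced somewhere upstream.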
The backtrace is as follows:
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff704526e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff70288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffff702881b in __assert_fail_base (fmt=0x7ffff76bf93b "%s%s%s:%u: %s%sAssertion \"%s\" failed.\n%n", assertion=assertion@entry=0x7ffff799c31e "!isnan(wp[i])", file=file@entry=0x7ffff799b110 "/home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c", line=line@entry=9118, function=function@entry=0x7ffff799d6c0 <__PRETTY_FUNCTION__.14> "ggml_compute_forward_soft_max_f32") at ./assert/assert.c:94
#6 0x00007ffff703b507 in __assert_fail (assertion=0x7ffff799c31e "!isnan(wp[i])", file=0x7ffff799b110 "/home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c", line=9118, function=0x7ffff799d6c0 <__PRETTY_FUNCTION__.14> "ggml_compute_forward_soft_max_f32") at ./assert/assert.c:103
#7 0x00007ffff7927f1b in ggml_compute_forward_soft_max_f32 (params=0x7fffffffb080, dst=0x555555f53870) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c:9118
#8 0x00007ffff7928142 in ggml_compute_forward_soft_max (params=0x7fffffffb080, dst=0x555555f53870) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c:9149
#9 0x00007ffff7938b69 in ggml_compute_forward (params=0x7fffffffb080, tensor=0x555555f53870) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c:13082
#10 0x00007ffff793a729 in ggml_graph_compute_thread (data=0x5555563bf880) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c:14046
#11 0x00007ffff793accc in ggml_graph_compute (cgraph=0x555555949fd8, cplan=0x7fffffffb360) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c:14325
#12 0x00007ffff793bcd3 in ggml_backend_cpu_graph_compute (backend=0x555555947d20, cgraph=0x555555949fd8) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.cpp:168
#13 0x00007ffff7821019 in ggml_backend_graph_compute_async (backend=0x555555947d20, cgraph=0x555555949fd8) at /home/zzj/gitRepos/llama/ggml/src/ggml-backend.cpp:332
#14 0x00007ffff7825151 in ggml_backend_sched_compute_splits (sched=0x55555594b0f0) at /home/zzj/gitRepos/llama/ggml/src/ggml-backend.cpp:1397
#15 0x00007ffff7825dd4 in ggml_backend_sched_graph_compute_async (sched=0x55555594b0f0, graph=0x555555f10ca0) at /home/zzj/gitRepos/llama/ggml/src/ggml-backend.cpp:1588
#16 0x00007ffff7c69704 in llama_graph_compute (lctx=..., gf=0x555555f10ca0, n_threads=1, threadpool=0x555555866300) at /home/zzj/gitRepos/llama/src/llama.cpp:8436
#17 0x00007ffff7c6a4a8 in llama_decode_impl (lctx=..., inp_batch=...) at /home/zzj/gitRepos/llama/src/llama.cpp:8675
#18 0x00007ffff7c6f707 in llama_decode (ctx=0x555555950460, batch=...) at /home/zzj/gitRepos/llama/src/llama.cpp:9993
#19 0x00005555555bda3e in main (argc=7, argv=0x7fffffffd818) at /home/zzj/gitRepos/llama/examples/main/main.cpp:642
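My current suspicion, a sketch of the arithmetic rather than the actual kernel code: the quantized dot-product kernels pair each Q4_1 block with a Q8_1 block of activations, and QK8_1 is still 32 in ggml-common.h, so the block counts no longer line up and the kernel would read past the real Q4_1 data. Assuming a hypothetical row length n and that a kernel like ggml_vec_dot_q4_1_q8_1 derives its loop count from QK8_1:

#include <stdio.h>

#define QK4_1 64  // my modified value
#define QK8_1 32  // unchanged in ggml-common.h

int main(void) {
    int n  = 4096;          // hypothetical row length
    int nb = n / QK8_1;     // iterations if the kernel counts Q8_1 blocks
    int nx = n / QK4_1;     // q4_1 blocks actually stored for that row
    printf("%d iterations over %d stored q4_1 blocks -> out-of-bounds reads\n", nb, nx);
    return 0;
}

Is that the right explanation, or is something else going on?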