I noticed this assert in the quantize_row_q4_1_impl function:
static_assert(QK4_1 == 32, "QK4_1 must be 32");
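For context, the Q4_1 block in ggml-common.h is laid out roughly like this (simplified; the real header wraps d and m in a union):

#define QK4_1 32
typedef struct {
    ggml_half d;             // scale (delta)
    ggml_half m;             // block minimum
    uint8_t   qs[QK4_1 / 2]; // 4-bit quants, two per byte
} block_q4_1;                // 2 + 2 + 16 = 20 bytes per 32 weights

Assuming ggml_half stays 2 bytes, this also roughly explains the size drop: 20 bytes per 32 weights (5.0 bits/weight) becomes 36 bytes per 64 weights (4.5 bits/weight), which is about the 1.9 GB to 1.7 GB ratio.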
Out of curiosity, I deleted this assert, changed QK4_1 to 64 in ggml-common.h, and quantized a 3B model as a test. Its size dropped from 1.9 GB to 1.7 GB.
But when I ran the model, it hit the assert below in ggml_compute_forward_soft_max_f32. So I'm curious why the softmax ends up computing -nan after I changed the value of QK4_1. My intuition was that the dequantized output might lose some accuracy, but the computation itself should still go through.
assert(!isnan(wp[i]));
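For what it's worth, I understand that once a single garbage value reaches the attention scores, softmax can't recover. A standalone sketch (not llama.cpp code) of how one inf logit, e.g. from a bad dequantized weight, poisons the whole row:

#include <math.h>
#include <stdio.h>

int main(void) {
    // one corrupted logit, e.g. produced by reading garbage quant data
    float w[4] = {1.0f, 2.0f, INFINITY, 0.5f};
    float maxv = w[0], sum = 0.0f;
    for (int i = 1; i < 4; i++) if (w[i] > maxv) maxv = w[i];
    for (int i = 0; i < 4; i++) { w[i] = expf(w[i] - maxv); sum += w[i]; } // inf - inf -> nan
    for (int i = 0; i < 4; i++) printf("%f ", w[i] / sum);                // nan nan nan nan
    printf("\n");
    return 0;
}

So the nan in wp[i] is presumably only detected at the softmax; it must be produced somewhere upstream.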
The backtrace is as follows:
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff704526e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff70288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffff702881b in __assert_fail_base (fmt=0x7ffff76bf93b "%s%s%s:%u: %s%sAssertion \"%s\" failed.\n%n", assertion=assertion@entry=0x7ffff799c31e "!isnan(wp[i])", file=file@entry=0x7ffff799b110 "/home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c", line=line@entry=9118, function=function@entry=0x7ffff799d6c0 <__PRETTY_FUNCTION__.14> "ggml_compute_forward_soft_max_f32") at ./assert/assert.c:94
#6 0x00007ffff703b507 in __assert_fail (assertion=0x7ffff799c31e "!isnan(wp[i])", file=0x7ffff799b110 "/home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c", line=9118, function=0x7ffff799d6c0 <__PRETTY_FUNCTION__.14> "ggml_compute_forward_soft_max_f32") at ./assert/assert.c:103
#7 0x00007ffff7927f1b in ggml_compute_forward_soft_max_f32 (params=0x7fffffffb080, dst=0x555555f53870) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c:9118
#8 0x00007ffff7928142 in ggml_compute_forward_soft_max (params=0x7fffffffb080, dst=0x555555f53870) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c:9149
#9 0x00007ffff7938b69 in ggml_compute_forward (params=0x7fffffffb080, tensor=0x555555f53870) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c:13082
#10 0x00007ffff793a729 in ggml_graph_compute_thread (data=0x5555563bf880) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c:14046
#11 0x00007ffff793accc in ggml_graph_compute (cgraph=0x555555949fd8, cplan=0x7fffffffb360) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.c:14325
#12 0x00007ffff793bcd3 in ggml_backend_cpu_graph_compute (backend=0x555555947d20, cgraph=0x555555949fd8) at /home/zzj/gitRepos/llama/ggml/src/ggml-cpu/ggml-cpu.cpp:168
#13 0x00007ffff7821019 in ggml_backend_graph_compute_async (backend=0x555555947d20, cgraph=0x555555949fd8) at /home/zzj/gitRepos/llama/ggml/src/ggml-backend.cpp:332
#14 0x00007ffff7825151 in ggml_backend_sched_compute_splits (sched=0x55555594b0f0) at /home/zzj/gitRepos/llama/ggml/src/ggml-backend.cpp:1397
#15 0x00007ffff7825dd4 in ggml_backend_sched_graph_compute_async (sched=0x55555594b0f0, graph=0x555555f10ca0) at /home/zzj/gitRepos/llama/ggml/src/ggml-backend.cpp:1588
#16 0x00007ffff7c69704 in llama_graph_compute (lctx=..., gf=0x555555f10ca0, n_threads=1, threadpool=0x555555866300) at /home/zzj/gitRepos/llama/src/llama.cpp:8436
#17 0x00007ffff7c6a4a8 in llama_decode_impl (lctx=..., inp_batch=...) at /home/zzj/gitRepos/llama/src/llama.cpp:8675
#18 0x00007ffff7c6f707 in llama_decode (ctx=0x555555950460, batch=...) at /home/zzj/gitRepos/llama/src/llama.cpp:9993
#19 0x00005555555bda3e in main (argc=7, argv=0x7fffffffd818) at /home/zzj/gitRepos/llama/examples/main/main.cpp:642
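My current suspicion, a sketch of the arithmetic rather than the actual kernel code: the quantized dot-product kernels pair each Q4_1 block with a Q8_1 block of activations, and QK8_1 is still 32 in ggml-common.h, so the block counts no longer line up and the kernel would read past the real Q4_1 data. Assuming a hypothetical row length n and that a kernel like ggml_vec_dot_q4_1_q8_1 derives its loop count from QK8_1:

#include <stdio.h>

#define QK4_1 64  // my modified value
#define QK8_1 32  // unchanged in ggml-common.h

int main(void) {
    int n  = 4096;          // hypothetical row length
    int nb = n / QK8_1;     // iterations if the kernel counts Q8_1 blocks
    int nx = n / QK4_1;     // q4_1 blocks actually stored for that row
    printf("%d iterations over %d stored q4_1 blocks -> out-of-bounds reads\n", nb, nx);
    return 0;
}

Is that the right explanation, or is something else going on?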