#1684 changed something about quantization with q4_0, such that the results are not usable with the new Metal code.
Here are the results prior to that PR:
git checkout 5220a991a5e92bddad9542267ab445a2c033681c # commit before that PR
make clean && LLAMA_METAL=1 make -j
rm -f models/7B/ggml-model-q4_0.bin
./quantize ./models/7B/ggml-model-f16.bin q4_0
sha256sum models/7B/ggml-model-q4_0.bin
# ec2f2d1f0dfb73b72a4cbac7fa121abbe04c37ab327125a38248f930c0f09ddf
./main -m models/7B/ggml-model-q4_0.bin -p "I believe the meaning of life is" --ignore-eos -n 64 -ngl 1
# works
And with that PR:
git checkout 99009e72f8072fa552eb02efee436be596c71cdd # that PR
make clean && LLAMA_METAL=1 make -j
rm -f models/7B/ggml-model-q4_0.bin
./quantize ./models/7B/ggml-model-f16.bin q4_0
sha256sum models/7B/ggml-model-q4_0.bin
# 33080357951febf9fc7a48fdc130cfbf17912cac7fe327acae42291e77dcc9d1
./main -m models/7B/ggml-model-q4_0.bin -p "I believe the meaning of life is" --ignore-eos -n 64 -ngl 1
# fails with GGML_ASSERT: ggml-metal.m:502: false && "not implemented"
Note that despite quantizing with q4_0 in both cases, the SHA-256 checksum of the result changes after that PR, and the resulting model file can't be used with Metal.
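To make the checksum comparison repeatable, the two quantized outputs could be kept side by side and compared with a small helper. This is only a sketch: `same_sha256` and the `.before.bin` / `.after.bin` file names are hypothetical and not part of the repository, and it assumes `sha256sum` is on the PATH, as in the commands above.

```shell
# Hypothetical helper: compare two files by SHA-256 and report the result.
same_sha256() {
  a=$(sha256sum "$1" | cut -d' ' -f1)   # keep only the hex digest
  b=$(sha256sum "$2" | cut -d' ' -f1)
  if [ "$a" = "$b" ]; then
    echo "identical"
  else
    echo "different"
  fi
}

# Hypothetical usage, assuming each build's output was copied to its own name:
# same_sha256 models/7B/ggml-model-q4_0.before.bin models/7B/ggml-model-q4_0.after.bin
```

Renaming each output before rebuilding avoids the `rm -f` step overwriting the evidence from the earlier commit.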
If it helps, I added a print statement on the line that asserts (NSLog(@"%i", src0t)), and it reports that src0t is 14, which I believe corresponds to GGML_TYPE_Q6_K.
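For reference, the numeric type id can be mapped back to a name with a small lookup. The table below is a partial sketch based on my reading of `enum ggml_type` in `ggml.h` around the time the k-quant types were added; the values are an assumption and should be checked against the header in the commit being tested. `ggml_type_name` is a hypothetical helper, not something the project provides.

```shell
# Hypothetical lookup from a ggml type id to its enum name.
# Values are assumed from ggml.h at the time; verify against your checkout.
ggml_type_name() {
  case "$1" in
    0)  echo GGML_TYPE_F32 ;;
    1)  echo GGML_TYPE_F16 ;;
    2)  echo GGML_TYPE_Q4_0 ;;
    3)  echo GGML_TYPE_Q4_1 ;;
    12) echo GGML_TYPE_Q4_K ;;
    13) echo GGML_TYPE_Q5_K ;;
    14) echo GGML_TYPE_Q6_K ;;
    *)  echo "unknown ($1)" ;;   # ids not listed in this partial table
  esac
}

ggml_type_name 14   # prints GGML_TYPE_Q6_K
```

If the mapping holds, the failing tensor is not stored as q4_0 (id 2) at all, which would be consistent with the checksum of the quantized file changing.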
cc @ikawrakow