k-quants PR changed quantization of q4_0, making it incompatible with new Metal code #1711

bakkot · 2023-06-06T06:28:15Z

#1684 changed something about quantization with q4_0, such that the results are not usable with the new Metal code.

Here are the results prior to that PR:

git checkout 5220a991a5e92bddad9542267ab445a2c033681c # commit before that PR

make clean && LLAMA_METAL=1 make -j

rm -f models/7B/ggml-model-q4_0.bin

./quantize ./models/7B/ggml-model-f16.bin q4_0

sha256sum models/7B/ggml-model-q4_0.bin
# ec2f2d1f0dfb73b72a4cbac7fa121abbe04c37ab327125a38248f930c0f09ddf

./main -m models/7B/ggml-model-q4_0.bin -p "I believe the meaning of life is" --ignore-eos -n 64 -ngl 1
# works

and with that PR:

git checkout 99009e72f8072fa552eb02efee436be596c71cdd # that PR

make clean && LLAMA_METAL=1 make -j

rm -f models/7B/ggml-model-q4_0.bin

./quantize ./models/7B/ggml-model-f16.bin q4_0

sha256sum models/7B/ggml-model-q4_0.bin
# 33080357951febf9fc7a48fdc130cfbf17912cac7fe327acae42291e77dcc9d1

./main -m models/7B/ggml-model-q4_0.bin -p "I believe the meaning of life is" --ignore-eos -n 64 -ngl 1
# fails with GGML_ASSERT: ggml-metal.m:502: false && "not implemented"

Note that despite quantizing with q4_0 in both cases, the SHA-256 hash of the result changes after that PR, and the resulting model file can't be used with Metal.
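The hash comparison can also be scripted rather than checked by eye. The following is only a sketch: `same_sha256` is a hypothetical helper name, and it assumes the quantized file from each commit was kept under its own name instead of being removed between runs.

```shell
# Hypothetical helper: succeeds (exit status 0) only if both files
# have the same SHA-256 digest, i.e. the quantized output is unchanged.
same_sha256() {
    a=$(sha256sum "$1" | cut -d' ' -f1)   # keep only the hex digest
    b=$(sha256sum "$2" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}
```

For example, with two illustrative file names, `same_sha256 q4_0-before.bin q4_0-after.bin && echo identical || echo different` would print `different` for the two files produced above, since their digests do not match.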

If it helps, I added a print statement on the line that asserts (NSLog(@"%i", src0t)), and it says src0t is 14, which I believe corresponds to GGML_TYPE_Q6_K.

cc @ikawrakow

ggerganov · 2023-06-06T06:41:03Z

Disabled the Q6_K quantization for now until it is supported by Metal and OpenCL

bakkot · 2023-06-06T06:49:12Z

Thanks! With the new commit I get the same shasum as on 5220a99, and the result works with Metal again.

bakkot mentioned this issue Jun 6, 2023

HelloGGML_ASSERT: ggml-metal.m:539: false && "not implemented" #1693

Closed

4 tasks

ggerganov added a commit that referenced this issue Jun 6, 2023

llama : temporary disable Q6_K output quantization (#1711)

7a74dee

bakkot closed this as completed Jun 6, 2023

SlyEcho mentioned this issue Jun 18, 2023

Shape Error When Running Inference after Converting OpenLlama 3B to GGML #1709

Closed
