I have been experimenting with q4_1 quantization (since some preliminary results suggest it should perform better) and noticed that something in the pipeline for the 13B parameter model is broken, whether in the quantization itself, the saving, or the loading. As a result, every inferred token comes out as `#`. Meanwhile, 7B works fine.
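For context, a minimal sketch of what q4_1 does differently from q4_0: each block stores a minimum in addition to the scale, so the on-disk block is larger, which is one place a size assumption in the save/load path could go wrong for the bigger model. The block size `QK`, the struct field names, and the rounding details here are assumptions based on the general scheme, not the exact ggml code:

```c
#include <stdint.h>
#include <float.h>

#define QK 32  // values per quantization block (assumed)

typedef struct {
    float   d;          // scale (delta)
    float   m;          // minimum -- q4_0 has no such field, so q4_1 blocks are larger
    uint8_t qs[QK / 2]; // 4-bit quants, two per byte
} block_q4_1;

// Quantize QK floats into one q4_1 block: x ~ m + d * q, with q in [0, 15].
static void quantize_block_q4_1(const float *x, block_q4_1 *out) {
    float min = FLT_MAX, max = -FLT_MAX;
    for (int i = 0; i < QK; i++) {
        if (x[i] < min) min = x[i];
        if (x[i] > max) max = x[i];
    }
    const float d  = (max - min) / 15.0f;
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    out->d = d;
    out->m = min;
    for (int i = 0; i < QK / 2; i++) {
        uint8_t q0 = (uint8_t)((x[2 * i + 0] - min) * id + 0.5f);
        uint8_t q1 = (uint8_t)((x[2 * i + 1] - min) * id + 0.5f);
        if (q0 > 15) q0 = 15;
        if (q1 > 15) q1 = 15;
        out->qs[i] = q0 | (q1 << 4); // pack two nibbles per byte
    }
}
```

If the loader still computes tensor sizes with the q4_0 block layout, reading a q4_1 file would desync the stream and could plausibly produce the all-`#` output described above.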
I know we had a patch a while ago that first made the 13B+ models work for q4_0; did the fixes it made not cover q4_1?