I have been experimenting with q4_1 quantization (since some preliminary results suggest it should perform better) and noticed that something in the pipeline for the 13B parameter model is broken, whether in the quantization itself, the saving, or the loading. As a result, every inferred token comes out as `#`. Meanwhile, 7B works fine.
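For context, a minimal sketch of what q4_1 does differently from q4_0: each block stores a minimum in addition to the scale, so the on-disk block is larger, which is one place a size assumption in the save/load path could go wrong for the bigger model. The block size `QK`, the struct field names, and the rounding details here are assumptions based on the general scheme, not the exact ggml code:

```c
#include <stdint.h>
#include <float.h>

#define QK 32  // values per quantization block (assumed)

typedef struct {
    float   d;          // scale (delta)
    float   m;          // minimum -- q4_0 has no such field, so q4_1 blocks are larger
    uint8_t qs[QK / 2]; // 4-bit quants, two per byte
} block_q4_1;

// Quantize QK floats into one q4_1 block: x ~ m + d * q, with q in [0, 15].
static void quantize_block_q4_1(const float *x, block_q4_1 *out) {
    float min = FLT_MAX, max = -FLT_MAX;
    for (int i = 0; i < QK; i++) {
        if (x[i] < min) min = x[i];
        if (x[i] > max) max = x[i];
    }
    const float d  = (max - min) / 15.0f;
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    out->d = d;
    out->m = min;
    for (int i = 0; i < QK / 2; i++) {
        uint8_t q0 = (uint8_t)((x[2 * i + 0] - min) * id + 0.5f);
        uint8_t q1 = (uint8_t)((x[2 * i + 1] - min) * id + 0.5f);
        if (q0 > 15) q0 = 15;
        if (q1 > 15) q1 = 15;
        out->qs[i] = q0 | (q1 << 4); // pack two nibbles per byte
    }
}
```

If the loader still computes tensor sizes with the q4_0 block layout, reading a q4_1 file would desync the stream and could plausibly produce the all-`#` output described above.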
I know we had a patch a while ago that first made the 13B+ models work for q4_0; did the fixes it made not cover q4_1?