
Q4_1 inference appears broken for 13B parameters #152

Closed
@blackhole89

Description


I have been experimenting with q4_1 quantization (since some preliminary results suggest it should perform better), and noticed that something in the pipeline for the 13B parameter model is broken (whether it is the quantization itself, the saving, or the loading). This results in all inferred tokens coming out as #. Meanwhile, 7B works fine.

I know we had a patch a while ago that first made the 13B+ models work for q4_0 - did whatever fixes it made not cover q4_1?
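For reference, a q4_1-style block stores both a scale and a per-block minimum (unlike q4_0, which stores only a scale), which is why it is expected to quantize more accurately. Below is a minimal sketch of that scheme, assuming a block size of 32 and plain float fields; this is illustrative only, not the exact ggml memory layout or implementation:

```c
// Illustrative q4_1-style block quantization: per-block scale d and minimum m.
// Block size 32 and float32 fields are assumptions for clarity.
#include <stdint.h>
#include <float.h>
#include <math.h>

#define QK 32  // assumed block size

typedef struct {
    float   d;           // scale
    float   m;           // block minimum
    uint8_t qs[QK / 2];  // 32 x 4-bit quantized values, two per byte
} block_q4_1;

static void quantize_block_q4_1(const float *x, block_q4_1 *out) {
    float min = FLT_MAX, max = -FLT_MAX;
    for (int i = 0; i < QK; i++) {
        if (x[i] < min) min = x[i];
        if (x[i] > max) max = x[i];
    }
    const float d  = (max - min) / 15.0f;            // 4 bits -> 16 levels
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    out->d = d;
    out->m = min;
    for (int i = 0; i < QK; i += 2) {
        const uint8_t q0 = (uint8_t)fminf((x[i + 0] - min) * id + 0.5f, 15.0f);
        const uint8_t q1 = (uint8_t)fminf((x[i + 1] - min) * id + 0.5f, 15.0f);
        out->qs[i / 2] = (q0 & 0x0F) | (q1 << 4);
    }
}

// Dequantization: x[i] ~= d * q[i] + m (q4_0 would just be d * q[i]).
static void dequantize_block_q4_1(const block_q4_1 *in, float *x) {
    for (int i = 0; i < QK; i += 2) {
        x[i + 0] = in->d * (float)(in->qs[i / 2] & 0x0F) + in->m;
        x[i + 1] = in->d * (float)(in->qs[i / 2] >> 4)   + in->m;
    }
}
```

The larger block header (scale plus min instead of scale alone) also changes the per-tensor byte size, which is one place where a 13B-specific loading or offset bug could hide while 7B still happens to work.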

Labels: bug (Something isn't working), model (Model specific)