When are matrices transposed for llama.cpp (since ggml_mul_mat expects the second matrix to be transposed) ? #5098

hmarechal · 2024-01-23T16:06:25Z

Given that comment in the ggml_mul_mat definition:

    // A: k columns, n rows => [ne03, ne02, n, k]
    // B: k columns, m rows  (i.e. we transpose it internally) => [ne03 * x, ne02 * y, m, k]
    // result is n columns, m rows => [ne03 * x, ne02 * y, m, n]
    GGML_API struct ggml_tensor * ggml_mul_mat(
            struct ggml_context * ctx,
            struct ggml_tensor  * a,
            struct ggml_tensor  * b);

and given this discussion (ggml-org/ggml#563), which explicitly transposes the second matrix before the mul_mat, it is clear that ggml treats the second matrix argument to mul_mat as transposed.

I'd like to know where and when the matrices are transposed for llama.cpp. Is this done offline during the pytorch -> gguf conversion, or at run time when loading the weight data?

Thanks!

Answered by slaren

slaren · 2024-01-24T02:01:50Z
Maintainer

The ggml_mul_mat operation is performed as c^T = a @ b^T. So the b matrix is transposed, and the result is also transposed. The weights are usually the a parameter, so they aren't normally transposed. If any such transformation is needed, it would be done during the conversion to gguf.
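To make the c^T = a @ b^T convention concrete, here is a minimal plain-C sketch that mimics the layout described in the header comment. It is not ggml's implementation: the name `mul_mat_sketch` is hypothetical, and it assumes contiguous float rows with no batch dimensions, strides, or quantization. Each output element is the dot product of one row of a with one row of b, which is why b behaves as if it were transposed.

```c
#include <assert.h>

/* Plain-C emulation of the ggml_mul_mat layout (an illustration only).
 *   a:   n rows, each with k contiguous floats
 *   b:   m rows, each with k contiguous floats
 *   dst: m rows, each with n contiguous floats
 * dst[j*n + i] = dot(row i of a, row j of b), so dst^T = a @ b^T. */
static void mul_mat_sketch(const float *a, const float *b, float *dst,
                           int n, int m, int k) {
    assert(n > 0 && m > 0 && k > 0);
    for (int j = 0; j < m; j++) {          /* one output row per row of b */
        for (int i = 0; i < n; i++) {      /* one output column per row of a */
            float sum = 0.0f;
            for (int p = 0; p < k; p++) {  /* both operands are read along k */
                sum += a[i * k + p] * b[j * k + p];
            }
            dst[j * n + i] = sum;
        }
    }
}
```

Both operands are walked along their contiguous k dimension, so neither has to be physically transposed in memory before the call.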

3 replies

hmarechal Jan 24, 2024
Author

Ah, I see, thanks. So the b matrix holds the activations, and it doesn't need to be transposed for the next layer because the result of a @ b^T is transposed too. Is that how it works?

slaren Jan 24, 2024
Maintainer

Yes, exactly.
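The point confirmed here can be sketched in plain C: because the result comes out with one row per activation row, it can be passed straight in as the b argument of the next layer. This is only an illustration under simplifying assumptions (contiguous float rows, no batch dimensions, no quantization); `mul_mat_rows` and `two_layer_forward` are hypothetical names, not ggml functions.

```c
#include <assert.h>

/* dst[j*n + i] = dot(row i of a, row j of b); a is n x k, b is m x k,
 * dst is m x n. Emulates the layout convention only, not ggml itself. */
static void mul_mat_rows(const float *a, const float *b, float *dst,
                         int n, int m, int k) {
    assert(n > 0 && m > 0 && k > 0);
    for (int j = 0; j < m; j++) {
        for (int i = 0; i < n; i++) {
            float sum = 0.0f;
            for (int p = 0; p < k; p++) {
                sum += a[i * k + p] * b[j * k + p];
            }
            dst[j * n + i] = sum;
        }
    }
}

/* Two linear layers chained with no explicit transpose.
 *   w1: n1 rows x k  columns (weights, passed as the a argument)
 *   w2: n2 rows x n1 columns (weights, passed as the a argument)
 *   x:  m  rows x k  columns (activations, passed as the b argument)
 *   h:  m  rows x n1 columns (scratch buffer for the hidden activations)
 *   y:  m  rows x n2 columns (output) */
static void two_layer_forward(const float *w1, const float *w2,
                              const float *x, float *h, float *y,
                              int k, int n1, int n2, int m) {
    mul_mat_rows(w1, x, h, n1, m, k);   /* h has m rows, like x */
    mul_mat_rows(w2, h, y, n2, m, n1);  /* so h is used directly as b */
}
```

The hidden buffer h has the same row-per-token layout as the input x, so the second call needs no reshuffling of data between layers.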

Answer selected by hmarechal

hmarechal Jan 24, 2024
Author

Thanks a lot!
