Given that comment in the ggml_mul_mat definition:
and given this discussion (ggml-org/ggml#563), which explicitly transposes the second matrix before the mul_mat, it is clear that ggml considers the second matrix argument to mul_mat as transposed. I'd like to know where and when the matrices are transposed for llama.cpp. Is this done offline during the PyTorch -> GGUF conversion, or at run time when loading the weight data? Thanks!
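To make the shape convention in question concrete, here is a minimal pure-Python sketch. It is not ggml code: `mul_mat`, `matmul`, and `transpose` are hypothetical helpers, and the sketch assumes the convention that both operands store the shared dimension k along their rows (A is n rows by k columns, B is m rows by k columns) and that the result is B·Aᵀ with m rows and n columns. Under that assumption, a weight stored the way `torch.nn.Linear` stores it (out_features by in_features) can be passed as the first argument without any explicit transpose.

```python
def transpose(m):
    """Return the transpose of a row-major list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Textbook matrix product: x is (r x k), y is (k x c), result is (r x c)."""
    y_cols = transpose(y)
    return [[sum(p * q for p, q in zip(row, col)) for col in y_cols] for row in x]


def mul_mat(a, b):
    """Hypothetical model of the assumed convention (not the ggml API).

    a: n rows x k columns
    b: m rows x k columns
    returns: m rows x n columns, out[i][j] = dot(b[i], a[j])
    """
    k = len(a[0])
    # Both operands must share the same row length k.
    assert all(len(row) == k for row in a + b)
    return [[sum(x * y for x, y in zip(b_row, a_row)) for a_row in a] for b_row in b]


# W laid out as out_features x in_features (2 x 3), as torch.nn.Linear.weight is.
W = [[1, 2, 3],
     [4, 5, 6]]

# X holds 4 input rows with 3 features each.
X = [[1, 0, 1],
     [0, 1, 0],
     [1, 1, 1],
     [2, 0, 0]]

out = mul_mat(W, X)

# Same values as the textbook product X @ W^T, with no transpose applied to W's storage.
assert out == matmul(X, transpose(W))
```

If this convention holds, each output element is a dot product of two contiguous rows, which is why no separate transposition step would be needed for weights that are already stored as out_features by in_features.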
Replies: 1 comment 3 replies
The |
Yes, exactly.