Commit d1259b7

llama : do not quantize expert gating tensors
Parent: 6cfb31f

File tree: 1 file changed (+3, -0 lines)


Diff for: llama.cpp

@@ -8443,6 +8443,9 @@ static void llama_model_quantize_internal(const std::string & fname_inp, const s
         quantize &= params->quantize_output_tensor || name != "output.weight";
         quantize &= !params->only_copy;

+        // do not quantize expert gating tensors
+        quantize &= name.find("ffn_gate_inp.weight") == std::string::npos;
+
         enum ggml_type new_type;
         void * new_data;
         size_t new_size;
