Commit b08f22c

Update README.md (#5366)
Add some links to quantization related PRs
1 parent f57fadc commit b08f22c


README.md (+13, -1)
@@ -736,9 +736,21 @@ Several quantization methods are supported. They differ in the resulting model d
 | 13B | bits/weight | 16.0 | 4.5 | 5.0 | 5.5 | 6.0 | 8.5 |
 
 - [k-quants](https://github.com/ggerganov/llama.cpp/pull/1684)
-- recent k-quants improvements
+- recent k-quants improvements and new i-quants
   - [#2707](https://github.com/ggerganov/llama.cpp/pull/2707)
   - [#2807](https://github.com/ggerganov/llama.cpp/pull/2807)
+  - [#4773 - 2-bit i-quants (inference)](https://github.com/ggerganov/llama.cpp/pull/4773)
+  - [#4856 - 2-bit i-quants (inference)](https://github.com/ggerganov/llama.cpp/pull/4856)
+  - [#4861 - importance matrix](https://github.com/ggerganov/llama.cpp/pull/4861)
+  - [#4872 - MoE models](https://github.com/ggerganov/llama.cpp/pull/4872)
+  - [#4897 - 2-bit quantization](https://github.com/ggerganov/llama.cpp/pull/4897)
+  - [#4930 - imatrix for all k-quants](https://github.com/ggerganov/llama.cpp/pull/4930)
+  - [#4951 - imatrix on the GPU](https://github.com/ggerganov/llama.cpp/pull/4957)
+  - [#4969 - imatrix for legacy quants](https://github.com/ggerganov/llama.cpp/pull/4969)
+  - [#4996 - k-quants tuning](https://github.com/ggerganov/llama.cpp/pull/4996)
+  - [#5060 - Q3_K_XS](https://github.com/ggerganov/llama.cpp/pull/5060)
+  - [#5196 - 3-bit i-quants](https://github.com/ggerganov/llama.cpp/pull/5196)
+  - [quantization tuning](https://github.com/ggerganov/llama.cpp/pull/5320), [another one](https://github.com/ggerganov/llama.cpp/pull/5334), and [another one](https://github.com/ggerganov/llama.cpp/pull/5361)
 
 ### Perplexity (measuring model quality)
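The added links revolve around two things: new low-bit quant types (2- and 3-bit i-quants, Q3_K_XS) and the importance matrix ("imatrix") used to guide them. Disk size follows directly from the table above, since bytes ≈ parameter count × bits/weight / 8; a 13B model therefore drops from about 26 GB at 16 bits/weight to roughly 7 GB at 4.5 bits/weight. As a minimal sketch of how the imatrix workflow fits together, assuming the `imatrix` and `quantize` tools from this repository and placeholder model and calibration file names, one might run:

```bash
# Sketch only: the file names and paths below are placeholders, not part of this commit.
# 1) Build an importance matrix from a calibration text file (see #4861).
./imatrix -m models/llama-13b/ggml-model-f16.gguf -f calibration.txt -o imatrix.dat

# 2) Quantize to one of the low-bit types listed above (e.g. Q2_K), guided by the imatrix.
./quantize --imatrix imatrix.dat \
    models/llama-13b/ggml-model-f16.gguf \
    models/llama-13b/ggml-model-Q2_K.gguf Q2_K
```

The flags shown reflect the tools as they existed around the time of this commit; later releases rename the binaries to `llama-imatrix` and `llama-quantize`, so the exact invocation depends on the checkout.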

0 commit comments