README.md: 4 additions & 2 deletions
```diff
@@ -18,10 +18,12 @@ The main goal of `llama.cpp` is to run the LLaMA model using 4-bit integer quant
 
 - Plain C/C++ implementation without dependencies
 - Apple silicon first-class citizen - optimized via ARM NEON and Accelerate framework
-- AVX2 support for x86 architectures
+- AVX, AVX2 and AVX512 support for x86 architectures
 - Mixed F16 / F32 precision
-- 4-bit integer quantization support
+- 4-bit, 5-bit and 8-bit integer quantization support
 - Runs on the CPU
+- OpenBLAS support
+- cuBLAS and CLBlast support
 
 The original implementation of `llama.cpp` was [hacked in an evening](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022).
 
 Since then, the project has improved significantly thanks to many contributions. This project is for educational purposes and serves
```
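To make the quantization bullets in the diff above concrete, here is a minimal sketch in C of block-wise 4-bit integer quantization: split the weights into fixed-size blocks, store one scale per block, and pack two 4-bit values per byte. The names `block_q4` and `quantize_block_q4`, the block size of 32, and the symmetric mapping to [-8, 7] are illustrative assumptions in the spirit of ggml's Q4_0 format, not the project's exact on-disk layout.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define QK 32  /* values per quantization block (illustrative; assumed, not ggml's definition) */

/* One 4-bit block: a shared scale plus QK packed 4-bit values.
 * This mirrors the idea behind ggml's Q4_0, not its exact layout. */
typedef struct {
    float   scale;          /* per-block scaling factor */
    uint8_t quants[QK / 2]; /* two 4-bit values per byte */
} block_q4;

/* Quantize QK floats into one block: find the max magnitude, derive a
 * scale so values map into [-8, 7], then pack nibbles two per byte. */
static void quantize_block_q4(const float *x, block_q4 *b) {
    float amax = 0.0f;
    for (int i = 0; i < QK; i++) {
        if (fabsf(x[i]) > amax) amax = fabsf(x[i]);
    }
    const float d  = amax / 8.0f;              /* largest value maps to +/-8 */
    const float id = d != 0.0f ? 1.0f / d : 0.0f;
    b->scale = d;
    for (int i = 0; i < QK; i += 2) {
        int q0 = (int)roundf(x[i + 0] * id) + 8; /* shift to unsigned 0..15 */
        int q1 = (int)roundf(x[i + 1] * id) + 8;
        if (q0 < 0) q0 = 0; if (q0 > 15) q0 = 15;
        if (q1 < 0) q1 = 0; if (q1 > 15) q1 = 15;
        b->quants[i / 2] = (uint8_t)(q0 | (q1 << 4));
    }
}

int main(void) {
    float x[QK];
    for (int i = 0; i < QK; i++) x[i] = sinf((float)(i + 1)); /* dummy weights */
    block_q4 b;
    quantize_block_q4(x, &b);
    /* Dequantize the first value to show the lossy round trip. */
    float x0 = ((int)(b.quants[0] & 0x0F) - 8) * b.scale;
    printf("x[0] = %f -> dequantized %f\n", x[0], x0);
    return 0;
}
```

The trade-off this illustrates is why more bits per weight (5-bit, 8-bit) reduce quantization error: the per-block scale is shared, so fewer quantization levels mean coarser rounding within each block.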