Releases: ggml-org/llama.cpp
b5124
b5123
vulkan: use aligned loads for flash attention mask (#12853) Rewrite the stride logic for the mask tensor in the FA shader so the stride is forced to be aligned, allowing more efficient loads.
b5122
llava: Fix cpu-only clip image encoding segfault (#12907) * llava: Fix cpu-only clip image encoding * clip : no smart ptr for ggml_backend_t * Fix for backend_ptr push_back Co-authored-by: Xuan Son Nguyen <[email protected]>
b5121
server : add VSCode's GitHub Copilot Chat support (#12896) * server : add VSCode's GitHub Copilot Chat support * cont : update handler name
b5120
rpc : Set cache directory in rpc-server.cpp on FreeBSD (#12903)
b5119
`tool-call`: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 …
b5118
common : Define cache directory on FreeBSD (#12892)
b5117
sycl: Support sycl_ext_oneapi_limited_graph (#12873) The current usage of the SYCL-Graph extension checks for the `sycl_ext_oneapi_graph` device aspect. However, it is also possible to support `sycl_ext_oneapi_limited_graph` devices that don't support graph update.
b5116
contrib: support modelscope community (#12664) * support download from modelscope * support login * remove comments * add arguments * fix code * fix win32 * test passed * fix readme * revert readme * change to MODEL_ENDPOINT * revert tail line * fix readme * refactor model endpoint * remove blank line * fix header * fix as comments * update comment * update readme Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc.com>
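A minimal usage sketch for this change: the variable name `MODEL_ENDPOINT` comes from the commit message above, but the endpoint URL and the idea that it is read from the environment are assumptions, not confirmed by this changelog.

```shell
# Hedged sketch (assumed): point model downloads at the ModelScope
# community endpoint instead of the default one.
export MODEL_ENDPOINT=https://modelscope.cn
echo "model downloads will use: $MODEL_ENDPOINT"
```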
b5115
llama-model : add Glm4Model implementation for GLM-4-0414 (#12867) * GLM-4-0414 * use original one * Using with tensor map * fix bug * change order * change order * format with flake8