Releases: ggml-org/llama.cpp
b5124
b5123
vulkan: use aligned loads for flash attention mask (#12853) Rewrite the stride logic for the mask tensor in the FA shader so the stride is forced to be aligned, allowing more efficient loads.
b5122
llava: Fix cpu-only clip image encoding segfault (#12907) * llava: Fix cpu-only clip image encoding * clip : no smart ptr for ggml_backend_t * Fix for backend_ptr push_back Co-authored-by: Xuan Son Nguyen <[email protected]>
b5121
server : add VSCode's GitHub Copilot Chat support (#12896) * server : add VSCode's GitHub Copilot Chat support * cont : update handler name
b5120
rpc : Set cache directory in rpc-server.cpp on FreeBSD (#12903)
b5119
`tool-call`: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 …
b5118
common : Define cache directory on FreeBSD (#12892)
b5117
sycl: Support sycl_ext_oneapi_limited_graph (#12873) The current usage of the SYCL-Graph extension checks for the `sycl_ext_oneapi_graph` device aspect. However, it is also possible to support `sycl_ext_oneapi_limited_graph` devices that don't support graph update.
b5116
contrib: support modelscope community (#12664) * support download from modelscope * support login * remove comments * add arguments * fix code * fix win32 * test passed * fix readme * revert readme * change to MODEL_ENDPOINT * revert tail line * fix readme * refactor model endpoint * remove blank line * fix header * fix as comments * update comment * update readme Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc.com>
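A minimal usage sketch for this change: the variable name `MODEL_ENDPOINT` comes from the commit message above, but the endpoint URL and the idea that it is read from the environment are assumptions, not confirmed by this changelog.

```shell
# Hedged sketch (assumed): point model downloads at the ModelScope
# community endpoint instead of the default one.
export MODEL_ENDPOINT=https://modelscope.cn
echo "model downloads will use: $MODEL_ENDPOINT"
```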
b5115
llama-model : add Glm4Model implementation for GLM-4-0414 (#12867) * GLM-4-0414 * use original one * Using with tensor map * fix bug * change order * change order * format with flake8