@@ -10,7 +10,7 @@
 #include "llama.h"

 #include "ggml.h"
-#ifdef GGML_USE_CUBLAS
+#if defined GGML_USE_CUBLAS || defined GGML_USE_HIPBLAS
 #include "ggml-cuda.h"
 #elif defined(GGML_USE_CLBLAST)
 #include "ggml-opencl.h"
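Note: the "defined GGML_USE_CUBLAS || defined GGML_USE_HIPBLAS" condition above is repeated at every CUDA-specific guard touched by this diff, since the HIP (ROCm) build reuses the ggml-cuda code path. A minimal sketch of how that repetition could be collapsed, assuming a hypothetical umbrella macro that neither this patch nor ggml actually defines:

// Hypothetical helper, defined once near the top of the file:
#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_HIPBLAS)
#define LLAMA_CUDA_LIKE 1   // hypothetical name; covers both the CUDA and the HIP builds
#endif

// Later guards would then only need to test one symbol:
#ifdef LLAMA_CUDA_LIKE
#include "ggml-cuda.h"      // the same header serves both backends in this port
#elif defined(GGML_USE_CLBLAST)
#include "ggml-opencl.h"
#endif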
@@ -80,7 +80,7 @@ static const std::map<e_model, size_t> & MEM_REQ_SCRATCH0()
         { MODEL_3B,    256ull * MB },
         { MODEL_7B,    512ull * MB },
         { MODEL_13B,   512ull * MB },
-        { MODEL_30B,   512ull * MB },
+        { MODEL_30B,   640ull * MB },
         { MODEL_65B,  1024ull * MB },
     };
     return k_sizes;
@@ -92,7 +92,7 @@ static const std::map<e_model, size_t> & MEM_REQ_SCRATCH1()
         { MODEL_3B,    256ull * MB },
         { MODEL_7B,    512ull * MB },
         { MODEL_13B,   512ull * MB },
-        { MODEL_30B,   512ull * MB },
+        { MODEL_30B,   640ull * MB },
         { MODEL_65B,  1024ull * MB },
     };
     return k_sizes;
@@ -105,7 +105,7 @@ static const std::map<e_model, size_t> & MEM_REQ_KV_SELF()
         { MODEL_3B,    682ull * MB },
         { MODEL_7B,   1026ull * MB },
         { MODEL_13B,  1608ull * MB },
-        { MODEL_30B,  3124ull * MB },
+        { MODEL_30B,  3224ull * MB },
         { MODEL_65B,  5120ull * MB },
     };
     return k_sizes;
@@ -117,9 +117,9 @@ static const std::map<e_model, size_t> & MEM_REQ_EVAL()
 {
     static std::map<e_model, size_t> k_sizes = {
         { MODEL_3B,   512ull * MB },
-        { MODEL_7B,   768ull * MB },
+        { MODEL_7B,   800ull * MB },
         { MODEL_13B, 1024ull * MB },
-        { MODEL_30B, 1280ull * MB },
+        { MODEL_30B, 1380ull * MB },
         { MODEL_65B, 1536ull * MB },
     };
     return k_sizes;
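The four MEM_REQ_* tables above only have their per-model constants bumped; how they are consumed does not change. As a rough sketch of such a lookup (simplified, not the actual loader code; it assumes the e_model enum and the tables as defined in llama.cpp):

// Sketch: fetch the scratch-buffer requirement for a model size,
// falling back to the largest known entry for unknown types.
static size_t scratch0_size_for(e_model type) {
    const std::map<e_model, size_t> & k_sizes = MEM_REQ_SCRATCH0();
    const auto it = k_sizes.find(type);
    return it != k_sizes.end() ? it->second : k_sizes.rbegin()->second;
}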
@@ -175,7 +175,7 @@ struct llama_kv_cache {
             ggml_free(ctx);
         }

-#ifdef GGML_USE_CUBLAS
+#if defined GGML_USE_CUBLAS || defined GGML_USE_HIPBLAS
         ggml_cuda_free_data(k);
         ggml_cuda_free_data(v);
 #endif // GGML_USE_CUBLAS
@@ -234,7 +234,7 @@ struct llama_model {
             ggml_free(ctx);
         }

-#ifdef GGML_USE_CUBLAS
+#if defined GGML_USE_CUBLAS || defined GGML_USE_HIPBLAS
         for (size_t i = 0; i < tensors_by_name.size(); ++i) {
             ggml_cuda_free_data(tensors_by_name[i].second);
         }
@@ -800,7 +800,7 @@ struct llama_model_loader {
                     lmlock->grow_to(lock_size);
                 }
                 break;
-#if defined(GGML_USE_CUBLAS)
+#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_HIPBLAS)
             case GGML_BACKEND_GPU:
             case GGML_BACKEND_GPU_SPLIT:
                 ggml_cuda_transform_tensor(lt.data, lt.ggml_tensor);
@@ -920,7 +920,7 @@ static bool kv_cache_init(
     ggml_set_name(cache.v, "cache_v");

     (void) n_gpu_layers;
-#ifdef GGML_USE_CUBLAS
+#if defined GGML_USE_CUBLAS || defined GGML_USE_HIPBLAS
     if (n_gpu_layers > n_layer + 1) {
         ggml_cuda_assign_buffers_no_scratch(cache.v);
     }
@@ -1106,15 +1106,15 @@ static void llama_model_load_internal(
         if (hparams.ftype != LLAMA_FTYPE_ALL_F32 &&
             hparams.ftype != LLAMA_FTYPE_MOSTLY_F16 &&
             hparams.ftype != LLAMA_FTYPE_MOSTLY_Q8_0) {
-            throw std::runtime_error(format("this format is no longer supported (see https://github.com/ggerganov/llama.cpp/pull/1405)"));
+            printf("\nthis format is no longer supported (see https://github.com/ggerganov/llama.cpp/pull/1405)");
         }
     }

     if (file_version < LLAMA_FILE_VERSION_GGJT_V3) {
         if (hparams.ftype == LLAMA_FTYPE_MOSTLY_Q4_0 ||
             hparams.ftype == LLAMA_FTYPE_MOSTLY_Q4_1 ||
             hparams.ftype == LLAMA_FTYPE_MOSTLY_Q8_0) {
-            throw std::runtime_error(format("this format is no longer supported (see https://github.com/ggerganov/llama.cpp/pull/1508)"));
+            printf("\nthis format is no longer supported (see https://github.com/ggerganov/llama.cpp/pull/1508)");
         }
     }

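The two hunks above change behavior, not just formatting: an unsupported file format no longer throws (which aborted model loading) but only prints a notice and continues, which can yield garbage output if the file really is incompatible. A small sketch of an alternative, not part of this diff, that still keeps loading going but sends the notice to stderr with the usual function prefix:

// Sketch only: warn on stderr instead of stdout so the notice is not
// interleaved with generated text.
fprintf(stderr,
        "%s: warning: this format is no longer supported "
        "(see https://github.com/ggerganov/llama.cpp/pull/1405)\n",
        __func__);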
@@ -1150,7 +1150,7 @@ static void llama_model_load_internal(
     }

     (void) main_gpu;
-#if defined(GGML_USE_CUBLAS)
+#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_HIPBLAS)
     fprintf(stderr, "%s: using CUDA for GPU acceleration\n", __func__);
     ggml_cuda_set_main_device(main_gpu);
 #define LLAMA_BACKEND_OFFLOAD GGML_BACKEND_GPU
@@ -1261,7 +1261,7 @@ static void llama_model_load_internal(

     (void) vram_scratch;
     (void) n_batch;
-#ifdef GGML_USE_CUBLAS
+#if defined GGML_USE_CUBLAS || defined GGML_USE_HIPBLAS
     if (low_vram) {
         fprintf(stderr, "%s: not allocating a VRAM scratch buffer due to low VRAM option\n", __func__);
         ggml_cuda_set_scratch_size(0); // disable scratch
@@ -1274,7 +1274,7 @@ static void llama_model_load_internal(
         }
     }
 #endif // GGML_USE_CUBLAS
-#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_CLBLAST)
+#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_HIPBLAS) || defined(GGML_USE_CLBLAST)
     const int n_gpu = std::min(n_gpu_layers, int(hparams.n_layer));

     fprintf(stderr, "%s: offloading %d repeating layers to GPU\n", __func__, n_gpu);
@@ -1314,7 +1314,7 @@ static void llama_model_load_internal(
     }

     (void) tensor_split;
-#if defined(GGML_USE_CUBLAS)
+#if defined(GGML_USE_CUBLAS) || defined(GGML_USE_HIPBLAS)
     {
         ggml_cuda_set_tensor_split(tensor_split);
     }
@@ -1375,11 +1375,11 @@ static bool llama_eval_internal(
         const int   n_threads,
         const char * cgraph_fname) {

-    // enforce that the first token is BOS
-    if (n_past == 0 && tokens[0] != llama_token_bos()) {
-        fprintf(stderr, "%s: first token must be BOS\n", __func__);
-        return false;
-    }
+    // // enforce that the first token is BOS
+    // if (n_past == 0 && tokens[0] != llama_token_bos()) {
+    //     fprintf(stderr, "%s: first token must be BOS\n", __func__);
+    //     return false;
+    // }

     const int64_t t_start_us = ggml_time_us();

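With the BOS check above commented out, llama_eval_internal no longer rejects a fresh evaluation whose first token is not BOS, so that responsibility shifts to the caller. A minimal caller-side sketch, assuming the llama.h API of this vintage (llama_token_bos(), llama_eval()) and caller-provided ctx, prompt_tokens, n_past, and n_threads:

// Sketch: prepend BOS yourself when starting from an empty context,
// so tokenization still matches the model's training setup.
std::vector<llama_token> toks;
if (n_past == 0) {
    toks.push_back(llama_token_bos());
}
toks.insert(toks.end(), prompt_tokens.begin(), prompt_tokens.end());
llama_eval(ctx, toks.data(), (int) toks.size(), n_past, n_threads);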
@@ -1435,7 +1435,7 @@ static bool llama_eval_internal(
     offload_func_t offload_func_kq = llama_nop;
     offload_func_t offload_func_v  = llama_nop;

-#ifdef GGML_USE_CUBLAS
+#if defined GGML_USE_CUBLAS || defined GGML_USE_HIPBLAS
     if (n_gpu_layers > n_layer) {
         offload_func_nr = ggml_cuda_assign_buffers;
     }
@@ -1450,7 +1450,7 @@ static bool llama_eval_internal(
     for (int il = 0; il < n_layer; ++il) {
         offload_func_t offload_func = llama_nop;

-#ifdef GGML_USE_CUBLAS
+#if defined GGML_USE_CUBLAS || defined GGML_USE_HIPBLAS
         if (il >= i_gpu_start) {
             offload_func = ggml_cuda_assign_buffers;
         }