
ggml : riscv: add 128-bit RVV support #12530


Merged · 4 commits merged into ggml-org:master on Mar 27, 2025

Conversation

@xctan (Contributor) commented on Mar 23, 2025

This pull request adds vec_dot support for the RISC-V Vector extension (RVV) on architectures with a 128-bit VLEN. Prior implementations in pull requests #2929 and #3453 established RVV support for systems with 256-bit VLEN and higher, but those kernels proved incompatible with 128-bit VLEN hardware. To address this, this PR implements dynamic kernel selection through runtime checks, adapts the legacy kernels for 128-bit compatibility, and improves the performance of the k-quant kernels at 128-bit VLEN. It also adds support for the RISC-V Zfhmin extension to accelerate float16 conversions.
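As a rough illustration of the runtime kernel selection (a minimal sketch, not the PR's actual code; `rvv_vlen_bits` and the `_rvv256`/`_rvv128` kernel names are hypothetical), the hardware vector length can be queried through the standard RVV intrinsics and used to dispatch to a matching kernel:

```c
#include <riscv_vector.h>

void vec_dot_q2_K_q8_K_rvv256(int n, float * s, const void * vx, const void * vy);
void vec_dot_q2_K_q8_K_rvv128(int n, float * s, const void * vx, const void * vy);

// VLMAX for SEW=8/LMUL=1 equals VLEN/8, so multiplying by 8 recovers VLEN in bits.
static int rvv_vlen_bits(void) {
    return (int) __riscv_vsetvlmax_e8m1() * 8;
}

void vec_dot_q2_K_q8_K(int n, float * s, const void * vx, const void * vy) {
    if (rvv_vlen_bits() >= 256) {
        vec_dot_q2_K_q8_K_rvv256(n, s, vx, vy); // pre-existing >=256-bit kernel
    } else {
        vec_dot_q2_K_q8_K_rvv128(n, s, vx, vy); // 128-bit kernel added by this PR
    }
}
```

On the float16 side, the point of Zfhmin is that `_Float16` <-> `float` conversions lower to single `fcvt.s.h` / `fcvt.h.s` instructions instead of soft-float library calls. Again a hedged sketch, assuming a `-march` string such as `rv64gcv_zfhmin`:

```c
// With Zfhmin enabled, the compiler emits fcvt.s.h / fcvt.h.s for these casts.
static inline float    fp16_to_fp32(_Float16 h) { return (float) h; }
static inline _Float16 fp32_to_fp16(float    f) { return (_Float16) f; }
```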

Some k-quant kernels now use RVV 128-bit optimized inline assembly to bypass compiler limitations (riscv64-linux-gnu-gcc 14.2.0): when written with intrinsics, the generated code suffered from poor register allocation and excessive vector register group spills. Writing the assembly by hand ensures efficient register allocation; see the sketch after the following list of affected kernels:

- `ggml_vec_dot_q2_K_q8_K`
- `ggml_vec_dot_q3_K_q8_K`
- `ggml_vec_dot_q4_K_q8_K`
- `ggml_vec_dot_q6_K_q8_K`
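To make the register-pressure point concrete, below is a heavily simplified sketch of the inline-assembly style (not taken from the PR; `dot_i8_rvv128` is a hypothetical plain int8 dot product). The vector registers are assigned by hand in GCC extended asm, so the compiler has no opportunity to spill vector register groups:

```c
#include <stddef.h>
#include <stdint.h>

static int32_t dot_i8_rvv128(const int8_t * x, const int8_t * y, int n) {
    int32_t sum = 0;
    while (n > 0) {
        size_t  vl;
        int32_t partial;
        __asm__ __volatile__(
            "vsetvli  %[vl], %[n], e8, m1, ta, ma\n\t"  // vl = min(n, VLEN/8)
            "vle8.v   v0, (%[x])\n\t"                   // load int8 chunk of x
            "vle8.v   v1, (%[y])\n\t"                   // load int8 chunk of y
            "vwmul.vv v2, v0, v1\n\t"                   // i8*i8 -> i16 in v2-v3
            "vsetivli zero, 1, e32, m1, ta, ma\n\t"
            "vmv.s.x  v4, zero\n\t"                     // 32-bit accumulator = 0
            "vsetvli  zero, %[vl], e16, m2, ta, ma\n\t"
            "vwredsum.vs v4, v2, v4\n\t"                // sum i16 products -> i32
            "vsetivli zero, 1, e32, m1, ta, ma\n\t"
            "vmv.x.s  %[p], v4\n\t"                     // extract scalar result
            : [vl] "=&r" (vl), [p] "=&r" (partial)
            : [n] "r" ((size_t) n), [x] "r" (x), [y] "r" (y)
            : "v0", "v1", "v2", "v3", "v4", "memory");
        sum += partial;
        x += vl;
        y += vl;
        n -= (int) vl;
    }
    return sum;
}
```

The real k-quant kernels apply the same idea to the interleaved scales and super-block layout of the quantized data, which is where the intrinsics-based versions ran out of architectural registers.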

Verification

By running the Q2_K_L quantized model of DeepSeek-R1-Distill-Llama-8B, I have confirmed that the RVV-accelerated kernels do not introduce substantial numeric error compared to the scalar implementation (compiled with RVV support disabled):

| scalar | rvv128 (this PR) |
| ------------------ | ------------------ |
| 20.0849 ± 0.17272 | 20.0669 ± 0.17253 |

Performance

Performance was measured with the same model as above, on a 64-core RISC-V rv64gcv machine with a 128-bit VLEN configuration.

| model | size | params | backend | threads | test | t/s | note |
| ---------------------- | -------: | -----: | ------- | ------: | ----- | -----------: | ------ |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | pp512 | 3.18 ± 0.00 | scalar |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | pp512 | 27.19 ± 0.11 | rvv128 |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | tg128 | 2.94 ± 0.00 | scalar |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | tg128 | 11.10 ± 0.03 | rvv128 |

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Mar 23, 2025
@xctan (Contributor, Author) commented on Mar 27, 2025

Is there any other issue with this PR? @ggerganov

@ggerganov (Member) left a comment:


It would be nice to add some sort of CI for this arch in the future. If you have any ideas, let me know.

@ggerganov ggerganov merged commit 24feaec into ggml-org:master Mar 27, 2025
48 checks passed
@JocelynPanPan commented:
Hi there! Which version of the V extension does your server support?
