
ggml : riscv: add 128-bit RVV support #12530


Merged · 4 commits merged into ggml-org:master on Mar 27, 2025

Conversation

@xctan (Contributor) commented on Mar 23, 2025

This pull request adds vec_dot support for the RISC-V Vector extension (RVV) on architectures with a 128-bit VLEN. Prior implementations in pull requests #2929 and #3453 established RVV support for systems with 256-bit VLEN and higher, but those kernels proved incompatible with 128-bit VLEN hardware. To address this, this PR implements dynamic kernel selection through runtime checks, adapts the legacy kernels for 128-bit compatibility, and improves the performance of the k-quant kernels at 128-bit VLEN. It also adds support for the RISC-V Zfhmin extension to accelerate float16 conversions.
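As a rough illustration of the runtime kernel selection (a minimal sketch, not the PR's actual code; `rvv_vlen_bits` and the `_rvv256`/`_rvv128` kernel names are hypothetical), the hardware vector length can be queried through the standard RVV intrinsics and used to dispatch to a matching kernel:

```c
#include <riscv_vector.h>

void vec_dot_q2_K_q8_K_rvv256(int n, float * s, const void * vx, const void * vy);
void vec_dot_q2_K_q8_K_rvv128(int n, float * s, const void * vx, const void * vy);

// VLMAX for SEW=8/LMUL=1 equals VLEN/8, so multiplying by 8 recovers VLEN in bits.
static int rvv_vlen_bits(void) {
    return (int) __riscv_vsetvlmax_e8m1() * 8;
}

void vec_dot_q2_K_q8_K(int n, float * s, const void * vx, const void * vy) {
    if (rvv_vlen_bits() >= 256) {
        vec_dot_q2_K_q8_K_rvv256(n, s, vx, vy); // pre-existing >=256-bit kernel
    } else {
        vec_dot_q2_K_q8_K_rvv128(n, s, vx, vy); // 128-bit kernel added by this PR
    }
}
```

On the float16 side, the point of Zfhmin is that `_Float16` <-> `float` conversions lower to single `fcvt.s.h` / `fcvt.h.s` instructions instead of soft-float library calls. Again a hedged sketch, assuming a `-march` string such as `rv64gcv_zfhmin`:

```c
// With Zfhmin enabled, the compiler emits fcvt.s.h / fcvt.h.s for these casts.
static inline float    fp16_to_fp32(_Float16 h) { return (float) h; }
static inline _Float16 fp32_to_fp16(float    f) { return (_Float16) f; }
```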

Some k-quant kernels now use RVV 128-bit optimized inline assembly to bypass compiler limitations (riscv64-linux-gnu-gcc 14.2.0): when written with intrinsics, the generated code suffered from poor register allocation and excessive vector register group spills. Writing the assembly by hand ensures efficient register allocation; see the sketch after the following list of affected kernels:

- `ggml_vec_dot_q2_K_q8_K`
- `ggml_vec_dot_q3_K_q8_K`
- `ggml_vec_dot_q4_K_q8_K`
- `ggml_vec_dot_q6_K_q8_K`
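To make the register-pressure point concrete, below is a heavily simplified sketch of the inline-assembly style (not taken from the PR; `dot_i8_rvv128` is a hypothetical plain int8 dot product). The vector registers are assigned by hand in GCC extended asm, so the compiler has no opportunity to spill vector register groups:

```c
#include <stddef.h>
#include <stdint.h>

static int32_t dot_i8_rvv128(const int8_t * x, const int8_t * y, int n) {
    int32_t sum = 0;
    while (n > 0) {
        size_t  vl;
        int32_t partial;
        __asm__ __volatile__(
            "vsetvli  %[vl], %[n], e8, m1, ta, ma\n\t"  // vl = min(n, VLEN/8)
            "vle8.v   v0, (%[x])\n\t"                   // load int8 chunk of x
            "vle8.v   v1, (%[y])\n\t"                   // load int8 chunk of y
            "vwmul.vv v2, v0, v1\n\t"                   // i8*i8 -> i16 in v2-v3
            "vsetivli zero, 1, e32, m1, ta, ma\n\t"
            "vmv.s.x  v4, zero\n\t"                     // 32-bit accumulator = 0
            "vsetvli  zero, %[vl], e16, m2, ta, ma\n\t"
            "vwredsum.vs v4, v2, v4\n\t"                // sum i16 products -> i32
            "vsetivli zero, 1, e32, m1, ta, ma\n\t"
            "vmv.x.s  %[p], v4\n\t"                     // extract scalar result
            : [vl] "=&r" (vl), [p] "=&r" (partial)
            : [n] "r" ((size_t) n), [x] "r" (x), [y] "r" (y)
            : "v0", "v1", "v2", "v3", "v4", "memory");
        sum += partial;
        x += vl;
        y += vl;
        n -= (int) vl;
    }
    return sum;
}
```

The real k-quant kernels apply the same idea to the interleaved scales and super-block layout of the quantized data, which is where the intrinsics-based versions ran out of architectural registers.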

Verification

By running the Q2_K_L quantized model of DeepSeek-R1-Distill-Llama-8B, I have confirmed that the RVV-accelerated kernels do not introduce substantial numeric error compared to the scalar implementation (compiled with RVV support disabled):

| scalar | rvv128 (this PR) |
| ------------------ | ------------------ |
| 20.0849 ± 0.17272 | 20.0669 ± 0.17253 |

Performance

Performance was measured with the same model as above, on a 64-core RISC-V rv64gcv machine with a 128-bit VLEN configuration.

| model | size | params | backend | threads | test | t/s | note |
| ---------------------- | -------: | -----: | ------- | ------: | ----- | -----------: | ------ |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | pp512 | 3.18 ± 0.00 | scalar |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | pp512 | 27.19 ± 0.11 | rvv128 |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | tg128 | 2.94 ± 0.00 | scalar |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | tg128 | 11.10 ± 0.03 | rvv128 |

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Mar 23, 2025
@xctan (Contributor, Author) commented on Mar 27, 2025

Is there any other issue with this PR? @ggerganov

@ggerganov (Member) left a comment:


It would be nice to add some sort of CI for this arch in the future. If you have any ideas, let me know.

@ggerganov ggerganov merged commit 24feaec into ggml-org:master Mar 27, 2025
48 checks passed
@JocelynPanPan commented:
Hi there! Which version of the V extension does your server support?
