
SYCL: Add fp16 type support to unary op kernels #12788


Merged

merged 7 commits from sycl/fp16elewisesupport into master on Apr 11, 2025

Conversation

qnixsynapse
Collaborator

@qnixsynapse qnixsynapse commented Apr 7, 2025

There are probably better ways to do this.

fp16 support needs to be disabled on devices that do not support fp16 in hardware.

Either we check whether the build was compiled with the GGML_SYCL_F16 compile flag and disable it in the device_supports_op function, or we add information about the current hardware's features and check them through a function.

Needs proper testing.
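
For the second option, a runtime check against the device's reported capabilities could look roughly like the sketch below. This is only an illustration of the idea, not code from this PR; the device_supports_op-style hook and the surrounding case labels are assumptions taken from the discussion.

```cpp
// Minimal sketch (illustrative, not the PR's implementation): query the SYCL
// device for native fp16 support and reject fp16 unary ops when it is absent.
#include <sycl/sycl.hpp>

static bool device_supports_fp16(const sycl::device & dev) {
    // sycl::aspect::fp16 reports whether the device supports half precision.
    return dev.has(sycl::aspect::fp16);
}

// Hypothetical use inside a device_supports_op-style check:
//   case GGML_OP_UNARY:
//       if (op->type == GGML_TYPE_F16 && !device_supports_fp16(dev)) {
//           return false;   // let the scheduler fall back for this op
//       }
//       return true;
```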

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels on Apr 7, 2025
@NeoZhangJianyu
Collaborator

I find that no UT cases for FP16 are enabled.
Could you change the UT cases to enable FP16 for these ops?
Then test them and make sure they pass.

@qnixsynapse
Collaborator Author

I find that no UT cases for FP16 are enabled. Could you change the UT cases to enable FP16 for these ops? Then test them and make sure they pass.

I think the UT cases are present in test-backend-ops; they were disabled by me at the time (in #12201).

@qnixsynapse qnixsynapse marked this pull request as draft April 7, 2025 07:35
@NeoZhangJianyu
Collaborator

OK. I suggest enabling them and using them to test this PR.

@qnixsynapse
Collaborator Author

OK. I suggest enabling them and using them to test this PR.

Already enabled and tested.

@qnixsynapse
Collaborator Author

qnixsynapse commented Apr 8, 2025

It seems that during actual inference of an fp16 model (Gemma 2 2B F16 in this case), the intermediate hidden embeddings are converted to fp32:

call ggml_sycl_add done
call ggml_sycl_rms_norm
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
[SYCL] ggml_sycl_cpy: Tensor supplied: f32 to f16
[SYCL] ggml_sycl_cpy: Tensor supplied: f32 to f16
call ggml_sycl_tanh: DST Tensor type: f32 <-----------------------
call ggml_sycl_tanh done
ggml_sycl_op_soft_max: F32 mask
[SYCL] call ggml_sycl_dup
[SYCL] ggml_sycl_cpy: Tensor supplied: f32 to f32
[SYCL] call ggml_sycl_dup done
call ggml_sycl_rms_norm
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
call ggml_sycl_add
call ggml_sycl_add done
call ggml_sycl_rms_norm
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
call ggml_sycl_gelu: DST Tensor type: f32 <------------------------
call ggml_sycl_gelu done
call ggml_sycl_mul
call ggml_sycl_mul done
call ggml_sycl_rms_norm
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
call ggml_sycl_add
call ggml_sycl_add done

So there is no way to test the numerical stability of the fp16 operations except with test-backend-ops:


  GELU(type=f16,ne_a=[128,2,2,2],v=0): call ggml_sycl_gelu: DST Tensor type: f16
call ggml_sycl_gelu done
OK
  GELU(type=f16,ne_a=[5,7,11,13],v=0): call ggml_sycl_gelu: DST Tensor type: f16
call ggml_sycl_gelu done
OK
  GELU(type=f32,ne_a=[128,2,2,2],v=0): call ggml_sycl_gelu: DST Tensor type: f32
call ggml_sycl_gelu done
OK
  GELU(type=f32,ne_a=[5,7,11,13],v=0): call ggml_sycl_gelu: DST Tensor type: f32
call ggml_sycl_gelu done
OK
  TANH(type=f16,ne_a=[128,2,2,2],v=0): call ggml_sycl_tanh: DST Tensor type: f16
call ggml_sycl_tanh done
OK
  TANH(type=f16,ne_a=[5,7,11,13],v=0): call ggml_sycl_tanh: DST Tensor type: f16
call ggml_sycl_tanh done
OK
  TANH(type=f32,ne_a=[128,2,2,2],v=0): call ggml_sycl_tanh: DST Tensor type: f32
call ggml_sycl_tanh done
OK
  TANH(type=f32,ne_a=[5,7,11,13],v=0): call ggml_sycl_tanh: DST Tensor type: f32
call ggml_sycl_tanh done
OK

I am marking this PR "ready for review" for now to get some comments from others.
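
As a rough illustration of what "fp16 type support in a unary op kernel" means here, a type-templated GELU launcher might look like the sketch below — a sketch only, with made-up names and simplified launch logic, not the PR's actual code.

```cpp
// Sketch of a unary op kernel templated on the element type, so one launcher
// serves both f32 and f16 destination tensors. Illustrative only.
#include <sycl/sycl.hpp>
#include <cstdint>

template <typename T>
void gelu_sycl(sycl::queue & q, const T * x, T * dst, int64_t n) {
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        // Compute in fp32 for numerical stability, then narrow back to T.
        const float v = static_cast<float>(x[i]);
        const float g = 0.5f * v *
            (1.0f + sycl::tanh(0.797884561f * (v + 0.044715f * v * v * v)));
        dst[i] = static_cast<T>(g);
    });
}

// Hypothetical dispatch on the destination tensor type:
//   dst->type == GGML_TYPE_F16 ? gelu_sycl<sycl::half>(...) : gelu_sycl<float>(...);
```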

@qnixsynapse qnixsynapse marked this pull request as ready for review April 8, 2025 05:34
Collaborator

@NeoZhangJianyu NeoZhangJianyu left a comment


It's OK!

Thank you!

@qnixsynapse
Collaborator Author

It's OK!

Thank you!

Thank you! Let's wait for others' comments before we merge it.

@qnixsynapse qnixsynapse force-pushed the sycl/fp16elewisesupport branch from 9553c5b to fc8d0a6 on April 9, 2025 02:07
Collaborator

@Rbiessy Rbiessy left a comment


Either we check whether the build was compiled with the GGML_SYCL_F16 compile flag and disable it in the device_supports_op function, or we add information about the current hardware's features and check them through a function.

Did we reach a conclusion on this question? I think we should not instantiate fp16 kernels if the user does not provide GGML_SYCL_F16.
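
If the compile-flag route is preferred, the guard could be as simple as the sketch below (an assumption about the approach, not code taken from this PR).

```cpp
// Sketch: only advertise fp16 unary support when the build defines GGML_SYCL_F16.
static bool sycl_fp16_build_enabled() {
#ifdef GGML_SYCL_F16
    return true;
#else
    return false;
#endif
}

// A device_supports_op-style check (hypothetical) could then reject
// GGML_TYPE_F16 destinations whenever sycl_fp16_build_enabled() is false,
// so the fp16 kernel instantiations are never reached in such builds.
```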

@qnixsynapse qnixsynapse force-pushed the sycl/fp16elewisesupport branch from eed23cd to 8398060 on April 11, 2025 03:25
@NeoZhangJianyu NeoZhangJianyu merged commit fccf9ca into master Apr 11, 2025
53 checks passed
@qnixsynapse qnixsynapse deleted the sycl/fp16elewisesupport branch April 11, 2025 11:23