
SYCL: Add fp16 type support to unary op kernels #12788


Merged

merged 7 commits from sycl/fp16elewisesupport into master on Apr 11, 2025

Conversation

qnixsynapse
Collaborator

@qnixsynapse qnixsynapse commented Apr 7, 2025

There are probably better ways to do this.

fp16 support needs to be disabled on devices that do not support fp16 in hardware.

Either we check whether the build was compiled with the GGML_SYCL_F16 compile flag and disable it in the device_supports_op function, or we add information about the current hardware's features and check them through a function.

Needs proper testing.
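
For the second option, a runtime check against the device's reported capabilities could look roughly like the sketch below. This is only an illustration of the idea, not code from this PR; the device_supports_op-style hook and the surrounding case labels are assumptions taken from the discussion.

```cpp
// Minimal sketch (illustrative, not the PR's implementation): query the SYCL
// device for native fp16 support and reject fp16 unary ops when it is absent.
#include <sycl/sycl.hpp>

static bool device_supports_fp16(const sycl::device & dev) {
    // sycl::aspect::fp16 reports whether the device supports half precision.
    return dev.has(sycl::aspect::fp16);
}

// Hypothetical use inside a device_supports_op-style check:
//   case GGML_OP_UNARY:
//       if (op->type == GGML_TYPE_F16 && !device_supports_fp16(dev)) {
//           return false;   // let the scheduler fall back for this op
//       }
//       return true;
```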

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels on Apr 7, 2025
@NeoZhangJianyu
Collaborator

I find that no UT cases for FP16 are enabled.
Could you change the UT cases to enable FP16 for these ops?
Then test them and make sure they pass.

@qnixsynapse
Collaborator Author

I find that no UT cases for FP16 are enabled. Could you change the UT cases to enable FP16 for these ops? Then test them and make sure they pass.

I think the UT cases are present in test-backend-ops; they were disabled by me at the time (in #12201).

@qnixsynapse qnixsynapse marked this pull request as draft April 7, 2025 07:35
@NeoZhangJianyu
Collaborator

OK. I suggest enabling them and using them to test this PR.

@qnixsynapse
Collaborator Author

OK. I suggest enabling them and using them to test this PR.

Already enabled and tested.

@qnixsynapse
Collaborator Author

qnixsynapse commented Apr 8, 2025

It seems that during actual inference of an fp16 model (Gemma 2 2B F16 in this case), the intermediate hidden embeddings are converted to fp32:

call ggml_sycl_add done
call ggml_sycl_rms_norm
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
[SYCL] ggml_sycl_cpy: Tensor supplied: f32 to f16
[SYCL] ggml_sycl_cpy: Tensor supplied: f32 to f16
call ggml_sycl_tanh: DST Tensor type: f32 <-----------------------
call ggml_sycl_tanh done
ggml_sycl_op_soft_max: F32 mask
[SYCL] call ggml_sycl_dup
[SYCL] ggml_sycl_cpy: Tensor supplied: f32 to f32
[SYCL] call ggml_sycl_dup done
call ggml_sycl_rms_norm
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
call ggml_sycl_add
call ggml_sycl_add done
call ggml_sycl_rms_norm
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
call ggml_sycl_gelu: DST Tensor type: f32 <------------------------
call ggml_sycl_gelu done
call ggml_sycl_mul
call ggml_sycl_mul done
call ggml_sycl_rms_norm
call ggml_sycl_rms_norm done
call ggml_sycl_mul
call ggml_sycl_mul done
call ggml_sycl_add
call ggml_sycl_add done

So there is no way to test the numerical stability of the fp16 operations except with test-backend-ops:


  GELU(type=f16,ne_a=[128,2,2,2],v=0): call ggml_sycl_gelu: DST Tensor type: f16
call ggml_sycl_gelu done
OK
  GELU(type=f16,ne_a=[5,7,11,13],v=0): call ggml_sycl_gelu: DST Tensor type: f16
call ggml_sycl_gelu done
OK
  GELU(type=f32,ne_a=[128,2,2,2],v=0): call ggml_sycl_gelu: DST Tensor type: f32
call ggml_sycl_gelu done
OK
  GELU(type=f32,ne_a=[5,7,11,13],v=0): call ggml_sycl_gelu: DST Tensor type: f32
call ggml_sycl_gelu done
OK
  TANH(type=f16,ne_a=[128,2,2,2],v=0): call ggml_sycl_tanh: DST Tensor type: f16
call ggml_sycl_tanh done
OK
  TANH(type=f16,ne_a=[5,7,11,13],v=0): call ggml_sycl_tanh: DST Tensor type: f16
call ggml_sycl_tanh done
OK
  TANH(type=f32,ne_a=[128,2,2,2],v=0): call ggml_sycl_tanh: DST Tensor type: f32
call ggml_sycl_tanh done
OK
  TANH(type=f32,ne_a=[5,7,11,13],v=0): call ggml_sycl_tanh: DST Tensor type: f32
call ggml_sycl_tanh done
OK

I am marking this PR "ready for review" for now to get some comments from others.
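
As a rough illustration of what "fp16 type support in a unary op kernel" means here, a type-templated GELU launcher might look like the sketch below — a sketch only, with made-up names and simplified launch logic, not the PR's actual code.

```cpp
// Sketch of a unary op kernel templated on the element type, so one launcher
// serves both f32 and f16 destination tensors. Illustrative only.
#include <sycl/sycl.hpp>
#include <cstdint>

template <typename T>
void gelu_sycl(sycl::queue & q, const T * x, T * dst, int64_t n) {
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        // Compute in fp32 for numerical stability, then narrow back to T.
        const float v = static_cast<float>(x[i]);
        const float g = 0.5f * v *
            (1.0f + sycl::tanh(0.797884561f * (v + 0.044715f * v * v * v)));
        dst[i] = static_cast<T>(g);
    });
}

// Hypothetical dispatch on the destination tensor type:
//   dst->type == GGML_TYPE_F16 ? gelu_sycl<sycl::half>(...) : gelu_sycl<float>(...);
```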

@qnixsynapse qnixsynapse marked this pull request as ready for review April 8, 2025 05:34
Collaborator

@NeoZhangJianyu NeoZhangJianyu left a comment


It's OK!

Thank you!

@qnixsynapse
Collaborator Author

It's OK!

Thank you!

Thank you! Let's wait for others' comments before we merge it.

@qnixsynapse qnixsynapse force-pushed the sycl/fp16elewisesupport branch from 9553c5b to fc8d0a6 on April 9, 2025 02:07
Collaborator

@Rbiessy Rbiessy left a comment


Either we check whether the build was compiled with the GGML_SYCL_F16 compile flag and disable it in the device_supports_op function, or we add information about the current hardware's features and check them through a function.

Did we reach a conclusion on this question? I think we should not instantiate fp16 kernels if the user does not provide GGML_SYCL_F16.
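
If the compile-flag route is preferred, the guard could be as simple as the sketch below (an assumption about the approach, not code taken from this PR).

```cpp
// Sketch: only advertise fp16 unary support when the build defines GGML_SYCL_F16.
static bool sycl_fp16_build_enabled() {
#ifdef GGML_SYCL_F16
    return true;
#else
    return false;
#endif
}

// A device_supports_op-style check (hypothetical) could then reject
// GGML_TYPE_F16 destinations whenever sycl_fp16_build_enabled() is false,
// so the fp16 kernel instantiations are never reached in such builds.
```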

@qnixsynapse qnixsynapse force-pushed the sycl/fp16elewisesupport branch from eed23cd to 8398060 on April 11, 2025 03:25
@NeoZhangJianyu NeoZhangJianyu merged commit fccf9ca into master Apr 11, 2025
53 checks passed
@qnixsynapse qnixsynapse deleted the sycl/fp16elewisesupport branch April 11, 2025 11:23