[SYCL][LIBCLC] Use half builtins for generic math functions #17163

Maetveis · 2025-02-25T15:12:37Z

`__builtin_<name>()` without a suffix is equivalent to the C math function `<name>()`; these functions take and return `double` values.

Instead of these, use the `f16`-suffixed builtins, which take and return `half` values.

On CPU targets these will usually be lowered to the single-precision library calls (with promotion/truncation). On other targets (for example GPUs) the half builtins might be lowered directly to hardware instructions.

Some of the builtins are not available for half precision; in these cases use the `f`-suffixed builtins (which take and return `float`s).
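To illustrate the type difference described above, here is a minimal plain-C sketch (not code from this PR). It uses the `sin` builtins as a stand-in, and all helper names are hypothetical. The `f16` form is left out on the assumption that host compiler support for it varies.

```c
#include <assert.h>
#include <stddef.h>

/* The unsuffixed builtin has the signature of C's sin(): double -> double.
   A float argument is promoted, and the result type is double. */
static size_t unsuffixed_result_size(void) {
  return sizeof(__builtin_sin(1.0f)); /* sizeof does not evaluate the call */
}

/* The f-suffixed builtin has the signature of sinf(): float -> float. */
static size_t f_suffixed_result_size(void) {
  return sizeof(__builtin_sinf(1.0f));
}

/* Hypothetical helper: checks the result types described above. */
static void check_builtin_result_types(void) {
  assert(unsuffixed_result_size() == sizeof(double));
  assert(f_suffixed_result_size() == sizeof(float));
}
```

Calling `check_builtin_result_types()` from a test program should pass with GCC or Clang, showing that a narrower argument handed to the unsuffixed builtin is computed in `double`.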

Maetveis · 2025-02-25T15:16:14Z

@PietroGhg I couldn't find anything in #13829 that would suggest that using the double builtins was intentional, but LMK if I missed something.

frasercrmck · 2025-02-25T15:37:13Z

This seems like a good enough change for now. I'll need to inspect the LLVM IR diff for myself to be sure.

However, I think this will all have to undergo some changes in the near future. libclc shouldn't be assuming support for __builtin functions by default, unless they can be shown to be supported natively on all targets (via generic codegen expansion, for example). I'm slowly adding fully-fledged half implementations upstream, and so opting out of those by default will just leave untested code. I think targets should be opting in to builtins, having the software implementations there as the default.

As it happens, I think this only "works" for DPC++ because we only build Native CPU, AMDGPU, and NVIDIA libclc targets (ignoring the other targets upstream supports) and because AMDGPU and NVIDIA targets override these same maths functions to use their own builtins. So only Native CPU is actually going to "see" these implementations.

If this change is in fact intended for Native CPU, then I think that target should be the one choosing to use these builtins.

But like I said, we're already in this position so this change still seems like an improvement.

Maetveis · 2025-02-25T15:59:35Z

> However, I think this will all have to undergo some changes in the near future. libclc shouldn't be assuming support for __builtin functions by default, unless they can be shown to be supported natively on all targets (via generic codegen expansion, for example). I'm slowly adding fully-fledged half implementations upstream, and so opting out of those by default will just leave untested code. I think targets should be opting in to builtins, having the software implementations there as the default.

Agree.

> As it happens, I think this only "works" for DPC++ because we only build Native CPU, AMDGPU, and NVIDIA libclc targets (ignoring the other targets upstream supports) and because AMDGPU and NVIDIA targets override these same maths functions to use their own builtins. So only Native CPU is actually going to "see" these implementations.
>
> If this change is in fact intended for Native CPU, then I think that target should be the one choosing to use these builtins.

This change was in fact motivated by an out-of-tree undisclosed target, where we are (currently) still using the generic implementations. I can send you the details in some internal channel if you wish.

> But like I said, we're already in this position so this change still seems like an improvement.

Yeah, I figured the same myself. We could add the right builtins for the target downstream, but this felt to me like an improvement over the current state.

frasercrmck · 2025-02-25T16:24:42Z

> > As it happens, I think this only "works" for DPC++ because we only build Native CPU, AMDGPU, and NVIDIA libclc targets (ignoring the other targets upstream supports) and because AMDGPU and NVIDIA targets override these same maths functions to use their own builtins. So only Native CPU is actually going to "see" these implementations.
> >
> > If this change is in fact intended for Native CPU, then I think that target should be the one choosing to use these builtins.
>
> This change was in fact motivated by an out-of-tree undisclosed target, where we are (currently) still using the generic implementations. I can send you the details in some internal channel if you wish.

Ah yeah, I was copied into that discussion. I didn't spot that you were in that thread too. We can take the discussion offline.

> > But like I said, we're already in this position so this change still seems like an improvement.
>
> Yeah, I figured the same myself. We could add the right builtins for the target downstream, but this felt to me like an improvement over the current state.

It sounds as though we'll have several cases where we want to provide both software and builtin implementations in libclc, for multiple targets each. We'll have to think of the best way to make all this code available, and have targets select between what they want. There are a few mechanisms, but we'll have to find the one that suits best.
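One possible mechanism is a compile-time opt-in, sketched here in plain C rather than OpenCL C. The macro `EXAMPLE_TARGET_HAS_BUILTIN_FABS` and the function `example_fabs` are hypothetical and are not part of libclc, and the sketch assumes `float` is a 32-bit IEEE-754 type. The software path is the default, and a target opts in to the builtin by defining the macro.

```c
#include <stdint.h>
#include <string.h>

#ifdef EXAMPLE_TARGET_HAS_BUILTIN_FABS
/* Opt-in path: the target has declared that the builtin is supported. */
static float example_fabs(float x) { return __builtin_fabsf(x); }
#else
/* Default software path: clear the sign bit of the IEEE-754 encoding. */
static float example_fabs(float x) {
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);  /* reinterpret the float as raw bits */
  bits &= UINT32_C(0x7fffffff);    /* drop the sign bit */
  memcpy(&x, &bits, sizeof x);
  return x;
}
#endif
```

Compiling with `-DEXAMPLE_TARGET_HAS_BUILTIN_FABS` would select the builtin. Without it, the software version is used, so the default build keeps exercising that code.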

PietroGhg · 2025-02-25T16:51:48Z

> @PietroGhg I couldn't find anything in #13829 that would suggest that using the double builtins was intentional, but LMK if I missed something.

Yeah, you are right, it wasn't intentional. The changes here look good to me.

Maetveis · 2025-02-26T08:54:37Z

@jchlanda @frasercrmck I rebased and force-pushed to target sycl-web to pick up #17172

Maetveis · 2025-02-26T09:33:06Z

Sorry for the noise, I thought GH was not triggering GH Actions for some reason, but I guess sycl-web just doesn't have any workflows configured.

frasercrmck

Yes, sycl-web doesn't run actions. Shame I just missed the pulldown window. I suppose if you are happy to land this on sycl-web that might make sense, but if you want it pulled down quicker maybe just merge into sycl and we can resolve the copysign conflicts as and when.

Maetveis · 2025-02-26T10:02:50Z

> Yes, sycl-web doesn't run actions. Shame I just missed the pulldown window. I suppose if you are happy to land this on sycl-web that might make sense, but if you want it pulled down quicker maybe just merge into sycl and we can resolve the copysign conflicts as and when.

I am not in any hurry, but I'd prefer CI to run on this, so I changed it back to sycl, but dropped the copysign change. I think this should result in what we want, without conflicts when sycl-web is pulled down.

Maetveis · 2025-02-26T12:33:20Z

The AMD CI failure is a known flaky test, tracked in #17077.

Maetveis · 2025-02-27T02:59:58Z

@intel/llvm-gatekeepers please merge :)

steffenlarsen · 2025-02-27T07:31:31Z

@Maetveis - parallel_for_range_roundup is failing on HIP in CI. If it is unrelated, could you please open a tracker for it and mention both the test and the tracker in a comment here?

Maetveis · 2025-02-27T08:25:17Z

> @Maetveis - parallel_for_range_roundup is failing on HIP in CI. If it is unrelated, could you please open a tracker for it and mention both the test and the tracker in a comment here?

@steffenlarsen I believe the failure is that `SYCL::Basic/parallel_for_range_roundup.cpp` is flaky on AMD CI; it is already tracked by #17077.

This adds some missing generic address space overloads of CLC functions: frexp, and whichever functions make use of the unary_def_with_(int_)?ptr.inc files. Note that this commit doesn't map the equivalent SPIR-V functions to these CLC implementations. This is because the generic SPIR-V implementations for FP16 types currently use the clang builtins as opposed to the software implementations. As discussed in #17163, this is not the proper behaviour for generic implementations, as not all targets support the builtins. In future work we will need to either provide target-specific overrides for the builtin implementations, or provide an easy mechanism by which the CLC implementations can select between builtins and software implementations automatically.

Maetveis requested a review from a team as a code owner February 25, 2025 15:12

Maetveis requested a review from jchlanda February 25, 2025 15:12

Maetveis requested a review from PietroGhg February 25, 2025 15:13

frasercrmck reviewed Feb 25, 2025

libclc/libspirv/lib/generic/math/copysign.cl (outdated, resolved)

frasercrmck mentioned this pull request Feb 25, 2025

[libspirv] Have SPIR-V copysign use CLC copysign #17172

Merged

jchlanda approved these changes Feb 26, 2025

Maetveis force-pushed the libclc_generic_use_half_builtins branch from 8fd4f59 to 6767588 Compare February 26, 2025 08:51

Maetveis requested review from a team and bader as code owners February 26, 2025 08:51

Maetveis changed the base branch from sycl to sycl-web February 26, 2025 08:51

Maetveis removed request for a team February 26, 2025 08:52

Maetveis requested review from jsji and maksimsab February 26, 2025 08:55

Maetveis force-pushed the libclc_generic_use_half_builtins branch 4 times, most recently from 7955a8b to cd4f104 Compare February 26, 2025 09:31

frasercrmck approved these changes Feb 26, 2025

Maetveis force-pushed the libclc_generic_use_half_builtins branch from cd4f104 to f6a5d01 Compare February 26, 2025 09:58

Maetveis changed the base branch from sycl-web to sycl February 26, 2025 09:59

Maetveis force-pushed the libclc_generic_use_half_builtins branch from f6a5d01 to 6875c14 Compare February 26, 2025 10:00

bader approved these changes Feb 26, 2025

martygrant merged commit 42ff9c2 into intel:sycl Feb 27, 2025
19 of 20 checks passed

Maetveis deleted the libclc_generic_use_half_builtins branch February 27, 2025 09:14

frasercrmck mentioned this pull request Feb 27, 2025

[libspirv] Restore overloads of generic address space #17214

Merged
