[SYCL] Add tests for bf16 builtins operating on storage types #897

t4c1 · 2022-03-07T11:28:42Z

Add tests for bf16 builtins operating on storage types. Tests the changes from intel/llvm#5748.
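For readers unfamiliar with the term, "storage type" here means a bfloat16 value carried as raw bits in a `uint16_t`. The following is a minimal host-side sketch of how a reference for such builtins could be computed. It is not the code from this PR: the helper names (`float_to_bf16_bits`, `bf16_bits_to_float`, `ref_fabs`, `ref_fmin`, `ref_fmax`, `ref_fma`) are hypothetical, and the rounding mode (round-to-nearest-even) and NaN handling are assumptions.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper names; none of these come from the PR or the SYCL headers.

// Convert a float to bf16 bits held in a uint16_t "storage type".
// Assumption: round-to-nearest-even, NaN mapped to a quiet NaN pattern.
inline std::uint16_t float_to_bf16_bits(float f) {
  if (std::isnan(f))
    return 0x7FC0;
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  // Add 0x7FFF plus the lowest kept bit so that ties round to even.
  bits += 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>(bits >> 16);
}

// Widen bf16 bits back to float; this direction is exact.
inline float bf16_bits_to_float(std::uint16_t h) {
  std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Host references: compute in float, then round back to bf16 bits.
inline std::uint16_t ref_fabs(std::uint16_t x) {
  return float_to_bf16_bits(std::fabs(bf16_bits_to_float(x)));
}

inline std::uint16_t ref_fmin(std::uint16_t a, std::uint16_t b) {
  return float_to_bf16_bits(
      std::fmin(bf16_bits_to_float(a), bf16_bits_to_float(b)));
}

inline std::uint16_t ref_fmax(std::uint16_t a, std::uint16_t b) {
  return float_to_bf16_bits(
      std::fmax(bf16_bits_to_float(a), bf16_bits_to_float(b)));
}

inline std::uint16_t ref_fma(std::uint16_t a, std::uint16_t b,
                             std::uint16_t c) {
  return float_to_bf16_bits(std::fma(bf16_bits_to_float(a),
                                     bf16_bits_to_float(b),
                                     bf16_bits_to_float(c)));
}
```

A device result would then be compared against these references. Because the `ref_fma` sketch rounds once to float and again to bf16, it may differ from a single-rounding hardware result in the last bit, so a small tolerance is probably safer than exact bit equality.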

bader · 2022-03-14T09:58:27Z

SYCL/BFloat16/bf16_storage_builtins.cpp

Co-authored-by: Alexey Bader <[email protected]>

MrSidims

In general LGTM.
Do we have the bf16 aspect implemented? If so, can it be incorporated into the test? For example, we could launch the test on GPU in general, but do an early exit if the device doesn't support that aspect.

Add bf16 builtins operating on storage types. Partially implements https://github.com/intel/llvm/pull/5645/files for CUDA (only operations on storage types). This PR includes a bugfix for some NVPTX intrinsics, which will also be pushed upstream. Tests for this are in intel/llvm-test-suite#897.

t4c1 · 2022-03-15T08:43:45Z

Not yet, but it is being added as a part of intel/llvm#5720.

t4c1 · 2022-03-28T08:24:11Z

Are any of the failed tests here something that could have been introduced by this PR?

Requires intel/llvm#5964. bfloat16_builtins.cpp covers the bfloat16 scalar math function cases introduced by intel/llvm#5964, using the tests from #897 (which cover all "storage type" uint16_t implementation cases). elem_wise_all_ops_cuda.cpp covers the portable element-wise ops using `wi_data`. Since CUDA does not support `joint_matrix_store` for certain data types that are only used in a/b type matrices, such as bfloat16 and int8, it is necessary to perform a `joint_matrix_mad` operation and then call `joint_matrix_store` on the accumulator matrix in order to reach the host code check. Intel backend devices could still use this test in the future, provided that a backend check is introduced. Ideally, both backends could eventually use the same test code. Signed-off-by: jack.kirk <[email protected]>

This PR introduces full support for element-wise operations in the CUDA backend. `wi_data`, `get_matrix_fill`, and `joint_matrix.get_wi_data()` are introduced for portability with the Intel backend. In addition, in the CUDA backend users can call `joint_matrix.wi_marray` to access the marray that stores the WI-owned elements of the matrix and perform optimized element-wise operations using math functions that take marrays. bfloat16 element-wise operation support is also included, and this PR adds bfloat16 scalar/marray implementations that replace the existing uint16_t "storage type" implementations of the fma, fmax, fmin, and fabs math functions. The bfloat16 fma_relu function implementation has now been added directly in #5749. The existing temporary uint16_t implementations (introduced in #5748, with unmerged tests in intel/llvm-test-suite#897) have been removed, since these bfloat16 implementations replace them. Signed-off-by: jack.kirk <[email protected]>


[SYCL] add tests for bf16 builtins operating on storage types

7e38af5

t4c1 requested review from AlexeySotkin and a team as code owners March 7, 2022 11:28

t4c1 mentioned this pull request Mar 7, 2022

[SYCL][CUDA] Add bf16 builtins operating on storage types intel/llvm#5748

Merged

bader requested review from MrSidims and removed request for AlexeySotkin March 14, 2022 09:57

bader reviewed Mar 14, 2022


SYCL/BFloat16/bf16_storage_builtins.cpp (outdated, resolved)

SYCL/BFloat16/bf16_storage_builtins.cpp (outdated, resolved)

t4c1 and others added 2 commits March 14, 2022 15:49

Apply suggestions from code review

14e6b59

Co-authored-by: Alexey Bader <[email protected]>

format

1c0d632

MrSidims reviewed Mar 14, 2022


JackAKirk mentioned this pull request Apr 5, 2022

[SYCL][CUDA] Test cases for bfloat16 math/elem wise joint_matrix #975

Merged

JackAKirk mentioned this pull request Jun 27, 2022

[SYCL][CUDA] Joint_matrix elem wise ops inc bfloat16 intel/llvm#5964

Merged
