[SYCL] Extend sub-group load/store tests to cover 3-, 16-elements vectors #253

vladimirlaz · 2021-04-26T09:52:20Z

Add processing of vectors of 3 and 16 elements;
increase the local group size to fit 16-element vectors when the sub-group size is 16;
tune the check to avoid overlapping data ranges inside and between local groups;
increase the threshold because the cumulative error rises as the number of vector elements increases.

SYCL/SubGroup/load_store.cpp

AlexeySachkov · 2021-04-26T10:30:13Z

SYCL/SubGroup/helper.hpp

@@ -98,7 +109,7 @@ template <typename T2> struct utils<T2, 16> {

 template <typename T> void exit_if_not_equal(T val, T ref, const char *name) {
  if (std::is_floating_point<T>::value) {
-    if (std::fabs(val - ref) > 0.01) {
+    if (std::fabs(val - ref) > 0.02) {


What was the reason to increase the threshold?

During data verification, all elements of a vector are summed and the sum is compared with a reference value. If we double the number of summed elements, the potential cumulative error doubles as well.

Won't it be better to switch to validating relative error instead of absolute one?
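For illustration, a relative-error check could look roughly like the sketch below. The helper name `nearly_equal_relative`, its tolerance parameter, and the near-zero fallback are assumptions made for this example; none of them come from helper.hpp.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical helper (not part of helper.hpp): returns true when val and ref
// differ by at most rel_tol times the larger of their magnitudes.
template <typename T>
bool nearly_equal_relative(T val, T ref, T rel_tol) {
  const T diff = std::fabs(val - ref);
  const T scale = std::max(std::fabs(val), std::fabs(ref));
  // Assumed fallback: a tiny absolute floor so that two zeros compare equal
  // even though the scaled tolerance is itself zero.
  const T abs_floor = std::numeric_limits<T>::min();
  return diff <= std::max(rel_tol * scale, abs_floor);
}
```

With a relative tolerance, the accepted absolute difference grows with the magnitude of the summed value, so the threshold would not need retuning when more vector elements are added.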

The test does not target specific accuracy goals. It checks that the return values are not something completely different from expected.
Does it make sense to invest in tuning accuracy of the test?

I suggest we use input data, so that results have no error at all.

@bader, could you please elaborate a bit? I don't think I got it.

During data verification all elements of vectors are added and this value is compared with reference one.

When you do FP addition, the computed result is rounded to fit into the resulting data type. An error occurs only if the result of the addition can't be represented exactly and has to be rounded. E.g. for T == float, 2^{20} + 2^{-10} can't be represented exactly, but 2^{20} + 2^{21} can. I suggest using input values for which the rounding error is 0, so you can always use an exact match.
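The two sums mentioned above can be checked with a small sketch. The helper names `float_sum_is_exact` and `two_to_the` are made up for this example, and it assumes the usual IEEE 754 binary32/binary64 formats for `float` and `double`.

```cpp
#include <cmath>

// Hypothetical helper: builds 2^e as a float.
inline float two_to_the(int e) { return std::ldexp(1.0f, e); }

// Hypothetical helper: returns true when a + b evaluated in float matches the
// same sum evaluated in double. For the magnitudes used here the double sum
// is exact, so a match means the float addition did not have to round.
inline bool float_sum_is_exact(float a, float b) {
  const float sum_f = a + b;
  const double sum_d = static_cast<double>(a) + static_cast<double>(b);
  return static_cast<double>(sum_f) == sum_d;
}
```

Under those assumptions, `float_sum_is_exact(two_to_the(20), two_to_the(-10))` is false, because the spacing between adjacent floats near 2^{20} is 2^{-3} and the small term is rounded away, while `float_sum_is_exact(two_to_the(20), two_to_the(21))` is true. Input data made of small integers behaves like the second case, which is what makes an exact match possible.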

BTW, using std::fabs here reduces accuracy for T == double.

Implemented bitwise comparison for floating point types except half.
The valid range of the half type is too narrow, and supporting it would require reworking the data for several tests. This is out of scope for the current PR. But aligning the input data for the current test allows reverting the threshold increase.
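The thread does not show how the bitwise comparison was written, so the following is only a minimal sketch of one way to do it. The name `bitwise_equal` is hypothetical, and the approach is intended for `float` and `double`, which have no padding bytes on common platforms.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper (an assumption, not the code from this PR): compares
// the object representations of two values byte by byte.
template <typename T>
bool bitwise_equal(const T &val, const T &ref) {
  static_assert(std::is_trivially_copyable<T>::value,
                "bitwise comparison needs a trivially copyable type");
  return std::memcmp(&val, &ref, sizeof(T)) == 0;
}
```

Unlike `operator==`, a comparison of this kind treats `0.0f` and `-0.0f` as different and treats two NaNs with identical bit patterns as equal, so it fits best when the input data guarantees exactly representable results.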

vladimirlaz · 2021-04-26T11:10:40Z

The test will start to pass once intel/llvm#3617 is submitted.

s-kanaev · 2021-04-26T11:47:20Z

Won't it be better to switch to validating relative error instead of absolute one?

SYCL/SubGroup/helper.hpp

- when a 3-element vector is passed, it is packed as a single element and a 2-element vector;
- when a 16-element vector is passed, it is packed as 2 sequential 8-element vectors.

The test has changed in scope of intel/llvm-test-suite#253

…tors (intel/llvm-test-suite#253)
- Add processing of vectors of 3 and 16 elements.
- Increase local group size to fit 16-element vectors when sub-group size is 16.
- Tune check to avoid overlapping data ranges inside and between local groups.
- Implemented bitwise comparison for floating point types except half. Valid range for half type is too narrow and will require to rework data for several tests. This is out of scope for current PR.

vladimirlaz added 4 commits April 26, 2021 08:34

[SYCL] SG load/store for vec3 and vec16

ea29d71

Disable device-code-split which is not supported by CUDA BE

45e1261

Fix failure due to limitations on GPU

86b83d5

Align data ranges to avoid limitting max_sg_size

b4bba1e

vladimirlaz requested review from AlexeySachkov and Pennycook as code owners April 26, 2021 09:52

vladimirlaz mentioned this pull request Apr 26, 2021

[SYCL] Support 3-, 16-elements vectors in SG load/store intel/llvm#3617

Merged

AlexeySachkov reviewed Apr 26, 2021

Apply review comments and skip execution on CPU device only instead of skipping the whole test when CPU device is available on host machine

3cbb081

vladimirlaz requested a review from AlexeySachkov April 26, 2021 11:17

s-kanaev reviewed Apr 26, 2021

vladimirlaz requested a review from s-kanaev April 26, 2021 12:46

Use bitwise comparison for floating point type

0c34ec8

vladimirlaz requested a review from bader May 5, 2021 12:21

Fix clang-format

afcd3fd

Pennycook approved these changes May 5, 2021

vladimirlaz merged commit c78a789 into intel:intel May 5, 2021

vladimirlaz deleted the sg_load_store_v16_v3 branch May 5, 2021 16:14

AlexeySachkov Apr 26, 2021

vladimirlaz Apr 26, 2021

s-kanaev Apr 26, 2021

vladimirlaz Apr 26, 2021

bader Apr 27, 2021

s-kanaev Apr 28, 2021

bader Apr 28, 2021

vladimirlaz May 5, 2021
