Skip to content
This repository was archived by the owner on Mar 28, 2023. It is now read-only.

[SYCL] Add test cases for SYCL2020 reductions #194

Merged
merged 4 commits into from
Apr 8, 2021

Conversation

v-klochkov
Copy link

@v-klochkov v-klochkov commented Mar 25, 2021

The corresponding changes in SYCL RT:
intel/llvm#3410
intel/llvm#3481

This patch

  • adds checks for SYCL-2020 reductions with initialize_to_identity property (#3410)
  • enables the test reduction_nd_N_vars.cpp for level-zero (#3481)
  • reduces some of tests by eliminating unneeded/duplicated checks.
  • removes the test reduction_transparent.cpp as useless because transparent
    operators already checked in many other tests.
  • reduces runtime of reduction tests by about 200x in average by re-using
    queue created once instead of creating it in each of test-cases

Signed-off-by: Vyacheslav N Klochkov [email protected]

@v-klochkov v-klochkov force-pushed the public_vklochkov_reduction_2020 branch 2 times, most recently from a025530 to c623e5c Compare March 25, 2021 17:26
v-klochkov added a commit to v-klochkov/llvm that referenced this pull request Mar 26, 2021
…property

The corresponding LIT tests PR: intel/llvm-test-suite#194

SYCL2020 reductions use dynamic information (instead of static info used
previously) to ask to do discard-write to user's reduction variable or USM
memory. That caused big changes removing/re-structuring the code that
previously relied on accessor-mode (read-write vs discard-write).

With this patch the SYCL-2020 reductions become on par with ONEAPI::reduction

Signed-off-by: Vyacheslav N Klochkov <[email protected]>
v-klochkov added a commit to intel/llvm that referenced this pull request Mar 31, 2021
…#3410)

* [SYCL] Implement SYCL2020 reductions with set initialize_to_identity property

The corresponding LIT tests PR: intel/llvm-test-suite#194

SYCL2020 reductions use dynamic information (instead of static info used
previously) to ask to do discard-write to user's reduction variable or USM
memory. That caused big changes removing/re-structuring the code that
previously relied on accessor-mode (read-write vs discard-write).

With this patch the SYCL-2020 reductions become on par with ONEAPI::reduction

Signed-off-by: Vyacheslav N Klochkov <[email protected]>
@v-klochkov v-klochkov force-pushed the public_vklochkov_reduction_2020 branch 2 times, most recently from 2b1d0de to 5110a10 Compare April 3, 2021 20:39
@v-klochkov v-klochkov force-pushed the public_vklochkov_reduction_2020 branch from 5110a10 to 5a203cb Compare April 3, 2021 20:45
v-klochkov added a commit to intel/llvm that referenced this pull request Apr 5, 2021
Using sycl::detail::tuple is a temporary work-around for various
problems caused by using std::tuple:
a) reduction using std::tuple cannot be compiled on Windows because
   std::tuple cannot be copied to DEVICE.
b) internal error in level_zero RT.

The new sycl::detail::tuple class is a very limited version
of oneDPL's implementation of tuple. It includes such functionality:
- convert from std::tuple and to std::tuple
- tie(), get<I>(), tuple_element, make_tuple

This change enables parallel_for() with number of reductions
more than 1 for level_zero and for Windows.

The corresponding changes in LIT tests: intel/llvm-test-suite#194

Signed-off-by: Vyacheslav N Klochkov <[email protected]>
@v-klochkov v-klochkov force-pushed the public_vklochkov_reduction_2020 branch from 5a203cb to 53f6067 Compare April 5, 2021 16:50
@v-klochkov v-klochkov marked this pull request as ready for review April 5, 2021 17:17
@v-klochkov v-klochkov requested a review from vladimirlaz April 5, 2021 17:17
@v-klochkov v-klochkov force-pushed the public_vklochkov_reduction_2020 branch 3 times, most recently from 725ad13 to 0d95b4d Compare April 6, 2021 17:22
@v-klochkov v-klochkov force-pushed the public_vklochkov_reduction_2020 branch from 0d95b4d to f53c4a4 Compare April 7, 2021 23:16
@v-klochkov v-klochkov requested review from rdeodhar and againull April 8, 2021 00:14
@v-klochkov
Copy link
Author

This change is supposed to end annoying flaky reduction test fails in CI due to timeouts.

@rdeodhar
Copy link

rdeodhar commented Apr 8, 2021

The change to use a queue created in "main" looks good to me.

Regarding flakiness on the CPU device, could this message hold a clue for difference in behavior across machines?
"The implementation handling parallel_for with reduction requires work group size not bigger than 1"

@v-klochkov
Copy link
Author

v-klochkov commented Apr 8, 2021

The change to use a queue created in "main" looks good to me.

Regarding flakiness on the CPU device, could this message hold a clue for difference in behavior across machines?
"The implementation handling parallel_for with reduction requires work group size not bigger than 1"

I see different (unexpected) output results. Combining that with diagnostics shown by you, it may be some problem with pointers, that I am going to debug/fix. That is not expected for ONEAPI::detail::reduGetMaxWGSize() to return 1 for those simple cases in reduction_nd_N_vars.cpp.

Copy link

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also as a side note: it looks like there are too much different changes for one review. As far as I remember ideal PRs are < 300 lines. And in this case it even looks like merged changes from different PRs. I don't insist on splitting it, just a note.

@@ -100,8 +97,9 @@ int testOne(T1 IdentityVal1, T1 InitVal1, BinaryOperation1 BOp1,
CorrectOut2 = BOp2(CorrectOut2, InitVal2);
if (Mode3 == access::mode::read_write)
CorrectOut3 = BOp3(CorrectOut3, InitVal3);
// 4th reduction is USM and this is read_write.
CorrectOut4 = BOp4(CorrectOut4, InitVal4);
// discard_write mode for USM reductions is available only SYCL2020.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot match the comment to the code. I see USM but I don't see discard_write. Am I missing something?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ONEAPI::reduction if initialized with USM pointer assumed read-write to that USM memory. I.e. add the original value of element pointed by USM memory in the final sum. Only SYCL-2020 can ignore the original value of USM memory (i.e. do discard_write).

The test may have 'Mode4' be equal to access::mode::discard_write, which is treated as 'read_write' if 'IsSYCL2020==false'

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment could be this:
// ONEAPI::reduction supports only read_write access to USM memory

@@ -1,5 +1,4 @@
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out
// RUNx: %HOST_RUN_PLACEHOLDER %t.out
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this feature device-only? Or are we going to add support for HOST in the future? If the second, probably it makes sense to keep this line or add todo to not forget to enable it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running on host would require adding support for barrier() and group algorithm reduce(). That is a known issue that cannot be forgotten. Also several other tests already mention that (e.g. reduction_placeholder.cpp, reduction_queue_parallel_for.cpp )

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can return that line if you think it is useful to have it there.

@v-klochkov
Copy link
Author

v-klochkov commented Apr 8, 2021

Also as a side note: it looks like there are too much different changes for one review.

Yes, I know, sorry. That is not typical situation.
The LIT changes for intel/llvm#3481 is removal of few lines disabling reduction_nd_N_vars.cpp for Windows and for level_zero on Linux.

The original change for intel/llvm#3410 was smaller than that, but it did not pass several times without additional changes reducing tests and re-using queue. That is why it is so big now.

Copy link

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't have any major concerns, LGTM

@v-klochkov v-klochkov merged commit c38e874 into intel:intel Apr 8, 2021
@v-klochkov v-klochkov deleted the public_vklochkov_reduction_2020 branch April 8, 2021 18:51
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023
)

* [SYCL] Add test cases for SYCL2020 reductions + initialize_to_identity property

The corresponding changes in SYCL RT:
  intel#3410
  intel#3481

This patch
- adds checks for SYCL-2020 reductions with initialize_to_identity property (intel/llvm-test-suite#3410)
- enables the test reduction_nd_N_vars.cpp for level-zero (intel/llvm-test-suite#3481)
- reduces some of tests by eliminating unneeded/duplicated checks.
- removes the test reduction_transparent.cpp as useless because transparent 
  operators already checked in many other tests.
- reduces runtime of reduction tests by about 200x in average by re-using
  queue created once instead of creating it in each of test-cases

Signed-off-by: Vyacheslav N Klochkov <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants