-
Notifications
You must be signed in to change notification settings - Fork 131
[SYCL] Add test cases for SYCL2020 reductions #194
[SYCL] Add test cases for SYCL2020 reductions #194
Conversation
a025530
to
c623e5c
Compare
…property The corresponding LIT tests PR: intel/llvm-test-suite#194 SYCL2020 reductions use dynamic information (instead of static info used previously) to ask to do discard-write to user's reduction variable or USM memory. That caused big changes removing/re-structuring the code that previously relied on accessor-mode (read-write vs discard-write). With this patch the SYCL-2020 reductions become on par with ONEAPI::reduction Signed-off-by: Vyacheslav N Klochkov <[email protected]>
…#3410) * [SYCL] Implement SYCL2020 reductions with set initialize_to_identity property The corresponding LIT tests PR: intel/llvm-test-suite#194 SYCL2020 reductions use dynamic information (instead of static info used previously) to ask to do discard-write to user's reduction variable or USM memory. That caused big changes removing/re-structuring the code that previously relied on accessor-mode (read-write vs discard-write). With this patch the SYCL-2020 reductions become on par with ONEAPI::reduction Signed-off-by: Vyacheslav N Klochkov <[email protected]>
2b1d0de
to
5110a10
Compare
5110a10
to
5a203cb
Compare
Using sycl::detail::tuple is a temporary work-around for various problems caused by using std::tuple: a) reduction using std::tuple cannot be compiled on Windows because std::tuple cannot be copied to DEVICE. b) internal error in level_zero RT. The new sycl::detail::tuple class is a very limited version of oneDPL's implementation of tuple. It includes such functionality: - convert from std::tuple and to std::tuple - tie(), get<I>(), tuple_element, make_tuple This change enables parallel_for() with number of reductions more than 1 for level_zero and for Windows. The corresponding changes in LIT tests: intel/llvm-test-suite#194 Signed-off-by: Vyacheslav N Klochkov <[email protected]>
5a203cb
to
53f6067
Compare
725ad13
to
0d95b4d
Compare
…y property The corresponding changes in SYCL RT: intel/llvm#3410 intel/llvm#3481 Signed-off-by: Vyacheslav N Klochkov <[email protected]>
Signed-off-by: Vyacheslav N Klochkov <[email protected]>
0d95b4d
to
f53c4a4
Compare
Signed-off-by: Vyacheslav N Klochkov <[email protected]>
This change is supposed to end annoying flaky reduction test fails in CI due to timeouts. |
The change to use a queue created in "main" looks good to me. Regarding flakiness on the CPU device, could this message hold a clue for difference in behavior across machines? |
I see different (unexpected) output results. Combining that with diagnostics shown by you, it may be some problem with pointers, that I am going to debug/fix. That is not expected for ONEAPI::detail::reduGetMaxWGSize() to return 1 for those simple cases in reduction_nd_N_vars.cpp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also as a side note: it looks like there are too much different changes for one review. As far as I remember ideal PRs are < 300 lines. And in this case it even looks like merged changes from different PRs. I don't insist on splitting it, just a note.
@@ -100,8 +97,9 @@ int testOne(T1 IdentityVal1, T1 InitVal1, BinaryOperation1 BOp1, | |||
CorrectOut2 = BOp2(CorrectOut2, InitVal2); | |||
if (Mode3 == access::mode::read_write) | |||
CorrectOut3 = BOp3(CorrectOut3, InitVal3); | |||
// 4th reduction is USM and this is read_write. | |||
CorrectOut4 = BOp4(CorrectOut4, InitVal4); | |||
// discard_write mode for USM reductions is available only SYCL2020. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot match the comment to the code. I see USM but I don't see discard_write. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ONEAPI::reduction if initialized with USM pointer assumed read-write to that USM memory. I.e. add the original value of element pointed by USM memory in the final sum. Only SYCL-2020 can ignore the original value of USM memory (i.e. do discard_write).
The test may have 'Mode4' be equal to access::mode::discard_write, which is treated as 'read_write' if 'IsSYCL2020==false'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment could be this:
// ONEAPI::reduction supports only read_write access to USM memory
@@ -1,5 +1,4 @@ | |||
// RUN: %clangxx -fsycl -fsycl-targets=%sycl_triple %s -o %t.out | |||
// RUNx: %HOST_RUN_PLACEHOLDER %t.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this feature device-only? Or are we going to add support for HOST in the future? If the second, probably it makes sense to keep this line or add todo to not forget to enable it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
running on host would require adding support for barrier() and group algorithm reduce(). That is a known issue that cannot be forgotten. Also several other tests already mention that (e.g. reduction_placeholder.cpp, reduction_queue_parallel_for.cpp )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can return that line if you think it is useful to have it there.
Yes, I know, sorry. That is not typical situation. The original change for intel/llvm#3410 was smaller than that, but it did not pass several times without additional changes reducing tests and re-using queue. That is why it is so big now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't have any major concerns, LGTM
) * [SYCL] Add test cases for SYCL2020 reductions + initialize_to_identity property The corresponding changes in SYCL RT: intel#3410 intel#3481 This patch - adds checks for SYCL-2020 reductions with initialize_to_identity property (intel/llvm-test-suite#3410) - enables the test reduction_nd_N_vars.cpp for level-zero (intel/llvm-test-suite#3481) - reduces some of tests by eliminating unneeded/duplicated checks. - removes the test reduction_transparent.cpp as useless because transparent operators already checked in many other tests. - reduces runtime of reduction tests by about 200x in average by re-using queue created once instead of creating it in each of test-cases Signed-off-by: Vyacheslav N Klochkov <[email protected]>
The corresponding changes in SYCL RT:
intel/llvm#3410
intel/llvm#3481
This patch
operators already checked in many other tests.
queue created once instead of creating it in each of test-cases
Signed-off-by: Vyacheslav N Klochkov [email protected]