Describe the bug
Calling joint_inclusive_scan with sycl::multiplies returns the wrong result.
To Reproduce
Small code that performs a joint_inclusive_scan on a vector of 32 ones.
    #include <sycl/sycl.hpp>
    #include <iostream>
    #include <vector>

    int main() {
      sycl::queue q{sycl::gpu_selector{}};
      constexpr size_t size = 32;
      using T = uint;
      using op = sycl::multiplies<>;
      std::vector<T> v(size, T(1));
      std::vector<T> out(size, T(1));
      {
        // The buffers write back to the host vectors when this scope ends.
        sycl::buffer<T, 1> v_buffer(v.data(), sycl::range<1>(v.size()));
        sycl::buffer<T, 1> out_buffer(out.data(), sycl::range<1>(out.size()));
        q.submit([&](sycl::handler &cgh) {
          auto in = v_buffer.get_access<sycl::access::mode::read>(cgh);
          auto out = out_buffer.get_access<sycl::access::mode::write>(cgh);
          // A single work-group of 32 work-items scans the whole input.
          cgh.parallel_for(sycl::nd_range<1>(32, 32), [=](sycl::nd_item<1> it) {
            auto begin = in.get_pointer();
            sycl::joint_inclusive_scan(it.get_group(), begin, begin + in.size(),
                                       out.get_pointer(), op());
          });
        });
      }
      for (const T &e : out) {
        std::cout << e << " ";
      }
      std::cout << std::endl;
    }
Results with the CUDA back-end
We expect a vector of ones, but we get:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
(and why do we have a one at the end? I suspect a flip somewhere)
If we change the binary operator to sycl::plus<>, the code works as expected and prints:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
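For comparison, the expected output of both operators can be checked on the host. This is a minimal sketch that uses std::partial_sum as a sequential inclusive scan; the helper name reference_inclusive_scan is hypothetical and is not part of SYCL or of the reproducer.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Host-side reference (hypothetical helper): sequential inclusive scan of
// `input` under the binary operation `op`, computed with std::partial_sum.
template <typename T, typename Op>
std::vector<T> reference_inclusive_scan(const std::vector<T> &input, Op op) {
  std::vector<T> output(input.size());
  std::partial_sum(input.begin(), input.end(), output.begin(), op);
  return output;
}
```

Calling it on a vector of 32 ones with std::multiplies<unsigned int>() should give 32 ones, and with std::plus<unsigned int>() the sequence 1 to 32, which is what the device scan is expected to match.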
Results on OpenCL
For sycl::plus<>, the result is correct, but with sycl::multiplies the code fails to compile with:
Unsupported SPIR-V module SPIRV module requires unsupported capability 63 Compilation failed -11 (CL_BUILD_PROGRAM_FAILURE)
If someone could try the reproducer on OpenCL, that would be nice!
Environment (please complete the following information):
Okay, this might help: if I set the initial value to 3, I get the following on the CUDA back-end:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3
How did it get there? 🤔
Solved on CUDA!
[SYCL][libclc] Fix identity for multiplication (#4337)
a6447ca
This resolves #4336 issue which is a bug related to 0 being used as the identity in the CUDA back-end. Signed-off-by: Michel Migdal [email protected]