Joint_inclusive_scan returns the wrong result #4336


Closed
Michoumichmich opened this issue Aug 13, 2021 · 2 comments · Fixed by #4337
Labels
bug Something isn't working cuda CUDA back-end

Michoumichmich commented Aug 13, 2021

Describe the bug
Calling joint_inclusive_scan with sycl::multiplies returns the wrong result.

To Reproduce

A small reproducer that performs a joint_inclusive_scan on a vector of 32 ones:

#include <iostream>
#include <vector>

#include <sycl/sycl.hpp>

int main() {
    sycl::queue q{sycl::gpu_selector{}};

    constexpr size_t size = 32;
    using T = unsigned int; // 'uint' is non-standard C++
    using op = sycl::multiplies<>;

    std::vector<T> v(size, T(1));
    std::vector<T> out(size, T(1));
    {
        sycl::buffer<T, 1> v_buffer(v.data(), sycl::range<1>(v.size()));
        sycl::buffer<T, 1> out_buffer(out.data(), sycl::range<1>(out.size()));
        q.submit([&](sycl::handler &cgh) {
            auto in = v_buffer.get_access<sycl::access::mode::read>(cgh);
            auto out = out_buffer.get_access<sycl::access::mode::write>(cgh);
            cgh.parallel_for(sycl::nd_range<1>(32, 32), [=](sycl::nd_item<1> it) {
                auto begin = in.get_pointer();
                sycl::joint_inclusive_scan(it.get_group(), begin, begin + in.size(), out.get_pointer(), op());
            });
        });
    }

    for (const T &e: out) {
        std::cout << e << " ";
    }
    std::cout << std::endl;
}

Results with the CUDA back-end

We should expect a vector of ones, but instead we get:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 

(and why do we have a one at the end? I suspect a flip somewhere)

If we change the binary operator to sycl::plus<>, the code works as expected and prints:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 

Results with the OpenCL back-end

For sycl::plus<>, the result is correct, but for multiplies it won't compile and fails with:

Unsupported SPIR-V module
SPIRV module requires unsupported capability 63
Compilation failed
 -11 (CL_BUILD_PROGRAM_FAILURE)

It would be nice if someone could try the reproducer on OpenCL!

Environment (please complete the following information):

  • OS: rhel8.4
  • Target device and vendor: NVIDIA GPU
@Michoumichmich Michoumichmich added the bug Something isn't working label Aug 13, 2021

Michoumichmich commented Aug 13, 2021

Okay, this might be helpful: if I set the initial value to 3, I get, on the CUDA back-end:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 

How did it get there? 🤔

@Michoumichmich

Solved on CUDA!

@bader bader added the cuda CUDA back-end label Aug 15, 2021
bader pushed a commit that referenced this issue Aug 16, 2021
This resolves issue #4336, a bug caused by 0 being used as the identity element in the CUDA back-end.

Signed-off-by: Michel Migdal [email protected]