[CUDA] floating-point exception in cuda_piextUSMSharedAlloc #1467

Closed
jinz2014 opened this issue Apr 2, 2020 · 12 comments
Labels
cuda CUDA back-end

Comments

@jinz2014
Contributor

jinz2014 commented Apr 2, 2020

https://github.com/jeffhammond/dpcpp-tutorial/blob/master/saxpy-usm.cc

Debugging the above program shows the following:

Thread 1 "a.out" received signal SIGFPE, Arithmetic exception.
0x00007ffff62eae0b in cuda_piextUSMSharedAlloc () from /home/cc/sycl_workspace/build/install/lib/libpi_cuda.so
(gdb) bt
#0 0x00007ffff62eae0b in cuda_piextUSMSharedAlloc () from /home/cc/sycl_workspace/build/install/lib/libpi_cuda.so
#1 0x00007ffff72002cb in cl::sycl::detail::usm::alignedAlloc(unsigned long, unsigned long, cl::sycl::context const&, cl::sycl::device const&, cl::sycl::usm::alloc) () from /home/cc/sycl_workspace/build/install/lib/libsycl.so
#2 0x00000000004027ee in main ()

@alexbatashev alexbatashev added the cuda CUDA back-end label Apr 3, 2020
@romanovvlad
Contributor

Looks like the crash happens because alignment is 0 at the following line (https://github.com/intel/llvm/blob/sycl/sycl/plugins/cuda/pi_cuda.cpp#L3430):

 assert(reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0);

The OpenCL extension for USM (https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc) says:

> alignment is the minimum alignment in bytes for the requested host allocation. It must be a power of two and must be equal to or smaller than the size of the largest data type supported by any OpenCL device in context. If alignment is 0, a default alignment will be used that is equal to the size of the largest data type supported by any OpenCL device in context.

I believe the largest type is double16 (128 bytes).

The SYCL extension for USM (https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/USM.adoc) says:

> size_t alignment - specifies the byte alignment. Must be a valid alignment supported by the implementation.

So, @fwyzard

  1. Cannot find info about alignment, for example, for cuMemAllocHost; do you know what alignment this function guarantees?
  2. Shouldn't this assert be a runtime check instead?
  3. Is it OK to change alignment to 1 if it is 0?

@jbrodman What should an implementation do if an alignment passed by user is not a valid alignment supported by the implementation?
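
One way to implement the semantics being debated here (a sketch with a hypothetical helper name, not the actual pi_cuda.cpp code): treat alignment == 0 as "use a default" equal to sizeof(double16) = 128 bytes, and reject non-power-of-two values with a recoverable error instead of an assert.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical default: the size of double16, the largest OpenCL data type.
constexpr std::size_t kDefaultAlignment = 128;

// Sketch: resolve a user-requested alignment following the OpenCL USM
// extension wording quoted above. 0 means "use the default"; values that
// are not powers of two are reported as errors rather than tripping an
// assert inside the plugin.
std::size_t resolve_alignment(std::size_t alignment) {
  if (alignment == 0)
    return kDefaultAlignment;
  if ((alignment & (alignment - 1)) != 0) // not a power of two
    throw std::invalid_argument("alignment must be a power of two");
  return alignment;
}
```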

@fwyzard
Contributor

fwyzard commented Apr 23, 2020

hi @romanovvlad

> So, @fwyzard
>
> 1. Cannot find info about alignment, for example, for [cuMemAllocHost](http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/online/group__CUMEM_g707085f1c7b0235429766a0cbd5b9cec.html#g707085f1c7b0235429766a0cbd5b9cec); do you know what alignment this function guarantees?

Neither can I.
The latest documentation doesn't say anything about the alignment of the memory returned by cuMemHostAlloc()/cuMemAllocHost(), while for cuMemAlloc()/cuMemAllocManaged() it explicitly says:

> The allocated memory is suitably aligned for any kind of variable.

Let me try some empirical tests and/or ask NVIDIA about it.

> 2. Shouldn't this assert be a runtime check instead?

Do you mean it should raise an exception rather than aborting?

> 3. Is it OK to change alignment to 1 if it is 0?

I would take alignment == 0 as requiring no specific alignment, and just ignore it.
That is, the check could become

 assert((alignment == 0) or (reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0));
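
Written as a standalone predicate (a sketch; is_suitably_aligned is a hypothetical name, not a function in pi_cuda.cpp), the relaxed check looks like this:

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the relaxed check: alignment == 0 means no specific alignment
// was requested, so any pointer passes; otherwise the address must be a
// multiple of the requested alignment.
bool is_suitably_aligned(const void *ptr, std::size_t alignment) {
  if (alignment == 0)
    return true; // no alignment requested: nothing to check
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```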

@fwyzard
Contributor

fwyzard commented Apr 23, 2020

From a quick test, it looks like the memory returned by cuMemAllocHost() is aligned to 512 (0x200) bytes.
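
One way to run that kind of empirical check (a sketch; the test below probes fixed addresses, but the same helper can be applied to pointers returned by cuMemAllocHost()) is to take the largest power of two dividing the returned address:

```cpp
#include <cstddef>
#include <cstdint>

// Observed alignment of a pointer: the largest power of two that divides
// its address. addr & (~addr + 1) isolates the lowest set bit.
std::size_t observed_alignment(const void *ptr) {
  auto addr = reinterpret_cast<std::uintptr_t>(ptr);
  return addr == 0 ? 0 : static_cast<std::size_t>(addr & (~addr + 1));
}
```

A pointer from an allocator that aligns everything to 512 bytes (0x200) will always report an observed alignment of at least 512.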

@fwyzard
Contributor

fwyzard commented Apr 23, 2020

Can you check if #1577 fixes the exception?

@jbrodman
Contributor

> I would take alignment == 0 as requiring no specific alignment, and just ignore it.

I agree with your interpretation of alignment == 0.

@jbrodman
Contributor

It would probably be good if we could throw runtime errors instead of crashing.

Hmm... CUDA doesn't seem to have user-aligned allocations.

@romanovvlad
Contributor

@jbrodman Should we clarify what happens if the required alignment is not supported by the implementation?

@romanovvlad
Contributor

@jinz2014 Could you please check if #1577 solves the issue?

@fwyzard
Contributor

fwyzard commented Apr 23, 2020

> Hmm... CUDA doesn't seem to have user-aligned allocations.

Rather, it seems that all allocations are aligned to 512 bytes.

@khaled-rahman

Hi, I am also facing this problem. I compiled and ran a DPC++ code successfully on the CPU, and it also compiled fine for the GPU. However, when I offload the code to the GPU, it shows a floating-point exception (core dumped). I was wondering whether the problem is resolved for this case.

@jeffhammond
Contributor

I too see this issue and wonder how I'm supposed to use USM on NVIDIA while this bug exists...

before alloc
---> piextUSMSharedAlloc(
       <unknown> : 0x7fffffffbae0
       <unknown> : 0xe9bcd0
       <unknown> : 0x7bd130
       <unknown> : 0
       <unknown> : 400000
       <unknown> : 0

Thread 1 "nstream-sycl-us" received signal SIGFPE, Arithmetic exception.
0x00007ffff65a46ab in cuda_piextUSMSharedAlloc () from /nfs/pdx/home/jrhammon/ISYCL/llvm/build/install/lib/libpi_cuda.so
(cuda-gdb) bt
#0  0x00007ffff65a46ab in cuda_piextUSMSharedAlloc () from /nfs/pdx/home/jrhammon/ISYCL/llvm/build/install/lib/libpi_cuda.so
#1  0x00007ffff7b48479 in cl::sycl::malloc_shared(unsigned long, cl::sycl::device const&, cl::sycl::context const&) ()
   from /nfs/pdx/home/jrhammon/ISYCL/llvm/build/install/lib/libsycl.so.2
#2  0x0000000000405b2a in float* cl::sycl::malloc_shared<float>(unsigned long, cl::sycl::queue const&) ()
#3  0x0000000000403e03 in void run<float>(cl::sycl::queue&, int, unsigned long) ()
#4  0x000000000040301d in main ()

@bader
Contributor

bader commented Oct 11, 2020

Can't reproduce with the tip of the branch. Most likely it's addressed by #2557.

@bader bader closed this as completed Oct 11, 2020
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this issue Feb 23, 2023
It looks like python on windows is always python.exe and on some systems
we have python3.exe alias created manually.