[CUDA] floating-point exception in cuda_piextUSMSharedAlloc #1467

Closed
jinz2014 opened this issue Apr 2, 2020 · 12 comments
Labels
cuda CUDA back-end

Comments

@jinz2014
Contributor

jinz2014 commented Apr 2, 2020

https://github.com/jeffhammond/dpcpp-tutorial/blob/master/saxpy-usm.cc

Debugging the above program shows the following:

Thread 1 "a.out" received signal SIGFPE, Arithmetic exception.
0x00007ffff62eae0b in cuda_piextUSMSharedAlloc () from /home/cc/sycl_workspace/build/install/lib/libpi_cuda.so
(gdb) bt
#0 0x00007ffff62eae0b in cuda_piextUSMSharedAlloc () from /home/cc/sycl_workspace/build/install/lib/libpi_cuda.so
#1 0x00007ffff72002cb in cl::sycl::detail::usm::alignedAlloc(unsigned long, unsigned long, cl::sycl::context const&, cl::sycl::device const&, cl::sycl::usm::alloc) () from /home/cc/sycl_workspace/build/install/lib/libsycl.so
#2 0x00000000004027ee in main ()

@alexbatashev alexbatashev added the cuda CUDA back-end label Apr 3, 2020
@romanovvlad
Contributor

Looks like the crash happens because alignment is 0 at the following line (https://github.com/intel/llvm/blob/sycl/sycl/plugins/cuda/pi_cuda.cpp#L3430):

 assert(reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0);

The OpenCL extension for USM (https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc) says:

> alignment is the minimum alignment in bytes for the requested host allocation. It must be a power of two and must be equal to or smaller than the size of the largest data type supported by any OpenCL device in context. If alignment is 0, a default alignment will be used that is equal to the size of the largest data type supported by any OpenCL device in context.

I believe the largest type is double16 (128 bytes).

The SYCL extension for USM (https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/USM.adoc) says:

> size_t alignment - specifies the byte alignment. Must be a valid alignment supported by the implementation.

So, @fwyzard

  1. Cannot find info about alignment, for example, for cuMemAllocHost; do you know what alignment this function guarantees?
  2. Shouldn't this assert be a runtime check instead?
  3. Is it OK to change alignment to 1 if it is 0?

@jbrodman What should an implementation do if an alignment passed by user is not a valid alignment supported by the implementation?
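
One way to implement the semantics being debated here (a sketch with a hypothetical helper name, not the actual pi_cuda.cpp code): treat alignment == 0 as "use a default" equal to sizeof(double16) = 128 bytes, and reject non-power-of-two values with a recoverable error instead of an assert.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical default: the size of double16, the largest OpenCL data type.
constexpr std::size_t kDefaultAlignment = 128;

// Sketch: resolve a user-requested alignment following the OpenCL USM
// extension wording quoted above. 0 means "use the default"; values that
// are not powers of two are reported as errors rather than tripping an
// assert inside the plugin.
std::size_t resolve_alignment(std::size_t alignment) {
  if (alignment == 0)
    return kDefaultAlignment;
  if ((alignment & (alignment - 1)) != 0) // not a power of two
    throw std::invalid_argument("alignment must be a power of two");
  return alignment;
}
```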

@fwyzard
Contributor

fwyzard commented Apr 23, 2020

hi @romanovvlad

> So, @fwyzard
>
> 1. Cannot find info about alignment, for example, for [cuMemAllocHost](http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/online/group__CUMEM_g707085f1c7b0235429766a0cbd5b9cec.html#g707085f1c7b0235429766a0cbd5b9cec); do you know what alignment this function guarantees?

Neither can I.
The latest documentation doesn't say anything about the alignment of the memory returned by cuMemHostAlloc()/cuMemAllocHost(), while for cuMemAlloc()/cuMemAllocManaged() it explicitly says:

> The allocated memory is suitably aligned for any kind of variable.

Let me try some empirical tests and/or ask NVIDIA about it.

> 2. Shouldn't this assert be a runtime check instead?

Do you mean it should raise an exception rather than aborting?

> 3. Is it OK to change alignment to 1 if it is 0?

I would take alignment == 0 as requiring no specific alignment, and just ignore it.
That is, the check could become

 assert((alignment == 0) or (reinterpret_cast<std::uintptr_t>(*result_ptr) % alignment == 0));
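
Written as a standalone predicate (a sketch; is_suitably_aligned is a hypothetical name, not a function in pi_cuda.cpp), the relaxed check looks like this:

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the relaxed check: alignment == 0 means no specific alignment
// was requested, so any pointer passes; otherwise the address must be a
// multiple of the requested alignment.
bool is_suitably_aligned(const void *ptr, std::size_t alignment) {
  if (alignment == 0)
    return true; // no alignment requested: nothing to check
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```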

@fwyzard
Contributor

fwyzard commented Apr 23, 2020

From a quick test, it looks like the memory returned by cuMemAllocHost() is aligned to 512 (0x200) bytes.
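
One way to run that kind of empirical check (a sketch; the test below probes fixed addresses, but the same helper can be applied to pointers returned by cuMemAllocHost()) is to take the largest power of two dividing the returned address:

```cpp
#include <cstddef>
#include <cstdint>

// Observed alignment of a pointer: the largest power of two that divides
// its address. addr & (~addr + 1) isolates the lowest set bit.
std::size_t observed_alignment(const void *ptr) {
  auto addr = reinterpret_cast<std::uintptr_t>(ptr);
  return addr == 0 ? 0 : static_cast<std::size_t>(addr & (~addr + 1));
}
```

A pointer from an allocator that aligns everything to 512 bytes (0x200) will always report an observed alignment of at least 512.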

@fwyzard
Contributor

fwyzard commented Apr 23, 2020

Can you check if #1577 fixes the exception?

@jbrodman
Contributor

> I would take alignment == 0 as requiring no specific alignment, and just ignore it.

I agree with your interpretation of alignment == 0.

@jbrodman
Contributor

It would probably be good if we could throw runtime errors instead of crashing.

Hmm... CUDA doesn't seem to have user-aligned allocations.

@romanovvlad
Contributor

@jbrodman Should we clarify what happens if the required alignment is not supported by the implementation?

@romanovvlad
Contributor

@jinz2014 Could you please check if #1577 solves the issue?

@fwyzard
Contributor

fwyzard commented Apr 23, 2020

> Hmm... CUDA doesn't seem to have user-aligned allocations.

Rather, it seems that all allocations are aligned to 512 bytes.

@khaled-rahman

Hi, I am also facing this problem. I compiled and ran a DPC++ code successfully on the CPU, and it also compiled fine for the GPU. However, when I offload the code to the GPU, it shows a floating-point exception (core dumped). I was wondering whether the problem is resolved for this case.

@jeffhammond
Contributor

I too see this issue and wonder how I'm supposed to use USM on NVIDIA while this bug exists...

before alloc
---> piextUSMSharedAlloc(
       <unknown> : 0x7fffffffbae0
       <unknown> : 0xe9bcd0
       <unknown> : 0x7bd130
       <unknown> : 0
       <unknown> : 400000
       <unknown> : 0

Thread 1 "nstream-sycl-us" received signal SIGFPE, Arithmetic exception.
0x00007ffff65a46ab in cuda_piextUSMSharedAlloc () from /nfs/pdx/home/jrhammon/ISYCL/llvm/build/install/lib/libpi_cuda.so
(cuda-gdb) bt
#0  0x00007ffff65a46ab in cuda_piextUSMSharedAlloc () from /nfs/pdx/home/jrhammon/ISYCL/llvm/build/install/lib/libpi_cuda.so
#1  0x00007ffff7b48479 in cl::sycl::malloc_shared(unsigned long, cl::sycl::device const&, cl::sycl::context const&) ()
   from /nfs/pdx/home/jrhammon/ISYCL/llvm/build/install/lib/libsycl.so.2
#2  0x0000000000405b2a in float* cl::sycl::malloc_shared<float>(unsigned long, cl::sycl::queue const&) ()
#3  0x0000000000403e03 in void run<float>(cl::sycl::queue&, int, unsigned long) ()
#4  0x000000000040301d in main ()

@bader
Contributor

bader commented Oct 11, 2020

Can't reproduce with the tip of the branch. Most likely it's addressed by #2557.

@bader bader closed this as completed Oct 11, 2020
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this issue Feb 23, 2023
It looks like python on windows is always python.exe and on some systems
we have python3.exe alias created manually.