Skip to content

[SYCL] Integer buffer index in SYCL kernel causes linker crash #2353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
krasznaa opened this issue Aug 21, 2020 · 12 comments
Closed

[SYCL] Integer buffer index in SYCL kernel causes linker crash #2353

krasznaa opened this issue Aug 21, 2020 · 12 comments
Labels
bug Something isn't working SPIR-V Issues related to SPIRV-LLVM-Translator

Comments

@krasznaa
Copy link
Contributor

Dear All,

We bumped into a quite interesting issue with @czangela...

Take the following example function:

// SYCL include(s).
#include <CL/sycl.hpp>

// System include(s).
#include <vector>

void sycl_int_idx( cl::sycl::queue& queue ) {

   // Convenience declaration.
   using am = cl::sycl::access::mode;

   // The range to run the kernel on.
   std::size_t nMiddleSPs = 200, nBottomSPs = 300;
   cl::sycl::range< 2 > mbRange( nMiddleSPs, nBottomSPs );

   // Buffers storing the results of the kernel.
   std::vector< int > nMiddleBottomPairsVector( nMiddleSPs, 0 );
   cl::sycl::buffer< int, 1 >
      nMiddleBottomPairsBuffer( nMiddleBottomPairsVector.data(),
                                nMiddleSPs );
   cl::sycl::buffer< std::size_t, 1 >
      middleBottomIndices( nMiddleSPs * nBottomSPs );

   // Submit the problematic job.
   queue.submit( [&]( cl::sycl::handler& h ) {

         // Accessors to the buffers.
         auto countAcc = nMiddleBottomPairsBuffer.get_access< am::atomic >( h );
         auto indexAcc = middleBottomIndices.get_access< am::write >( h );

         // Launch the problematic parallel kernel.
         h.parallel_for< class dublet_search >( mbRange, [=]( cl::sycl::id< 2 > idx ) {

               // Access the indices as int-s.
               const int middleIndex = idx[ 0 ];
               const int bottomIndex = idx[ 1 ];
               if( ( middleIndex >= nMiddleSPs ) ||
                   ( bottomIndex >= nBottomSPs ) ) {
                  return;
               }

               // Do some actual work on actual objects, to decide if they
               // for a valid dublet or not...

               // Fill the output buffer(s).
               const int outputIndex = countAcc[ middleIndex ].fetch_add( 1 );
               indexAcc[ middleIndex * nBottomSPs + outputIndex ] = bottomIndex;
            } );
      } );

   return;
}

(It comes from a piece of code that tries to pair up "some type" of objects, and store in output buffers the indices of the matching pairs.)

If I try to compile this into a shared library, with debug symbols on the shared library, I get:

[bash][Elrond]:sycl_int_idx > make
clang++ -fPIC -fsycl -g -o sycl_int_idx.o -c sycl_int_idx.cxx
clang++ -fPIC -fsycl -g -o libSyclIntIdx.so -shared sycl_int_idx.o
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv -o /tmp/sycl_int_idx-269cb5-71069c.spv -spirv-max-version=1.1 -spirv-ext=+all,-SPV_INTEL_usm_storage_classes /tmp/sycl_int_idx-84cabc.bc 
1.      Running pass 'LLVMToSPIRV' on module '/tmp/sycl_int_idx-84cabc.bc'.
 #0 0x000055c96322d8ba llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x5888ba)
 #1 0x000055c96322b784 llvm::sys::RunSignalHandlers() (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x586784)
 #2 0x000055c96322b8c8 SignalHandler(int) (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x5868c8)
 #3 0x00007f609feb48a0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x128a0)
 #4 0x000055c962ef4d7c SPIRV::LLVMToSPIRVDbgTran::transDbgExpression(llvm::DIExpression const*) (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x24fd7c)
 #5 0x000055c962ef569e SPIRV::LLVMToSPIRVDbgTran::transDbgEntryImpl(llvm::MDNode const*) (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x25069e)
 #6 0x000055c962ef59bd SPIRV::LLVMToSPIRVDbgTran::transDbgEntry(llvm::MDNode const*) (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x2509bd)
 #7 0x000055c962efb2d7 SPIRV::LLVMToSPIRVDbgTran::finalizeDebugValue(llvm::DbgVariableIntrinsic const*) (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x2562d7)
 #8 0x000055c962efb640 SPIRV::LLVMToSPIRVDbgTran::transDebugMetadata() (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x256640)
 #9 0x000055c962e69f7a SPIRV::LLVMToSPIRV::translate() (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x1c4f7a)
#10 0x000055c962e6a0cc SPIRV::LLVMToSPIRV::runOnModule(llvm::Module&) (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x1c50cc)
#11 0x000055c9631794b1 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x4d44b1)
#12 0x000055c962e6a612 llvm::writeSpirv(llvm::Module*, SPIRV::TranslatorOpts const&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x1c5612)
#13 0x000055c962db4874 main (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x10f874)
#14 0x00007f609ed4cb97 __libc_start_main /build/glibc-2ORdQG/glibc-2.27/csu/../csu/libc-start.c:344:0
#15 0x000055c962dc3daa _start (/data/software/intel/clang/12.0.0-2020-08-16/x86_64-ubuntu1804-gcc8-opt/bin/llvm-spirv+0x11edaa)
llvm-foreach: Segmentation fault (core dumped)
clang-12: error: llvm-spirv command failed with exit code 1 (use -v to see invocation)
Makefile:6: recipe for target 'libSyclIntIdx.so' failed
make: *** [libSyclIntIdx.so] Error 1
[bash][Elrond]:sycl_int_idx >

Note that I used the following Makefile for this test:

# Flags to use for the build.
CXXFLAGS=-fPIC -fsycl -g

libSyclIntIdx.so: sycl_int_idx.o
	clang++ ${CXXFLAGS} -o $@ -shared $<

clean:
	rm -f *.o
	rm -f libSyclIntIdx.so

distclean: clean
	rm -f *~

.SUFFIXES: .cxx .o

.cxx.o:
	clang++ ${CXXFLAGS} -o $@ -c $^

After a fair bit of debugging I found that if I modify the type of middleIndex and bottomIndex in this example from int to std::size_t, the linking error disappears. (And the original code starts to work.)

At first I thought that the compiler was not creating cl::sycl::id<1> objects out of the int indices correctly. But even if I create such objects explicitly from the int indices, the linking still dies in the same way.

So at this point I'd let you guys debug the problem. It is of course okay if we need to use std::size_t indices in the kernels explicitly. But it would be helpful if the compiler gave a clear message about it.

Also take note that this only happens when building the code in "debug mode". Without the -g flag the compilation/linking succeeds. 😕 And it also succeeds when building CUDA binaries. Only the compilation for -fsycl-targets=spir64-unknown-unknown-sycldevice breaks like this. (As far as I can tell.)

Pinging @ivorobts, @fwyzard, @leggett, @vpascuzz.

Cheers,
Attila

@AlexeySachkov AlexeySachkov added the bug Something isn't working label Aug 21, 2020
@AlexeySachkov
Copy link
Contributor

Hi @krasznaa, thanks for the report,

This might be related to recent activity of enabling some early optimizations (i.e. prior generating SPIR-V). Could you please try to add -fno-sycl-early-optimizations compiler flag and see if it helps?

@krasznaa
Copy link
Contributor Author

Unfortunately that is probably not it.

[bash][Elrond]:sycl_int_idx > make
clang++ -fPIC -fsycl -fno-sycl-early-optimizations -g -o sycl_int_idx.o -c sycl_int_idx.cxx
clang-12: error: unknown argument: '-fno-sycl-early-optimizations'
Makefile:18: recipe for target 'sycl_int_idx.o' failed
make: *** [sycl_int_idx.o] Error 1
[bash][Elrond]:sycl_int_idx >

It seems that my compiler version (built last week as you can see from the stack trace in the original post 😉) doesn't even know about this flag yet...

@AlexeySachkov
Copy link
Contributor

It seems that my compiler version (built last week as you can see from the stack trace in the original post 😉) doesn't even know about this flag yet...

Ah, for one week old compiler you should probably try -fno-sycl-std-optimizations (#2316 performed a renaming exactly a week ago)

@krasznaa
Copy link
Contributor Author

Yepp, your hunch was correct after all...

[bash][Elrond]:sycl_int_idx > make
clang++ -fPIC -fsycl -fno-sycl-std-optimizations -g -o sycl_int_idx.o -c sycl_int_idx.cxx
clang++ -fPIC -fsycl -fno-sycl-std-optimizations -g -o libSyclIntIdx.so -shared sycl_int_idx.o
[bash][Elrond]:sycl_int_idx >

The linking succeeds if I disable those optimisations.

@AlexeySachkov
Copy link
Contributor

The linking succeeds if I disable those optimisations.

Please consider this flag as current workaround for the issue.

The main problem is that optimized code generates some new debug metadata which isn't expected by the translator.
There are even a few PRs to fix that (KhronosGroup/SPIRV-LLVM-Translator#679, KhronosGroup/SPIRV-LLVM-Translator#630), but all of them are not conformant with the debug info specification in SPIR-V.

Tagging @AlexeySotkin and @asavonic here for awareness.

I guess that the proper plan would be:

  • collect all new required functionality and release debug info spec update: it will take a while as it require work with Khronos to release formal specification update
  • in the meantime, we might want to adjust the translator to just skip unknown debug metadata: debugging experience might be affected somehow, but at least compilation won't fail anymore

@vrpascuzzi
Copy link

Could you please try to add -fno-sycl-early-optimizations compiler flag and see if it helps?

FYI: I needed this also for a7ad8b8.

@bader
Copy link
Contributor

bader commented Oct 5, 2020

Could you please try to add -fno-sycl-early-optimizations compiler flag and see if it helps?

FYI: I needed this also for a7ad8b8.

@vpascuzz, do you mean that the tests added by a7ad8b8 can be enabled if we build them with -fno-sycl-early-optimizations flag?

@vrpascuzzi
Copy link

@bader I didn't try the tests myself, I was only stating that I needed the above compiler flag to get around the llvm-spirv crash when compiling my own code with a7ad8b8.

@bader
Copy link
Contributor

bader commented Oct 5, 2020

Ah... by "with a7ad8b8", you mean "with the compiler built from sources at this revision".

Is llvm-spirv crash you see also caused by debug metadata translation or some other problem?

@AlexeySachkov AlexeySachkov added the SPIR-V Issues related to SPIRV-LLVM-Translator label Oct 13, 2020
@AlexeySachkov
Copy link
Contributor

@krasznaa, @vrpascuzzi, we have enhanced the translator with support of new debug info and this issue should disappear

@AlexeySachkov
Copy link
Contributor

@krasznaa, @vrpascuzzi, I will close this, because there were no further responses from you for about a year, but if you still experience this or another problem, please let us know

@krasznaa
Copy link
Contributor Author

Sorry for the silence.

I have not seen this type of error ever since. So closing the issue was correct.

martygrant added a commit to martygrant/llvm that referenced this issue Dec 4, 2024
martygrant added a commit to martygrant/llvm that referenced this issue Dec 4, 2024
Chenyang-L pushed a commit that referenced this issue Feb 18, 2025
[L0] Enable zesInit by default given newer L0 IP version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working SPIR-V Issues related to SPIRV-LLVM-Translator
Projects
None yet
Development

No branches or pull requests

4 participants