[HIP] device globals amdgcn-link error #17193

sherholz-intel opened this issue Feb 26, 2025 · 2 comments
Labels
bug Something isn't working hip Issues related to execution on HIP backend.

Comments

@sherholz-intel

Describe the bug

Trying to compile the Blender/Cycles SYCL backend with device globals support for AMD/HIP causes the build to fail with the following amdgcn-link error:

lld: /home/intel/Develop/sycl-demo/source/llvm-sycl/llvm/include/llvm/Support/Casting.h:578: decltype(auto) llvm::cast(From *) [To = llvm::Instruction, From = llvm::Value]: Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
PLEASE submit a bug report to https://github.com/intel/llvm/issues and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/intel/Develop/sycl-demo/build/llvm-sycl/install/bin/lld -flavor gnu -m elf64_amdgpu --no-undefined -shared -plugin-opt=-amdgpu-internalize-symbols -plugin-opt=mcpu=gfx1030 -plugin-opt=O3 --lto-CGO3 --whole-archive -o /tmp/kernel-gfx1030-920f48-592629.out /tmp/kernel-gfx1030-0b121a-fee210.o --no-whole-archive
1.	Running pass 'Function Pass Manager' on module 'ld-temp.o'.
2.	Running pass 'FPBuiltin Function Selection' on function '@_ZTSZ41oneapi_kernel_integrator_init_from_cameraP16KernelGlobalsGPUmmRN4sycl3_V17handlerEP14KernelWorkTileiPfiEUlNS2_7nd_itemILi1EEEE_'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  lld       0x00005e3c495481d8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 40
1  lld       0x00005e3c49545dae llvm::sys::RunSignalHandlers() + 238
2  lld       0x00005e3c49548b78
3  libc.so.6 0x000076bd8aa45330
4  libc.so.6 0x000076bd8aa9eb2c pthread_kill + 284
5  libc.so.6 0x000076bd8aa4527e gsignal + 30
6  libc.so.6 0x000076bd8aa288ff abort + 223
7  libc.so.6 0x000076bd8aa2881b
8  libc.so.6 0x000076bd8aa3b517
9  lld       0x00005e3c4b80fedb
10 lld       0x00005e3c4b80e382
11 lld       0x00005e3c4b80f7bc
12 lld       0x00005e3c4c61ed05 llvm::FPPassManager::runOnFunction(llvm::Function&) + 677
13 lld       0x00005e3c4c627132 llvm::FPPassManager::runOnModule(llvm::Module&) + 50
14 lld       0x00005e3c4c61f79c llvm::legacy::PassManagerImpl::run(llvm::Module&) + 1932
15 lld       0x00005e3c4a8a420c
16 lld       0x00005e3c4a8a308d llvm::lto::backend(llvm::lto::Config const&, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex&) + 461
17 lld       0x00005e3c4a88d626 llvm::lto::LTO::runRegularLTO(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>) + 1654
18 lld       0x00005e3c4a88cae0 llvm::lto::LTO::run(std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::FileCache) + 1168
19 lld       0x00005e3c4971c431 lld::elf::BitcodeCompiler::compile() + 1313
20 lld       0x00005e3c4969d67c void lld::elf::LinkerDriver::compileBitcodeFiles<llvm::object::ELFType<(llvm::endianness)1, true>>(bool) + 188
21 lld       0x00005e3c4967d89f void lld::elf::LinkerDriver::link<llvm::object::ELFType<(llvm::endianness)1, true>>(llvm::opt::InputArgList&) + 11023
22 lld       0x00005e3c49660ba6 lld::elf::LinkerDriver::linkerMain(llvm::ArrayRef<char const*>) + 15062
23 lld       0x00005e3c4965d050 lld::elf::link(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, bool, bool) + 800
24 lld       0x00005e3c49581956 lld::unsafeLldMain(llvm::ArrayRef<char const*>, llvm::raw_ostream&, llvm::raw_ostream&, llvm::ArrayRef<lld::DriverDef>, bool) + 2134
25 lld       0x00005e3c494d39fd lld_main(int, char**, llvm::ToolContext const&) + 301
26 lld       0x00005e3c494d3f47 main + 87
27 libc.so.6 0x000076bd8aa2a1ca
28 libc.so.6 0x000076bd8aa2a28b __libc_start_main + 139
29 lld       0x00005e3c494d3605 _start + 37
llvm-foreach: Aborted (core dumped)
clang++: error: amdgcn-link command failed with exit code 254 (use -v to see invocation)
clang version 20.0.0git (https://github.com/intel/llvm b512633e2de8547cecfce12aed2c62090282c6ed)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/intel/Develop/sycl-demo/build/llvm-sycl/install/bin
Build config: +assertions
clang++: note: diagnostic msg: Error generating preprocessed source(s).

To reproduce

  1. Include code snippet as short as possible
  2. Specify the command which should be used to compile the program
cmake -DWITH_CYCLES_DEBUG=OFF \
    -DWITH_CYCLES_DEVICE_CUDA=OFF \
    -DWITH_CYCLES_DEVICE_OPTIX=OFF \
    -DWITH_CYCLES_DEVICE_ONEAPI=ON \
    -DWITH_CYCLES_DEVICE_HIP=OFF \
    -DWITH_CYCLES_DEVICE_HIPRT=OFF \
    -DWITH_CYCLES_ONEAPI_BINARIES=ON \
    -DWITH_CYCLES_EMBREE=OFF \
    -DSYCL_OFFLINE_COMPILER_PARALLEL_JOBS=8 \
    -DSYCL_ROOT_DIR=${DPCPP_INSTALL_DIR} \
    -DCYCLES_ONEAPI_SYCL_TARGETS="amdgcn-amd-amdhsa" \
    -DCYCLES_ONEAPI_SYCL_OPTIONS_amdgcn-amd-amdhsa="--offload-arch=gfx1030" \
    -DWITH_X11_XINPUT=OFF \
    -DCMAKE_INSTALL_PREFIX=../install/blender-${BUILDTYPE} \
    ${PREFIX_PATH}/source/blender

make -j 20
make install -j

Environment

  • OS: Linux
  • Target device and vendor: AMD GPU
  • DPC++ version: clang version 20.0.0git (https://github.com/intel/llvm b512633e2de8547cecfce12aed2c62090282c6ed)
  • Dependencies version: see the sycl-ls output below

SYCL-LS:

[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) A770 Graphics 12.55.8 [1.6.32224+14]
[opencl:gpu][opencl:0] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO  [24.52.32224]
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 4070 8.9 [CUDA 12.4]
[hip:gpu][hip:0] AMD HIP BACKEND, AMD Radeon RX 6800 gfx1030 [HIP 60342.13]

Additional context

No response

@npmiller
Contributor

Is this with static device globals? It may be related to: #15329

@sherholz-intel
Author

sherholz-intel commented Feb 27, 2025

@npmiller It is probably not the same error, since the device globals are not static:

SYCL_EXTERNAL extern device_global<const KernelData> kernel_global_oneapi_data;
SYCL_EXTERNAL extern device_global<const IntegratorStateGPU> kernel_global_oneapi_integrator_state;

However, it might be related to the fix for the mentioned problem: with an older compiler version, v6.0.0-rc1 (5f25e25675b9393831eb53afef483c06ad483f2b), the code builds but throws a runtime error similar to the one reported above:

<HIP>[ERROR]: 
UR HIP ERROR:
	Value:           500
	Name:            hipErrorNotFound
	Description:     named symbol not found
	Function:        getGlobalVariablePointer
	Source Location: /home/intel/Develop/sycl-demo/build/llvm-v6.0.0-rc1/_deps/unified-runtime-src/source/adapters/hip/program.cpp:254

<HIP>[ERROR]: 
UR HIP ERROR:
	Value:           UR_RESULT_ERROR_UNKNOWN
	Function:        deviceGlobalCopyHelper
	Source Location: /home/intel/Develop/sycl-demo/build/llvm-v6.0.0-rc1/_deps/unified-runtime-src/source/adapters/hip/enqueue.cpp:1691

oneAPI test kernel execution: got a runtime exception "Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)"

With the current compiler version (b512633e2de8547cecfce12aed2c62090282c6ed), however, it no longer even compiles/links.

It may be related to the fix for #2475, which is probably #15224 or #15148.
