Skip to content

[SYCL] Improve the error mechanism of llvm-no-spir-kernel #1068

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 19, 2020

Conversation

AGindinson
Copy link
Contributor

@AGindinson AGindinson commented Jan 29, 2020

This patch improves the tool's diagnostic upon finding a
SPIR kernel within an LLVM module. Despite that the tool's
only current use is within the SYCL FPGA flow, it's important
to make the message target-agnostic, so that the tool is not
tied to a particular device BE.
A related commit to the Clang driver has extended these diagnostics
with SYCL FPGA specifics without affecting the tool itself.

This patch also introduces testing for the return code value. For
example, this should allow the Clang driver users/developers to
differentiate between the two possible causes of llvm-no-spir-kernel
failure.

Signed-off-by: Artem Gindinson [email protected]

Copy link
Contributor

@Fznamznon Fznamznon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have objections against the patch. But I do not see purpose of these changes (as well as the purpose of the tool). The commit message is confusing:

This patch improves the tool's diagnostic upon finding a
SPIR kernel within an LLVM module. Despite that the tool's
only current use is within the SYCL FPGA flow, it's important
to make the message target-agnostic - this way, the tool retains
the potential to be re-used for other targets with an LLVM-based
backend.

What exactly do you mean here? If other targets is other programming models I do not see how it can be reused for other targets, it checks SPIR_KERNEL calling convention, AFAIK it appears only in spir-based IR targets which is used only by OpenCL/SYCL. But if under other targets you mean non-FPGA targets SYCL, I don't see why it cannot be used for them now.

@AGindinson
Copy link
Contributor Author

AGindinson commented Jan 29, 2020

But I do not see purpose of these changes (as well as the purpose of the tool).

This tool is called by the Clang driver in the FPGA flow. Before an object file is linked against an archive that contains AOT-compiled device code, the tool can detects the presence of a SPIRKernel-calling convention function in the object file. This would mean that the device code from the object file was not AOT-compiled, neither stored in the archive. Terminating the compilation in such case and requesting the archive re-compilation to include all device code is better than letting the user run into a runtime error due to an "unknown kernel".

The purpose of these changes is to provide a helpful diagnostic message to the user - from within the tool, we can reach the name of the missing function, and include it in the diagnostic. The other "half" of the diagnostic will be provided by the Clang driver, giving a more detailed explanation with FPGA specifics. I will be uploading the driver patch shortly.

But if under other targets you mean non-FPGA targets SYCL,

Yes, non-FPGA SYCL targets are implied.

I don't see why it cannot be used for them now.

It can. The point was "let's not affect the tool's re-usability for non-FPGA targets by adding FPGA-specific diagnostics".

I'll reformulate the commit message into something less confusing.

@AGindinson AGindinson force-pushed the private/agindins/llvm-no-spir-update branch from e7db7ce to fce0c2c Compare January 29, 2020 11:41
@AGindinson
Copy link
Contributor Author

Updated the commit message & the PR description.

@Fznamznon
Copy link
Contributor

Fznamznon commented Jan 29, 2020

This tool is called by the Clang driver in the FPGA flow. Before an object file is linked against an archive that contains AOT-compiled device code, the tool can detects the presence of a SPIRKernel-calling convention function in the object file. This would mean that the device code from the object file was not AOT-compiled, neither stored in the archive. Terminating the compilation in such case and requesting the archive re-compilation to include all device code is better than letting the user run into a runtime error due to an "unknown kernel".

I'd say that I do not see why we do that. This relies on presence of only one module with device code. But we could support cases when there are multiple device modules.

The purpose of these changes is to provide a helpful diagnostic message to the user - from within the tool, we can reach the name of the missing function, and include it in the diagnostic. The other "half" of the diagnostic will be provided by the Clang driver, giving a more detailed explanation with FPGA specifics. I will be uploading the driver patch shortly.

Are you sure that Unexpected SPIR kernel occurence: typeinfo name for run_kernel_add(cl::sycl::queue&, float, float, float&)::$_0::operator()(cl::sycl::h andler&) const::SimpleAdd is some sort of useful diagnostic?

I'll reformulate the commit message into something less confusing.

It feels a bit better now.

@AGindinson
Copy link
Contributor Author

Are you sure that Unexpected SPIR kernel occurence: typeinfo name for run_kernel_add(cl::sycl::queue&, float, float, float&)::$_0::operator()(cl::sycl::h andler&) const::SimpleAdd is some sort of useful diagnostic?

Combined with the intended Clang driver changes, it should make sense. I've decided to split the changes into the two patches as per the "one commit - one component" policy. I'll provide the link to the driver PR ASAP.

@AGindinson
Copy link
Contributor Author

AGindinson commented Jan 29, 2020

This tool is called by the Clang driver in the FPGA flow. Before an object file is linked against an archive that contains AOT-compiled device code, the tool can detects the presence of a SPIRKernel-calling convention function in the object file. This would mean that the device code from the object file was not AOT-compiled, neither stored in the archive. Terminating the compilation in such case and requesting the archive re-compilation to include all device code is better than letting the user run into a runtime error due to an "unknown kernel".

I'd say that I do not see why we do that. This relies on presence of only one module with device code. But we could support cases when there are multiple device modules.

In the SYCL FPGA flow, a separate llvm-no-spir-kernel call is made for each device module-containing object.

@Fznamznon
Copy link
Contributor

This tool is called by the Clang driver in the FPGA flow. Before an object file is linked against an archive that contains AOT-compiled device code, the tool can detects the presence of a SPIRKernel-calling convention function in the object file. This would mean that the device code from the object file was not AOT-compiled, neither stored in the archive. Terminating the compilation in such case and requesting the archive re-compilation to include all device code is better than letting the user run into a runtime error due to an "unknown kernel".

I'd say that I do not see why we do that. This relies on presence of only one module with device code. But we could support cases when there are multiple device modules.

In the SYCL FPGA flow, a separate llvm-no-spir-kernel call is made for each device module-containing object.

I think there is some misunderstanding. Why do you think user can see a runtime error due to an "unknown kernel", if not all modules are stored in archive?

@AGindinson
Copy link
Contributor Author

In the SYCL FPGA flow, a separate llvm-no-spir-kernel call is made for each device module-containing object.

I think there is some misunderstanding. Why do you think user can see a runtime error due to an "unknown kernel", if not all modules are stored in archive?

My earlier comment:

This would mean that the device code from the object file was not AOT-compiled, neither stored in the archive.

Since JIT is not supported for the targeted FPGA devices, it would not be possible to JIT a kernel from an outside object file. Any request to execute such a kernel from the in-archive host code would lead to the "unknown kernel" error.

@AGindinson
Copy link
Contributor Author

I'll provide the link to the driver PR ASAP.

#1077

domiyan
domiyan previously approved these changes Jan 30, 2020
@AGindinson AGindinson force-pushed the private/agindins/llvm-no-spir-update branch 4 times, most recently from ac83a32 to 1308471 Compare February 3, 2020 11:50
@AGindinson AGindinson force-pushed the private/agindins/llvm-no-spir-update branch from 1308471 to 3f96bbf Compare February 3, 2020 14:20
This patch improves the tool's diagnostic upon finding a
SPIR kernel within an LLVM module. Despite that the tool's
only current use is within the SYCL FPGA flow, it's important
to make the message target-agnostic, so that the tool is not
tied to a particular device BE.
A related commit to the Clang driver has extended these diagnostics
with SYCL FPGA specifics without affecting the tool itself.

This patch also introduces testing for the return code value. For
example, this should allow the Clang driver users/developers to
differentiate between the two possible causes of llvm-no-spir-kernel
failure.

Signed-off-by: Artem Gindinson <[email protected]>
@AGindinson AGindinson force-pushed the private/agindins/llvm-no-spir-update branch 2 times, most recently from 8261f4e to 21b1e16 Compare February 13, 2020 11:13
@AGindinson
Copy link
Contributor Author

Squashed & updated the main commit message.

Signed-off-by: Artem Gindinson <[email protected]>
Copy link
Contributor

@mlychkov mlychkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@mdtoguchi mdtoguchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AGindinson
Copy link
Contributor Author

Note: the Linux failure is an unrelated sycl/test/assert.cpp one.

@bader
Copy link
Contributor

bader commented Feb 18, 2020

Note: the Linux failure is an unrelated sycl/test/assert.cpp one.

@asavonic, IIRC, you were going to fix this. Any updates?

@bader bader merged commit 2295308 into intel:sycl Feb 19, 2020
@AGindinson AGindinson deleted the private/agindins/llvm-no-spir-update branch February 19, 2020 21:46
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Feb 20, 2020
…_int_headers

* origin/sycl:
  [SYCL] Improve the error mechanism of llvm-no-spir-kernel (intel#1068)
AGindinson pushed a commit to AGindinson/llvm that referenced this pull request Feb 21, 2020
After intel#1068 has included the Demangle header, this fix to CMakeLists
should guarantee successful builds in all configurations

Signed-off-by: Artem Gindinson <[email protected]>
bader pushed a commit that referenced this pull request Feb 22, 2020
After #1068 has included the Demangle header, this fix to CMakeLists
should guarantee successful builds in all configurations

Signed-off-by: Artem Gindinson <[email protected]>
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Feb 25, 2020
…ages_docs

* origin/sycl: (1092 commits)
  [CI] Add clang-format checker to pre-commit checks (intel#1163)
  [SYCL][CUDA] Initial CUDA backend support (intel#1091)
  [USM] Align OpenCL USM extension header with the specification (intel#1162)
  [SYCL][NFC] Fix unreferenced variable warning (intel#1158)
  [SYCL] Fix __spirv_GroupBroadcast overloads (intel#1152)
  [SYCL] Add llvm/Demangle link dependency for llvm-no-spir-kernel (intel#1156)
  [SYCL] LowerWGScope pass should not be skipped when -O0 is used
  [SYCL][Doc][USM] Add refactored pointer and device queries to USM spec (intel#1118)
  [SYCL] Update the kernel parameter rule to is-trivially-copy-construc… (intel#1144)
  [SYCL] Move internal headers to source dir (intel#1136)
  [SYCL] Forbid declaration of non-const static variables inside kernels (intel#1141)
  [SYCL][NFC] Remove idle space (intel#1148)
  [SYCL] Improve the error mechanism of llvm-no-spir-kernel (intel#1068)
  [SYCL] Added CTS test config (intel#1063)
  [SYCL] Implement check-sycl-deploy target (intel#1142)
  [SYCL] Preserve original message and code of kernel/program build result (intel#1108)
  [SYCL] Fix LIT after LLVM change in community
  Translate LLVM's cmpxchg instruction to SPIR-V
  Add volatile qualifier for atom_ builtins
  Fix -Wunused-variable warnings
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants