-
Notifications
You must be signed in to change notification settings - Fork 768
[SYCL] Move devicelib assert wrapper to header files #1398
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Move devicelib assert wrapper to header files #1398
Conversation
sycl/include/CL/sycl/builtins.hpp
Outdated
unsigned line); | ||
} | ||
#endif | ||
#include "devicelib/devicelib-assert.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach has a limitation: since these wrappers are defined in the header file, it has to be #included in the translation unit that uses standard C and C++ functions. CL/sycl.hpp
includes the necessary header, so most of the use-cases should work fine.
If CL/sycl.hpp
is not included and standard functions are used in device code (e.g. in functions defined as SYCL_EXTERNAL), user have to manually add #include
directive or -include
compiler option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put the necessary parts to the compiler headers?
Something similar to https://github.com/intel/llvm/blob/sycl/clang/lib/Headers/__clang_cuda_libdevice_declares.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, but why do we need to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To lift the limitation you described in this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benefit of moving this to "compiler headers" is that the wrappers will be automatically engaged, when a standard header (e.g. assert.h) is included.
So the new assert.h would provide the implementation for __assert_fail, and then #include_next the real assert.h. We would need a new sub-directory in the compiler headers directory, e.g. device_wrappers, where we will put the new header files.
This will be transparent for programming models that do not require sycl.hpp, but also rely on the devicelib implementation.
At the same time, I am concerned about this change. Let's say we do the same for FP64 math. All wrappers will declare some functions with 'double' arguments, and the functions will remain in the final image at O0. As I understand, we want to make FP64 usage conditional. How do we do that with the header files implementation approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Slava.
libstdc++, libc++ and MSVC complex implementation are different:
libstdc++: based on c99 complex API, so we need to provide all c99 complex math functions in libdevice lib
libc++: all implementation are included in header file but depends on libm function, so we need to support cmath functions in devicelib to enable.
MSVC: all implementation are included in header file but depends on libm function and some MSVC internal math functions.
For functions included in SPV, we expect backend may provide some optimized version to override and we don't think backend would like to override those MSVC internal functions as they are only used by Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation, Ge! I will need to comprehend it tomorrow )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried inline and it turned out to work for cmath functions as those functions are very simple wrapper. But I doubt whether it can work on some complicated implementation and we have a case currently
AFAIR, inline
does not force inlining, and a linked device program will have a single definition of an inline function. So this should be identical to having this function in an object file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried inline and it turned out to work for cmath functions as those functions are very simple wrapper. But I doubt whether it can work on some complicated implementation and we have a case currently
AFAIR,
inline
does not force inlining, and a linked device program will have a single definition of an inline function. So this should be identical to having this function in an object file.
Hi, Andrew.
I think it doesn't matter whether compiler decide to inline the functions or not. We split the device libraries into float and double to support devices without fp64 support. If we move all the implementation into header files, users will include both float and double math functions in their source code even they don't use double in device code and this may introduce unnecessary "double" code in their spirv files which will lead to failures in devices without fp64 support.
If we added "inline" to decorate the functions, I find the functions not invoked in device code will not exist in user's spirv file which seems to be able to solve the problem. But as we know, compiler may deny the "inline" if the functions are too complicated, I am not sure whether adding "inline" decoration can solve the problems under such circumstances, I will try it later.
Thank you very much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we move all the implementation into header files, users will include both float and double math functions in their source code even they don't use double in device code and this may introduce unnecessary "double" code in their spirv files which will lead to failures in devices without fp64 support.
If we added "inline" to decorate the functions, I find the functions not invoked in device code will not exist in user's spirv file which seems to be able to solve the problem.
So basically we have the following:
// used in a kernel
inline float cosf(float f) {
...
}
// not used
inline double cos(double d) {
...
}
void kernel() {
cosf(...);
}
From my understanding, if a function is not called from a kernel, it is not emitted to device LLVM IR (regardless of whether it is inline or not).
One drawback of moving the "fallback" implementation into the header file is that it may require -fdeclare-spirv-builtins (#1384) for compilation of the user program. This does not affect this particular change, but may be a problem for following the same idea with the rest of the debicelib fallback implementations. |
-fdeclare-spirv-builtins seems to be enabled by default. Why it is a drawback? |
I think it's enabled by default only in the SYCL mode. |
Hi, @asavonic |
Right. Our
What if we provide our
Basically, the idea is to provide function definitions whenever user includes a header that contains the corresponding declarations. If math declarations come from |
|
That is a bit different from what I tried to suggest:
|
Oh, I see. I misunderstood your suggestion. Based the code above, I think it should work. All undefined math function reference should be resolved as long as we let compiler sees the corresponding implementation in device compilation phase. I will try the cmath and complex tomorrow. |
Hi, @asavonic .
Thank you very much. |
In general, we cannot request users to add header file paths explicitly. This breaks the current best known practices, which is to let compiler driver manage this. I still do not like the whole idea of moving implementations to the header files:1
This code just works on x86 Linux right now. We are breaking this with the headers implementation, since the user will not be able to redefine __assert_fail. It will still work BC "libraries", since the __assert_fail will be a weak definition there. If we are concerned about multiple different implementations of assert() in different "libc" libraries, we'd better come up with a way for linking different BC libraries in the driver. So far we only have Linux and MS BC libraries for assert(), so it is not a big problem to link them in clang driver by default (or with some device-only options). |
libdevice/include/assert.h
Outdated
@@ -0,0 +1,76 @@ | |||
//==--------- devicelib-assert.h - wrapper definition for C assert ---------==// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option is move this header to the compiler headers.
Here the same wrapper implementation for PTX target: https://github.com/intel/llvm/blob/sycl/clang/lib/Headers/__clang_cuda_runtime_wrapper.h#L335.
libdevice/include/assert.h
Outdated
#ifndef __DEVICELIB_ASSERT_H__ | ||
#define __DEVICELIB_ASSERT_H__ | ||
|
||
#include_next "assert.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not required if we rename this header to something like __spir_assert.h and include it any time compilation for spir target is done. This can be achieved if we move the header to the compiler headers as suggested above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So clang driver will have to add -include $CLANG_LIB/__spir_assert.h
options for all device headers we support - is that what you propose?
For the record, there are two minor disadvantages:
- we have to patch clang driver to add a new header
- all headers must be included - currently they contain ~500 LOC combined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So clang driver will have to add
-include $CLANG_LIB/__spir_assert.h
options for all device headers we support - is that what you propose?
Something like this: https://github.com/intel/llvm/blob/sycl/clang/lib/Driver/ToolChains/Cuda.cpp#L247
For the record, there are two minor disadvantages:
- we have to patch clang driver to add a new header
- all headers must be included - currently they contain ~500 LOC combined
I see these as disadvantages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, there are two minor disadvantages:
- we have to patch clang driver to add a new header
- all headers must be included - currently they contain ~500 LOC combined
I see these as disadvantages.
Sorry, it is not clear whether you're in favor of __spir_assert.h
or assert.h + #include_next
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I've made a typo in this comment. I don't think these the items are disadvantages of the approach.
Agree. I left a comment above how to addresses this issue.
I think
I don't follow this comment. Could you clarify what definitions do you mean? This patch includes only wrapper definition and there is no need to use
This is not supposed to work.
|
Overall approach looks good to me. 👍 |
What about |
I'm not sure I understand why these functions should "allow redefintion". Shouldn't single definition be enough? Math functions definitions for offload targets in llorg use static and What if we add a compiler header similar to #ifdef __SYCL_DEVICE_ONLY__
static __attribute__((always_inline)) float cos(float __x) { return __devicelib_cosf(__x); }
...
#endif ? Regardless whether we decide to refactor math functions or not, I think we should merge this PR. |
OK, I think we can imply
You are right.
I am just saying that it is working right now, even though it is UB. As Andrew mentioned, |
Yes, if we merge this PR, users can't provide their own version functions to override the corresponding one in device library. If we can accept this, it is OK. |
I think adding the device library(wrapper library) in driver is another option. The work can be split into following steps:
|
I am concerned about the inability of users to provide their own definitions of functions like sin, cos, etc. For functions that begin with __ that's OK, because the user isn't allowed to use that prefix for their symbols. For math library functions, it must be allowed. It also worries me that we are violating the one definition rule here. I suppose the device library implementations do that already to some extent, but there at least it's happening at the object level which is outside of C++ standards scope. This introduces multiple definitions at the source level. I'd like to see input from some C/C++ experts on this. |
Can you clarify your concern? The comment tree above is massive and difficult to figure out what is happening. If we're providing a custom cmath (as the implementation, naming a file this is UB unless we are considered 'the implementation') we're responsible for ensuring that the entire standard library is provided. We're also responsible for making sure that every translation unit in a program sees the same definition of each function. Users cannot define functions defined in the standard library (nothing in namespace std, and nothing using identifiers reserved by C, including these names). When included in the C++ standard, it is unclear whether these names in the global namespace are still reserved however. |
Here my concern is that if the user wants to have a function named sin() that can be called instead of the standard library function, that should be allowed. It doesn't even need to have the same semantics as the standard library function. It could print "Hello world" if it wanted. The comments above led me to believe that the current implementation somehow disallows this. I don't know if the language standard allows this, but in practice it is done and is the reason the nobuiltin attribute and the -fno-builtin command line option exist.
Here my concern is a bit more pedantic, but could have practical consequences. If we've got a header file that does something like this: #ifdef SYCL_DEVICE and we compile the file with and without that symbol defined, we've broken the one definition rule in a very real way. We might get away with it, but I think we've clearly broken the rule. Does the use of inline make this OK? |
The language prohibits it (at least with the same signature in C++), but you're right, it IS something people do. That said, its so easy to accidentally include cmath/math.h I'm not sure someone could get away with it for long.
YES, that is technically an ODR violation. Inline doesn't save you, there are still multiple definitions (typically, inline makes the ODR WORSE, since it encourages inlining which leads to not as easily being able to replace the function). In reality though, the reason the ODR exists (in this case) is to protect the compiler in the case where the linker could choose one of a couple of definitions at random. For example, if you had in your program two versions of the same function in two different TUs, but ALSO had inlining and LTO, you could get a surprising behavior where the definition chosen in each call isn't necessarily the one from the same TU. The __assert_fail case above is fairly isolated from this problem since we're sort of changing the definition of 'program' as far as the ODR is concerned (at least this is how I sleep at night:) ). A "SYCL Program" is actually N "programs" as far as the language is concerned, each device sees its own 'program', so the ODR gets isolated in the Device vs host situation. |
Object file is no longer required to enable devicelib assert function. Support for other devicelib functions will follow. Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
Signed-off-by: Andrew Savonichev <[email protected]>
f41f8f0
to
464d3a5
Compare
void SYCLToolChain::AddDevicelibIncludeArgs(ArgStringList &CC1Args) { | ||
// Assume that clang system include path is added elsewhere | ||
CC1Args.push_back("-include"); | ||
CC1Args.push_back("devicelib/__devicelib_assert.h"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be included somewhere in SYCL headers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is to support compilation when SYCL headers are not included, e.g. when SYCL_EXTERNAL is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1.
The idea here is the compiler to provide additional functionality for SPIR target through the functions declared in this header.
Current implementation is not quite implements this idea - it includes the header for SYCL mode rather then SPIR target, so we probably have to skip this include for non-SPIR targets (e.g. nvidia).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is to support compilation when SYCL headers are not included
Compilation of what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is to support compilation when SYCL headers are not included
Compilation of what?
Compilation of source code that do not #include
SYCL headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current implementation is not quite implements this idea - it includes the header for SYCL mode rather then SPIR target, so we probably have to skip this include for non-SPIR targets (e.g. nvidia).
The last patch includes the header when SYCL language is compiled for SPIR target. Non-SPIR targets should not be affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Andrew, can we make this include agnostic to compilation mode/language and include it for all compilations for SPIR target?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it makes sense - sure. I'm concerned that this feature will be enabled for languages where it doesn't make sense (for OpenCL, in particular).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think it doesn't make sense for OpenCL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ultimately for OpenCL folks to decide. Right now this extension is not documented to work for OpenCL.
@@ -0,0 +1,51 @@ | |||
//==------------ devicelib.h - macros for devicelib wrappers ---------------==// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This path (i.e. devicelib/
) doesn't conflict with anything else, does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what do you mean by a "conflict". When installed on a system, this directory resides in $prefix/lib/clang/$version/include
directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, then.
toolchains::SYCLToolChain::AddSYCLIncludeArgs(D, Args, CmdArgs); | ||
|
||
if (getToolChain().getTriple().isSPIR()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure, would:
if (getToolChain().getTriple().isSPIR()) { | |
if (JA.isDeviceOffloading()) { |
be unacceptable because DPC++ NVPTX compilations would also pass the condition?
if (getToolChain().getTriple().isSPIR()) { | ||
toolchains::SYCLToolChain::AddDevicelibIncludeArgs(CmdArgs); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor suggestion would be to make this logic part of AddSYCLIncludeArgs()
(and get rid of the separate AddDevicelibIncludeArgs()
)
/// Check that devicelib include path is added | ||
// CHECK-SYCL-DEV: "-include" "devicelib{{[/\\]}}__devicelib_assert.h" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this have separate RUN
lines, with and without -fsycl-device-only
?
@@ -8,6 +8,9 @@ | |||
// RUN: | FileCheck -check-prefix=CHECK-SYCL-DEV %s | |||
// CHECK-SYCL-DEV: "-fsycl-is-device"{{.*}} "-internal-isystem" "{{.*}}bin{{[/\\]+}}..{{[/\\]+}}include{{[/\\]+}}sycl" | |||
|
|||
/// Check that devicelib include path is added | |||
// CHECK-SYCL-DEV: "-include" "devicelib{{[/\\]}}__devicelib_assert.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// CHECK-SYCL-DEV: "-include" "devicelib{{[/\\]}}__devicelib_assert.h" | |
// CHECK-SYCL-DEV: "-fsycl-is-device"{{.*}} "-include" "devicelib{{[/\\]}}__devicelib_assert.h" |
There are two approaches: with #include_next (ff9a212), and with an implicit The first approach works fine, and I don't see any disadvantages of it. The second approach is a bit more tricky:
|
Hi, @asavonic |
Can we provide both |
Just checked MSVC's cmath header, it doesn't include math.h, so include_next may be the only choice for C++ std math headers. And it looks like is similar. |
@@ -232,6 +240,30 @@ glibc) has its own wrapper library object: | |||
- libsycl-glibc.o | |||
- libsycl-msvc.o | |||
|
|||
Header-based wrappers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @asavonic .
If compiler developer provides some functions in device library via headers, the compiler users should not try to provide their own version in source code. This "user override" behavior is permitted now as all functions in current device library is "weak". Suggest to tell users that they should not try to override any function provided in header-based device library.
Just a minor remark, the #include_next approach still requires driver modification, e.g. for OpenMP the include path to devicelib sub-directory in clang headers structure has to be added in the driver. I am OK if you want to proceed with ff9a212, but it has to be fixed so that assert.h is installed into clang headers structure - will this work for you? |
Agree with Slava, we need either to put the "assert.h" header into clang headers to make driver to add include path(s) to similar devicelib directory with standard headers wrappers before the standard library includes. Otherwise standard "assert.h" will be included. |
@asavonic , resolve conflicts |
Object file is no longer required to enable devicelib assert function.
Support for other devicelib functions will follow.
Signed-off-by: Andrew Savonichev [email protected]