[SYCL][CUDA] bfloat16 in oneapi namespace and also supporting CUDA #5393


Merged (14 commits) on Apr 5, 2022
Changes from 13 commits
@@ -1,4 +1,4 @@
= SYCL_INTEL_bf16_conversion
= sycl_ext_oneapi_bfloat16

:source-highlighter: coderay
:coderay-linenums-mode: table
@@ -24,15 +24,15 @@

IMPORTANT: This specification is a draft.

Copyright (c) 2021 Intel Corporation. All rights reserved.
Copyright (c) 2021-2022 Intel Corporation. All rights reserved.

NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are
trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc.
used by permission by Khronos.

== Dependencies

This extension is written against the SYCL 2020 specification, Revision 3.
This extension is written against the SYCL 2020 specification, Revision 4.

== Status

@@ -48,7 +48,7 @@ products.

== Version

Revision: 3
Revision: 4

== Introduction

@@ -73,7 +73,7 @@ command (e.g. from `parallel_for`).
This extension provides a feature-test macro as described in the core SYCL
specification section 6.3.3 "Feature test macros". Therefore, an implementation
supporting this extension must predefine the macro
`SYCL_EXT_INTEL_BF16_CONVERSION` to one of the values defined in the table
`SYCL_EXT_ONEAPI_BFLOAT16` to one of the values defined in the table
below. Applications can test for the existence of this macro to determine if
the implementation supports this feature, or applications can test the macro’s
value to determine which of the extension’s APIs the implementation supports.
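A minimal usage sketch (editor's illustration, not part of the diff; it assumes the macro is visible after including `<sycl/sycl.hpp>`, as in the feature_test.hpp change further down):

[source]
----
#include <sycl/sycl.hpp>

#ifdef SYCL_EXT_ONEAPI_BFLOAT16
// The implementation predefines the macro, so the extension header may be used.
#include <sycl/ext/oneapi/experimental/bfloat16.hpp>
#else
#error "sycl_ext_oneapi_bfloat16 is not supported by this implementation"
#endif
----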
@@ -91,19 +91,19 @@ the implementation supports this feature, or applications can test the macro’s
namespace sycl {
enum class aspect {
...
ext_intel_bf16_conversion
ext_oneapi_bfloat16
}
}
----

If a SYCL device has the `ext_intel_bf16_conversion` aspect, then it natively
If a SYCL device has the `ext_oneapi_bfloat16` aspect, then it natively
supports conversion of values of `float` type to `bfloat16` and back.

If the device doesn't have the aspect, objects of `bfloat16` class must not be
used in the device code.

**NOTE**: The `ext_intel_bf16_conversion` aspect is not yet supported. The
`bfloat16` class is currently supported only on Xe HP GPU.
**NOTE**: The `ext_oneapi_bfloat16` aspect is not yet supported. The
`bfloat16` class is currently supported only on Xe HP GPU and Nvidia A100 GPU.
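A short sketch of the intended guard (the fuller example later in this document does the same; the default-constructed device is an assumption here):

[source]
----
sycl::device dev;  // default device

// Use bfloat16 in device code only when the device reports the aspect.
if (dev.has(sycl::aspect::ext_oneapi_bfloat16)) {
  // safe to submit kernels that use sycl::ext::oneapi::experimental::bfloat16
}
----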

== New `bfloat16` class

@@ -115,7 +115,7 @@ mode.
----
namespace sycl {
namespace ext {
namespace intel {
namespace oneapi {
namespace experimental {

class bfloat16 {
@@ -171,7 +171,7 @@ public:
};

} // namespace experimental
} // namespace intel
} // namespace oneapi
} // namespace ext
} // namespace sycl
----
@@ -277,9 +277,9 @@ OP is `==, !=, <, >, <=, >=`
[source]
----
#include <sycl/sycl.hpp>
#include <sycl/ext/intel/experimental/bfloat16.hpp>
#include <sycl/ext/oneapi/experimental/bfloat16.hpp>

using sycl::ext::intel::experimental::bfloat16;
using sycl::ext::oneapi::experimental::bfloat16;

bfloat16 operator+(const bfloat16 &lhs, const bfloat16 &rhs) {
return static_cast<float>(lhs) + static_cast<float>(rhs);
@@ -304,7 +304,7 @@ int main (int argc, char *argv[]) {
sycl::queue deviceQueue{dev};
sycl::buffer<float, 1> buf {data, sycl::range<1> {3}};

if (dev.has(sycl::aspect::ext_intel_bf16_conversion)) {
if (dev.has(sycl::aspect::ext_oneapi_bfloat16)) {
deviceQueue.submit ([&] (sycl::handler& cgh) {
auto numbers = buf.get_access<sycl::access::mode::read_write> (cgh);
cgh.single_task<class simple_kernel> ([=] () {
@@ -332,4 +332,5 @@ None.
Add operator overloadings +
Apply code review suggestions
|3|2021-08-18|Alexey Sotkin |Remove `uint16_t` constructor
|4|2022-03-07|Aidan Belton and Jack Kirk |Switch from Intel vendor specific to oneapi
|========================================
2 changes: 1 addition & 1 deletion sycl/include/CL/sycl/feature_test.hpp.in
@@ -55,7 +55,7 @@ namespace sycl {
#define SYCL_EXT_ONEAPI_SUB_GROUP 1
#define SYCL_EXT_ONEAPI_PROPERTIES 1
#define SYCL_EXT_ONEAPI_NATIVE_MATH 1
#define SYCL_EXT_INTEL_BF16_CONVERSION 1
#define SYCL_EXT_ONEAPI_BFLOAT16 1
#define SYCL_EXT_INTEL_DATAFLOW_PIPES 1
#ifdef __clang__
#if __has_extension(sycl_extended_atomics)
@@ -14,10 +14,10 @@
__SYCL_INLINE_NAMESPACE(cl) {
namespace sycl {
namespace ext {
namespace intel {
namespace oneapi {
namespace experimental {

class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16 {
class bfloat16 {
Contributor:

Just curious, did you remove this C++ attribute intentionally?

Removing it is probably not a big problem because we don't have support in the rest of the tool chain yet to take advantage of this attribute. If there's no reason to remove it, though, it seems better to keep it.

Contributor (Author):

Yes, it was intentional. I think it will be replaced by the oneapi aspect in #5720.

The only reason for deleting it is that the CUDA backend does not have this ext_intel_bf16_conversion aspect (which, as I understand it, will be replaced by the oneapi aspect generally). However, I can try adding it back and checking whether the CUDA backend bfloat16 test passes with it included, since I'm not sure how it will behave. Shall I do this? If the CUDA backend test passes, this line can remain unchanged if that is preferred.

Contributor:

Oh, I see. I thought the aspect was renamed in this PR, but I see now that it is not.

When #5720 is merged, we should add this C++ attribute back and use the new aspect. I think you could probably keep the attribute in this PR with the old aspect name, and it would probably not hurt CUDA. The attribute will cause the front-end to put some metadata on the IR, but I think all IR passes ignore that metadata right now.

Contributor (Author):

> Oh, I see. I thought the aspect was renamed in this PR, but I see now that it is not.
>
> When #5720 is merged, we should add this C++ attribute back and use the new aspect.

So is it OK to leave it as it is? I will make sure to add it back when the new aspect is available.

> I think you could probably keep the attribute in this PR with the old aspect name, and it would probably not hurt CUDA. The attribute will cause the front-end to put some metadata on the IR, but I think all IR passes ignore that metadata right now.

Or should I add it back now with the old aspect name?

Contributor:

Either way is OK.

using storage_t = uint16_t;
storage_t value;

@@ -29,15 +29,26 @@ class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16 {
// Explicit conversion functions
static storage_t from_float(const float &a) {
#if defined(__SYCL_DEVICE_ONLY__)
#if defined(__NVPTX__)
return __nvvm_f2bf16_rn(a);
#else
return __spirv_ConvertFToBF16INTEL(a);
#endif
#else
throw exception{errc::feature_not_supported,
"Bfloat16 conversion is not supported on host device"};
#endif
}
static float to_float(const storage_t &a) {
#if defined(__SYCL_DEVICE_ONLY__)
#if defined(__NVPTX__)
uint32_t y = a;
y = y << 16;
float *res = reinterpret_cast<float *>(&y);
return *res;
#else
return __spirv_ConvertBF16ToFINTEL(a);
#endif
#else
throw exception{errc::feature_not_supported,
"Bfloat16 conversion is not supported on host device"};
@@ -70,7 +81,16 @@ class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16 {

// Unary minus operator overloading
friend bfloat16 operator-(bfloat16 &lhs) {
return bfloat16{-to_float(lhs.value)};
#if defined(__SYCL_DEVICE_ONLY__)
#if defined(__NVPTX__)
return from_bits(__nvvm_neg_bf16(lhs.value));
#else
return bfloat16{-__spirv_ConvertBF16ToFINTEL(lhs.value)};
#endif
#else
throw exception{errc::feature_not_supported,
"Bfloat16 unary minus is not supported on host device"};
#endif
}

// Increment and decrement operators overloading
@@ -143,7 +163,7 @@ class [[sycl_detail::uses_aspects(ext_intel_bf16_conversion)]] bfloat16 {
};

} // namespace experimental
} // namespace intel
} // namespace oneapi
} // namespace ext

} // namespace sycl
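For reference, the bit-level relationship the NVPTX branches above rely on can be reproduced on the host in a standalone sketch (editor's illustration, not part of the patch; __nvvm_f2bf16_rn performs the rounding in hardware, where the _rn suffix denotes round-to-nearest-even, and NaN handling is omitted here):

[source]
----
#include <cstdint>
#include <cstdio>
#include <cstring>

// bfloat16 keeps the upper 16 bits of an IEEE-754 binary32 value.

// float -> bf16 with round-to-nearest-even (NaNs not special-cased).
static uint16_t float_to_bf16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));  // bit copy avoids aliasing issues
  uint32_t bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<uint16_t>((bits + bias) >> 16);
}

// bf16 -> float: zero-extend into the high half of a 32-bit pattern,
// which is what the NVPTX to_float() branch does with its 16-bit shift.
static float bf16_to_float(uint16_t h) {
  uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

int main() {
  float x = 3.14159f;
  uint16_t h = float_to_bf16(x);
  std::printf("%f -> 0x%04x -> %f\n", x, static_cast<unsigned>(h), bf16_to_float(h));
  return 0;
}
----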
4 changes: 2 additions & 2 deletions sycl/test/extensions/bfloat16.cpp
@@ -2,10 +2,10 @@

// UNSUPPORTED: cuda || hip_amd

#include <sycl/ext/intel/experimental/bfloat16.hpp>
#include <sycl/ext/oneapi/experimental/bfloat16.hpp>
#include <sycl/sycl.hpp>

using sycl::ext::intel::experimental::bfloat16;
using sycl::ext::oneapi::experimental::bfloat16;

SYCL_EXTERNAL uint16_t some_bf16_intrinsic(uint16_t x, uint16_t y);
SYCL_EXTERNAL void foo(long x, sycl::half y);