[SYCL][Doc] math functions added to bfloat16 ext (#5645)
This SYCL 2020 extension proposal adds bfloat16 support to the fma, fmin, fmax and fabs SYCL floating point math functions.
Blocked by #5393
Co-authored-by: JackAKirk <[email protected]>
sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16.asciidoc (+78 -3)
@@ -48,7 +48,7 @@ products.
 
 == Version
 
-Revision: 4
+Revision: 5
 
 == Introduction
 
@@ -103,7 +103,7 @@ If the device doesn't have the aspect, objects of `bfloat16` class must not be
 used in the device code.
 
 **NOTE**: The `ext_oneapi_bfloat16` aspect is not yet supported. The
-`bfloat16` class is currently supported only on Xe HP GPU and Nvidia A100 GPU.
+`bfloat16` class is currently supported only on Xe HP GPU and Nvidia GPUs with Compute Capability >= SM80.
 
 == New `bfloat16` class
 
@@ -316,9 +316,83 @@ int main (int argc, char *argv[]) {
 }
 ----
 
+== New bfloat16 math functions
+
+Many applications will require dedicated functions that take parameters of type `bfloat16`. This extension adds `bfloat16` support to the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions. These functions can be used as element wise operations on matrices, supplementing the `bfloat16` support in the sycl_ext_oneapi_matrix extension.
+
+The descriptions of the `fma`, `fmin`, `fmax` and `fabs` SYCL floating point math functions can be found in the SYCL specification: https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_math_functions.
+
+The following functions are only available when `T` is `bfloat16` or `sycl::marray<bfloat16, {N}>`, where `{N}` means any positive value of `size_t` type.
+
+=== fma
+
+```c++
+namespace sycl::ext::oneapi::experimental {
+
+template <typename T>
+T fma(T a, T b, T c);
+} // namespace sycl::ext::oneapi::experimental
+```
+
+==== Description
+
+Returns the correctly rounded floating-point representation of the sum of `c` with the infinitely precise product of `a` and `b`.
+Rounding of intermediate products shall not occur. The mantissa LSB rounds to the nearest even. Subnormal numbers are supported.
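As a rough illustration of the rounding rules described for `fma`, the following host-side sketch emulates a `bfloat16` fused multiply-add in plain C++. It is not the extension's implementation: the helper names (`bf16_to_float`, `float_to_bf16_rne`, `bf16_fma_emulated`) are hypothetical, and `bfloat16` values are modelled as raw `uint16_t` bit patterns holding the upper 16 bits of a binary32 float. The product is never rounded on its own, because `std::fma` fuses it with the addition in `float`; the final conversion then rounds to nearest even. Since that conversion is a second rounding step, rare inputs may differ from a single correctly rounded result.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen a bfloat16 bit pattern to float.
// Assumes bfloat16 is the upper 16 bits of an IEEE-754 binary32 value.
inline float bf16_to_float(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof f);
  return f;
}

// Hypothetical helper: narrow a float to bfloat16 bits, rounding to
// nearest with ties going to the even mantissa LSB.
inline std::uint16_t float_to_bf16_rne(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    // NaN input: set a mantissa bit so truncation cannot turn it into infinity.
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  std::uint32_t lsb = (bits >> 16) & 1u;  // current LSB of the kept mantissa
  bits += 0x7FFFu + lsb;                  // ties round up only when LSB is odd
  return static_cast<std::uint16_t>(bits >> 16);
}

// Emulated fma on bfloat16 bit patterns: the product a*b is not rounded
// before c is added, because std::fma performs both steps as one operation.
inline std::uint16_t bf16_fma_emulated(std::uint16_t a, std::uint16_t b,
                                       std::uint16_t c) {
  float r = std::fma(bf16_to_float(a), bf16_to_float(b), bf16_to_float(c));
  return float_to_bf16_rne(r);
}
```

For example, with `a = b = 0x3F82` (1 + 2^-6) and `c = 0xBF80` (-1), this sketch returns `0x3D01`, whereas rounding the product to `bfloat16` before the addition would give `0x3D00`.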
+
+=== fmax
+
+```c++
+namespace sycl::ext::oneapi::experimental {
+template <typename T>
+T fmax(T x, T y);
+} // namespace sycl::ext::oneapi::experimental
+```
+
+==== Description
+
+Returns `y` if
+`x < y`, otherwise it
+returns `x`. If one argument is a
+NaN, `fmax()` returns the other
+argument. If both arguments are
+NaNs, `fmax()` returns a NaN.
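The NaN rule in the `fmax` description can be written out as a small reference model. This is only a sketch: `fmax_model` is a hypothetical name, and `float` stands in for `bfloat16` so that the code runs on any host compiler without the extension.

```cpp
#include <cmath>

// Hypothetical reference model of the documented fmax rule.
// T is a stand-in type (e.g. float); it is not the bfloat16 class itself.
template <typename T>
T fmax_model(T x, T y) {
  if (std::isnan(x)) return y;  // also covers "both NaN": y is then a NaN
  if (std::isnan(y)) return x;  // exactly one NaN: return the other argument
  return (x < y) ? y : x;       // ordinary case
}
```

The explicit NaN checks are needed because `x < y` evaluates to false whenever either operand is a NaN, so the comparison alone would return `x` even when `x` is the NaN.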
+
+=== fmin
+
+```c++
+namespace sycl::ext::oneapi::experimental {
+template <typename T>
+T fmin(T x, T y);
+} // namespace sycl::ext::oneapi::experimental
+```
+
+==== Description
+
+Returns `y` if
+`y < x`, otherwise it
+returns `x`. If one argument is a
+NaN, `fmin()` returns the other
+argument. If both arguments are
+NaNs, `fmin()` returns a NaN.
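When `T` is `sycl::marray<bfloat16, {N}>`, SYCL math functions are applied element by element. The sketch below models that behaviour for `fmin` on the host, under stated assumptions: `std::array<float, N>` stands in for `sycl::marray<bfloat16, N>`, and `fmin_elementwise` is a hypothetical name rather than part of the extension.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical host-side model of fmin over an marray-like container.
// std::array<float, N> is a stand-in for sycl::marray<bfloat16, N>.
template <std::size_t N>
std::array<float, N> fmin_elementwise(const std::array<float, N>& x,
                                      const std::array<float, N>& y) {
  std::array<float, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    if (std::isnan(x[i])) {
      out[i] = y[i];  // x is NaN: take y (a NaN only if both are NaN)
    } else if (std::isnan(y[i])) {
      out[i] = x[i];  // only y is NaN: take x
    } else {
      out[i] = (y[i] < x[i]) ? y[i] : x[i];
    }
  }
  return out;
}
```

Each lane is handled independently, so a NaN in one element does not affect the result in any other element.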
+
+=== fabs
+
+```c++
+namespace sycl::ext::oneapi::experimental {
+template <typename T>
+T fabs(T x);
+} // namespace sycl::ext::oneapi::experimental
+```
+
+==== Description
+
+Computes the absolute value of a `bfloat16`.
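Because the `bfloat16` format keeps its sign in the top bit of the 16-bit encoding (1 sign bit, 8 exponent bits, 7 mantissa bits), the absolute value can be modelled by clearing that bit. The sketch below assumes values are handled as raw `uint16_t` bit patterns; `bf16_fabs_bits` is a hypothetical helper and says nothing about how a given backend implements `fabs`.

```cpp
#include <cstdint>

// Hypothetical helper: absolute value of a bfloat16 given as raw bits.
// Clears bit 15 (the sign) and leaves exponent and mantissa untouched.
inline std::uint16_t bf16_fabs_bits(std::uint16_t bits) {
  return static_cast<std::uint16_t>(bits & 0x7FFFu);
}
```

With this model, `0xBF80` (-1.0) maps to `0x3F80` (1.0), and negative zero `0x8000` maps to positive zero `0x0000`.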
+
 == Issues
 
-None.
+1. The CUDA backend does not have a use case that would necessitate support of the `vec` class in bfloat16 math functions, and `marray` would always be preferred over `vec` if `vec` support were to be added in the CUDA backend. For portability reasons, support for the `vec` class can be easily added if other backends require it.
+
+2. We should decide on a roadmap to extend support of `bfloat16` to other SYCL 2020 math functions.