-
Notifications
You must be signed in to change notification settings - Fork 768
[SYCL][Doc] Add GroupAlgorithms extension #1079
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Replaces GroupCollectives extension with a library of free functions: - any_of - all_of - none_of - broadcast - reduce - exclusive_scan - inclusive_scan Signed-off-by: John Pennycook <[email protected]>
4effd10
to
4ee810e
Compare
Found a typo where "any" and "all" were swapped. Fixed. |
sycl/doc/extensions/GroupAlgorithms/SYCL_INTEL_group_algorithms.asciidoc
Show resolved
Hide resolved
|+template <typename Group, typename T, class BinaryOperation> T reduce(Group g, T x, BinaryOperation binary_op);+ | ||
|Combine the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group. | ||
|
||
|+template <typename Group, typename V, typename T, class BinaryOperation> T reduce(Group g, V x, T init, BinaryOperation binary_op);+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to deduce V
and T
separately and use T
as return type? Also it doesn't say what the accumulator type is. I suggest we keep it simple and don't allow mixed type to be automatic. Meaning only deduce once (like e.g. std::max
) and also require that is_same_v<decltype(binary_op(x,x)), T>
. If we keep it simple like that then we should make it easy to specify the type and should make T
the first template argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've attempted to address this in f09a7c9. Is the new text clear?
Signed-off-by: John Pennycook <[email protected]>
Ah, I think I misunderstood you. I thought we still wanted experts to be able to express things that are expressible in standard C++ (where the result of I'll remove the note about expert usage and deduction. |
Also fixed a typo: Ptr => OutPtr Signed-off-by: John Pennycook <[email protected]>
…ctor_tests * origin/sycl: (32 commits) [SYCL] Fix circular reference between events and queues (intel#1226) [CI][Doc] Use SSH to deploy GitHub Pages (intel#1232) [SYCL][CUDA][Test] Testing for use of CUDA primary context (intel#1174) [SYCL] allow underscore symbol in temporary directory name [SYCL] Reject zero length arrays (intel#1153) [SYCL] Fix static code analyzis concerns (intel#1189) [SYCL] Add more details about the -fintelfpga option (intel#1218) [SYCL][CUDA] Select only NVPTX64 device binaries (intel#1223) [SYCL] Reverse max work-group size order (intel#1177) [SYCL][Doc] Add GroupAlgorithms extension (intel#1079) [SYCL] Fix SYCL internal enumerators conflict with user defined macro (intel#1188) [SYCL][CUDA] Fixes context release and unnamed context scope (intel#1207) [SYCL][CUDA] Fix context creation property parsing [CUDA][PI] clang-format pi.h [SYCL][CUDA] Handle the case of not having any CUDA device (intel#1212) [SYCL] Fix check-sycl-deploy target problems (intel#1165) [SYCL] Disable tests which take more than 5 minutes (intel#1220) [SYCL] Make context constructors explicit to avoid unintended conversions (intel#1219) [SYCL][NFC] Add clang-format configuration file for SYCL LIT tests (intel#1224) [SYCL] Fix command cleanup invoked from multiple threads (intel#1214) ...
…_accessor_refactor * origin/sycl: (38 commits) [SYCL] Fix device::get_devices() with a non-host device type (intel#1235) [SYCL][PI][CUDA] Implement kernel and kernel-group information queries (intel#1180) [SYCL] Remove default error code value in exception (intel#1150) [SYCL] Fix devicelib assert LIT test (intel#1245) [SYCL] Set aux-target-cpu for SYCL offload device compilation (intel#1225) [SYCL] Remove fabs and ceil from the list of unsupported math functions (intel#1217) [SYCL] Fix circular reference between events and queues (intel#1226) [CI][Doc] Use SSH to deploy GitHub Pages (intel#1232) [SYCL][CUDA][Test] Testing for use of CUDA primary context (intel#1174) [SYCL] allow underscore symbol in temporary directory name [SYCL] Reject zero length arrays (intel#1153) [SYCL] Fix static code analyzis concerns (intel#1189) [SYCL] Add more details about the -fintelfpga option (intel#1218) [SYCL][CUDA] Select only NVPTX64 device binaries (intel#1223) [SYCL] Reverse max work-group size order (intel#1177) [SYCL][Doc] Add GroupAlgorithms extension (intel#1079) [SYCL] Fix SYCL internal enumerators conflict with user defined macro (intel#1188) [SYCL][CUDA] Fixes context release and unnamed context scope (intel#1207) [SYCL][CUDA] Fix context creation property parsing [CUDA][PI] clang-format pi.h ...
To complement the bf16 expansion and truncation patterns added to ExpandOps, define a pass that replaces, for any arithmetic operation op, %y = arith.op %v0, %v1, ... : T with %e0 = arith.expf %v0 : T to U %e1 = arith.expf %v1 : T to U ... %y.exp = arith.op %e0, %e1, ... : U %y = arith.truncf %y.exp : U to T This allows for "emulating" floating-point operations not supported on a given target (such as bfloat operations or most arithmetic on 8-bit floats) by extending those types to supported ones, performing the arithmetic operation, and then truncating back to the original type (which ensures appropriate rounding behavior). The lowering of the extf and truncf ops introduced by this transformation should be handled by subsequent passes. Reviewed By: rsuderman Differential Revision: https://reviews.llvm.org/D154539
Replaces GroupCollectives extension with a library of free functions:
Signed-off-by: John Pennycook [email protected]