[SYCL][Doc] Add GroupAlgorithms extension #1079

Pennycook · 2020-01-30T18:46:37Z

Replaces GroupCollectives extension with a library of free functions:

any_of
all_of
none_of
broadcast
reduce
exclusive_scan
inclusive_scan

Signed-off-by: John Pennycook [email protected]
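
For readers unfamiliar with the names: most of the functions listed above share their names with C++17 standard library algorithms. The sketch below shows those sequential analogues on the host, assuming the group versions apply the same semantics across the work-items of a group. The `GroupResults` struct and `sequential_analogues` function are hypothetical names used only for illustration and are not part of the extension; `broadcast` has no direct standard-library counterpart and is omitted.

```cpp
#include <algorithm>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical container for the results; not part of the extension.
struct GroupResults {
  bool any = false, all = false, none = false;
  int sum = 0;
  std::vector<int> exclusive, inclusive;
};

// Treats each element of x as the value held by one "work-item" and
// applies the C++17 algorithm with the same name as each group function.
inline GroupResults sequential_analogues(const std::vector<int>& x) {
  auto positive = [](int v) { return v > 0; };
  GroupResults r;
  r.any = std::any_of(x.begin(), x.end(), positive);
  r.all = std::all_of(x.begin(), x.end(), positive);
  r.none = std::none_of(x.begin(), x.end(), positive);
  r.sum = std::reduce(x.begin(), x.end(), 0, std::plus<>());
  r.exclusive.resize(x.size());
  r.inclusive.resize(x.size());
  // Exclusive scan: element i excludes x[i]; the first element is the init value.
  std::exclusive_scan(x.begin(), x.end(), r.exclusive.begin(), 0, std::plus<>());
  // Inclusive scan: element i includes x[i].
  std::inclusive_scan(x.begin(), x.end(), r.inclusive.begin(), std::plus<>());
  return r;
}
```

In the extension itself the input is one value per work-item rather than a range, so this only illustrates what each name computes, not how it is called.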


Pennycook · 2020-01-30T21:18:32Z

Found a typo where "any" and "all" were swapped. Fixed.

sycl/doc/extensions/GroupAlgorithms/SYCL_INTEL_group_algorithms.asciidoc

rolandschulz · 2020-01-31T03:23:31Z


+|+template <typename Group, typename T, class BinaryOperation> T reduce(Group g, T x, BinaryOperation binary_op);+
+|Combine the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects.  _binary_op_ must be the same for all work-items in the group.
+
+|+template <typename Group, typename V, typename T, class BinaryOperation> T reduce(Group g, V x, T init, BinaryOperation binary_op);+


Do we want to deduce V and T separately and use T as the return type? Also, it doesn't say what the accumulator type is. I suggest we keep it simple and not allow mixed types to be deduced automatically: deduce only once (as std::max does, for example) and also require that is_same_v<decltype(binary_op(x,x)), T>. If we keep it that simple, we should make it easy to specify the type explicitly, which means making T the first template argument.
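
The suggestion above can be sketched on the host as follows. This is only an illustration under stated assumptions: `strict_reduce` is a hypothetical name, a `std::vector` stands in for the values held by the work-items of a group, and the real extension signature takes a group object instead. The sketch deduces a single type T, puts it first in the template parameter list, and rejects any operator whose result type is not exactly T.

```cpp
#include <functional>
#include <type_traits>
#include <vector>

// Hypothetical host-side stand-in for reduce(g, x, init, binary_op).
// T is deduced once, from both the values and init, so mixed types do not deduce.
template <typename T, class BinaryOperation>
T strict_reduce(const std::vector<T>& xs, T init, BinaryOperation binary_op) {
  // Require the operator's result type to be exactly T, not merely convertible.
  static_assert(std::is_same_v<decltype(binary_op(init, init)), T>,
                "binary_op must return exactly T");
  T acc = init;  // the accumulator type is therefore unambiguous: T
  for (const T& x : xs) {
    acc = binary_op(acc, x);
  }
  return acc;
}
```

With this shape, passing an `int` vector together with a `double` initial value fails to compile because T cannot be deduced consistently. Because T is the first template parameter, a caller can still name the type explicitly, as in `strict_reduce<double>(...)`.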

I've attempted to address this in f09a7c9. Is the new text clear?


Pennycook · 2020-01-31T22:16:32Z

Ah, I think I misunderstood you. I thought we still wanted experts to be able to express things that are expressible in standard C++ (where the result of binary_op needs to be convertible to T, but need not actually be T) -- but only if they explicitly provided the template arguments.

I'll remove the note about expert usage and deduction.
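
For context on the standard C++ behavior mentioned above, here is a small host-only sketch using `std::accumulate`, where the result of the binary operation only needs to be convertible to the accumulator type. The helper names `accumulate_into_int` and `accumulate_into_double` are hypothetical and exist only for this illustration.

```cpp
#include <numeric>
#include <vector>

// The accumulator type is taken from the initial value (int here), so each
// intermediate double result is converted back to int and loses its fraction.
inline int accumulate_into_int(const std::vector<double>& xs) {
  return std::accumulate(xs.begin(), xs.end(), 0);
}

// Same input, but a double initial value keeps the accumulator in double.
inline double accumulate_into_double(const std::vector<double>& xs) {
  return std::accumulate(xs.begin(), xs.end(), 0.0);
}
```

For the input `{0.5, 0.5, 0.5}` the first helper returns 0 and the second returns 1.5, which is the kind of mixed-type subtlety the review discussion is about.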




Pennycook added the spec extension label Jan 30, 2020

Pennycook requested a review from bader January 30, 2020 18:46

bader previously approved these changes Jan 30, 2020


[SYCL][Doc] Add GroupAlgorithms extension

4ee810e


Pennycook dismissed bader’s stale review via 4ee810e January 30, 2020 21:18

Pennycook force-pushed the group-algorithms branch from 4effd10 to 4ee810e Compare January 30, 2020 21:18

rolandschulz reviewed Jan 31, 2020


[SYCL][Doc] Add deduction restrictions

f09a7c9

Signed-off-by: John Pennycook <[email protected]>

[SYCL][Doc] Remove "if T is deduced"

6148f66

Also fixed a typo: Ptr => OutPtr Signed-off-by: John Pennycook <[email protected]>

Pennycook requested a review from rolandschulz February 10, 2020 17:57

Pennycook mentioned this pull request Feb 21, 2020

[SYCL] Fix __spirv_GroupBroadcast overloads #1152

Merged

rolandschulz approved these changes Mar 2, 2020


Pennycook assigned bader Mar 2, 2020

bader merged commit c181fdb into intel:sycl Mar 3, 2020

Pennycook deleted the group-algorithms branch March 3, 2020 15:39
