Skip to content

[SYCL][Doc] Add GroupAlgorithms extension #1079

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 3, 2020

Conversation

Pennycook
Copy link
Contributor

Replaces GroupCollectives extension with a library of free functions:

  • any_of
  • all_of
  • none_of
  • broadcast
  • reduce
  • exclusive_scan
  • inclusive_scan

Signed-off-by: John Pennycook [email protected]

@Pennycook Pennycook added the spec extension All issues/PRs related to extensions specifications label Jan 30, 2020
@Pennycook Pennycook requested a review from bader January 30, 2020 18:46
bader
bader previously approved these changes Jan 30, 2020
Replaces GroupCollectives extension with a library of free functions:
- any_of
- all_of
- none_of
- broadcast
- reduce
- exclusive_scan
- inclusive_scan

Signed-off-by: John Pennycook <[email protected]>
@Pennycook
Copy link
Contributor Author

Found a typo where "any" and "all" were swapped. Fixed.

|+template <typename Group, typename T, class BinaryOperation> T reduce(Group g, T x, BinaryOperation binary_op);+
|Combine the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group.

|+template <typename Group, typename V, typename T, class BinaryOperation> T reduce(Group g, V x, T init, BinaryOperation binary_op);+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to deduce V and T separately and use T as return type? Also it doesn't say what the accumulator type is. I suggest we keep it simple and don't allow mixed type to be automatic. Meaning only deduce once (like e.g. std::max) and also require that is_same_v<decltype(binary_op(x,x)), T>. If we keep it simple like that then we should make it easy to specify the type and should make T the first template argument.

Copy link
Contributor Author

@Pennycook Pennycook Jan 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've attempted to address this in f09a7c9. Is the new text clear?

@Pennycook
Copy link
Contributor Author

Ah, I think I misunderstood you. I thought we still wanted experts to be able to express things that are expressible in standard C++ (where the result of binary_op needs to be convertible to T, but need not actually be T) -- but only if they explicitly provided the template arguments.

I'll remove the note about expert usage and deduction.

Also fixed a typo: Ptr => OutPtr

Signed-off-by: John Pennycook <[email protected]>
@bader bader merged commit c181fdb into intel:sycl Mar 3, 2020
@Pennycook Pennycook deleted the group-algorithms branch March 3, 2020 15:39
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Mar 4, 2020
…ctor_tests

* origin/sycl: (32 commits)
  [SYCL] Fix circular reference between events and queues (intel#1226)
  [CI][Doc] Use SSH to deploy GitHub Pages (intel#1232)
  [SYCL][CUDA][Test] Testing for use of CUDA primary context (intel#1174)
  [SYCL] allow underscore symbol in temporary directory name
  [SYCL] Reject zero length arrays (intel#1153)
  [SYCL] Fix static code analyzis concerns (intel#1189)
  [SYCL] Add more details about the -fintelfpga option (intel#1218)
  [SYCL][CUDA] Select only NVPTX64 device binaries (intel#1223)
  [SYCL] Reverse max work-group size order (intel#1177)
  [SYCL][Doc] Add GroupAlgorithms extension (intel#1079)
  [SYCL] Fix SYCL internal enumerators conflict with user defined macro (intel#1188)
  [SYCL][CUDA] Fixes context release and unnamed context scope (intel#1207)
  [SYCL][CUDA] Fix context creation property parsing
  [CUDA][PI] clang-format pi.h
  [SYCL][CUDA] Handle the case of not having any CUDA device (intel#1212)
  [SYCL] Fix check-sycl-deploy target problems (intel#1165)
  [SYCL] Disable tests which take more than 5 minutes (intel#1220)
  [SYCL] Make context constructors explicit to avoid unintended conversions (intel#1219)
  [SYCL][NFC] Add clang-format configuration file for SYCL LIT tests (intel#1224)
  [SYCL] Fix command cleanup invoked from multiple threads (intel#1214)
  ...
alexbatashev pushed a commit to alexbatashev/llvm that referenced this pull request Mar 5, 2020
…_accessor_refactor

* origin/sycl: (38 commits)
  [SYCL] Fix device::get_devices() with a non-host device type (intel#1235)
  [SYCL][PI][CUDA] Implement kernel and kernel-group information queries (intel#1180)
  [SYCL] Remove default error code value in exception (intel#1150)
  [SYCL] Fix devicelib assert LIT test (intel#1245)
  [SYCL] Set aux-target-cpu for SYCL offload device compilation (intel#1225)
  [SYCL] Remove fabs and ceil from the list of unsupported math functions (intel#1217)
  [SYCL] Fix circular reference between events and queues (intel#1226)
  [CI][Doc] Use SSH to deploy GitHub Pages (intel#1232)
  [SYCL][CUDA][Test] Testing for use of CUDA primary context (intel#1174)
  [SYCL] allow underscore symbol in temporary directory name
  [SYCL] Reject zero length arrays (intel#1153)
  [SYCL] Fix static code analyzis concerns (intel#1189)
  [SYCL] Add more details about the -fintelfpga option (intel#1218)
  [SYCL][CUDA] Select only NVPTX64 device binaries (intel#1223)
  [SYCL] Reverse max work-group size order (intel#1177)
  [SYCL][Doc] Add GroupAlgorithms extension (intel#1079)
  [SYCL] Fix SYCL internal enumerators conflict with user defined macro (intel#1188)
  [SYCL][CUDA] Fixes context release and unnamed context scope (intel#1207)
  [SYCL][CUDA] Fix context creation property parsing
  [CUDA][PI] clang-format pi.h
  ...
iclsrc pushed a commit that referenced this pull request Jul 11, 2023
To complement the bf16 expansion and truncation patterns added to
ExpandOps, define a pass that replaces, for any arithmetic operation
op,
%y = arith.op %v0, %v1, ... : T
with
%e0 = arith.expf %v0 : T to U
%e1 = arith.expf %v1 : T to U
...
%y.exp = arith.op %e0, %e1, ... : U
%y = arith.truncf %y.exp : U to T

This allows for "emulating" floating-point operations not supported on
a given target (such as bfloat operations or most arithmetic on 8-bit
floats) by extending those types to supported ones, performing the
arithmetic operation, and then truncating back to the original
type (which ensures appropriate rounding behavior).

The lowering of the extf and truncf ops introduced by this
transformation should be handled by subsequent passes.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D154539
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec extension All issues/PRs related to extensions specifications
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants