|
| 1 | += SYCL_INTEL_group_algorithms |
| 2 | +:source-highlighter: coderay |
| 3 | +:coderay-linenums-mode: table |
| 4 | + |
| 5 | +// This section needs to be after the document title. |
| 6 | +:doctype: book |
| 7 | +:toc2: |
| 8 | +:toc: left |
| 9 | +:encoding: utf-8 |
| 10 | +:lang: en |
| 11 | + |
| 12 | +:blank: pass:[ +] |
| 13 | + |
| 14 | +// Set the default source code type in this document to C++, |
| 15 | +// for syntax highlighting purposes. This is needed because |
| 16 | +// docbook uses c++ and html5 uses cpp. |
| 17 | +:language: {basebackend@docbook:c++:cpp} |
| 18 | + |
| 19 | +== Introduction |
| 20 | +IMPORTANT: This specification is a draft. |
| 21 | + |
| 22 | +NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by permission by Khronos. |
| 23 | + |
| 24 | +NOTE: This document is better viewed when rendered as html with asciidoctor. GitHub does not render image icons. |
| 25 | + |
| 26 | +This document describes an extension which introduces a library of group functions, including common parallel algorithms such as reductions and scans. |
| 27 | + |
| 28 | +== Name Strings |
| 29 | + |
| 30 | ++SYCL_INTEL_group_algorithms+ |
| 31 | + |
| 32 | +== Notice |
| 33 | + |
| 34 | +Copyright (c) 2020 Intel Corporation. All rights reserved. |
| 35 | + |
| 36 | +== Status |
| 37 | + |
| 38 | +Working Draft |
| 39 | + |
| 40 | +This is a preview extension specification, intended to provide early access to a feature for review and community feedback. When the feature matures, this specification may be released as a formal extension. |
| 41 | + |
| 42 | +Because the interfaces defined by this specification are not final and are subject to change they are not intended to be used by shipping software products. |
| 43 | + |
| 44 | +== Version |
| 45 | + |
| 46 | +Built On: {docdate} + |
| 47 | +Revision: 1 |
| 48 | + |
| 49 | +== Contact |
| 50 | +John Pennycook, Intel (john 'dot' pennycook 'at' intel 'dot' com) |
| 51 | + |
| 52 | +== Dependencies |
| 53 | + |
| 54 | +This extension is written against the SYCL 1.2.1 specification, Revision v1.2.1-6. |
| 55 | + |
| 56 | +== Overview |
| 57 | + |
| 58 | +This extension introduces a library of group algorithms, providing a simple way for developers to apply common parallel algorithms across data held by different work-items in the same group, or to use the work-items of a group to apply common parallel algorithms across data stored in memory. |
| 59 | + |
| 60 | +The extension introduces the following functions: |
| 61 | + |
| 62 | +- +any_of+ |
| 63 | +- +all_of+ |
| 64 | +- +none_of+ |
| 65 | +- +broadcast+ |
| 66 | +- +reduce+ |
| 67 | +- +exclusive_scan+ |
| 68 | +- +inclusive_scan+ |
| 69 | + |
| 70 | +=== Alignment with OpenCL vs C++ |
| 71 | + |
| 72 | +Where a feature is common to both OpenCL and {cpp}, this proposal opts for {cpp}-like naming: |
| 73 | + |
| 74 | +- Collective operators are named as in +<functional>+ (e.g. +plus+ instead of +sum+) and to avoid clashes with names in +<algorithm>+ (e.g. +minimum+ instead of +min+). |
| 75 | +- Algorithms are named as in +<algorithm>+ (e.g. +inclusive_scan+ instead of +scan_inclusive+). |
| 76 | + |
| 77 | +=== Towards a Generic Group Interface |
| 78 | + |
| 79 | +This extension adds a number of static members to the +group+ class to simplify the interfaces of the functions in this library: |
| 80 | + |
| 81 | +- +id_type+: The type used to represent work-item IDs within the group |
| 82 | +- +range_type+: The type used to represent the range of the group |
| 83 | +- +linear_id_type+: The type used to represent linear work-item IDs within the group |
| 84 | +- +dimensions+: An integral value representing the dimensionality of the group |
| 85 | + |
| 86 | +[source, c++] |
| 87 | +---- |
| 88 | +template <int Dimensions = 1> |
| 89 | +class group |
| 90 | +{ |
| 91 | +public: |
| 92 | + using id_type = id<Dimensions>; |
| 93 | + using range_type = range<Dimensions>; |
| 94 | + using linear_id_type = size_t; |
| 95 | + static constexpr int dimensions = Dimensions; |
| 96 | +}; |
| 97 | +---- |
| 98 | + |
| 99 | +=== Data Types |
| 100 | + |
| 101 | +All functions are supported for the fundamental scalar types supported by SYCL and instances of the SYCL +vec+ class. The fundamental scalar types (as defined in Section 6.5 of the SYCL 1.2.1 specification) are: +bool+, +char+, +signed char+, +unsigned char+, +short int+, +unsigned short int+, +int+, +unsigned int+, +long int+, +unsigned long int+, +long long int+, +unsigned long long int+, +size_t+, +float+, +double+, +half+. |
| 102 | + |
| 103 | +Functions with arguments of type +vec<T,N>+ are applied component-wise: they are semantically equivalent to N calls to a scalar function of type +T+. |
| 104 | + |
| 105 | +=== Function Objects |
| 106 | + |
| 107 | +A number of function objects are provided in the +cl::sycl::intel+ namespace. These function objects are used for all interfaces requiring an operator to be specified. All function objects obey C++ conversion and promotion rules. |
| 108 | + |
| 109 | +Transparent function objects are provided if using a {cpp}14 compiler -- the parameter types and return type for transparent function objects will be deduced if +T+ is not specified. |
| 110 | + |
| 111 | +The following function objects alias objects in the +<functional>+ header from the {cpp} standard library: |
| 112 | + |
| 113 | +- +cl::sycl::intel::plus+ |
| 114 | +- +cl::sycl::intel::multiplies+ |
| 115 | +- +cl::sycl::intel::bit_and+ |
| 116 | +- +cl::sycl::intel::bit_or+ |
| 117 | +- +cl::sycl::intel::bit_xor+ |
| 118 | +- +cl::sycl::intel::logical_and+ |
| 119 | +- +cl::sycl::intel::logical_or+ |
| 120 | + |
| 121 | +New function objects without {cpp} standard library equivalents are defined in the table below: |
| 122 | + |
| 123 | +|=== |
| 124 | +|Function Object|Description |
| 125 | + |
| 126 | +|+template <typename T=void> struct minimum;+ |
| 127 | +|+T operator(const T&, const T&) const+ applies +std::less+ to its arguments, in the same order, then returns the lesser argument unchanged. |
| 128 | + |
| 129 | +|+template <typename T=void> struct maximum;+ |
| 130 | +|+T operator(const T&, const T&) const+ applies +std::greater+ to its arguments, in the same order, then returns the greater argument unchanged. |
| 131 | +|=== |
| 132 | + |
| 133 | +Function objects supported by the group algorithms library can be identified using the +cl::sycl::intel::is_native_function_object+ and +cl::sycl::intel::is_native_function_object_v+ traits classes. |
| 134 | + |
| 135 | +=== Functions |
| 136 | + |
| 137 | +The group algorithms library is based on the algorithms library described in Section 28 of the {cpp}17 standard. The syntax and restrictions are aligned, with two notable differences: the first argument to each function is a group of work-items, in place of an execution policy; and pointers are accepted in place of iterators in order to guarantee that address space information is visible to the compiler. |
| 138 | + |
| 139 | +Group algorithms are performed collaboratively by the work-items in a group. All functions therefore act as synchronization points and must be encountered in converged control flow by all work-items in the group -- if one work-item in the group reaches the function, then all work-items in the group must reach the function. Additionally, restrictions may be placed on the arguments passed to each function in order to ensure that all work-items in the group agree on the operation that is being performed. Any such restrictions on the arguments passed to a function are defined within the descriptions of those functions. |
| 140 | + |
| 141 | +Many functions provide at least two overloads: one operating directly on data produced by the work-items in the group, and another operating on a range of data in memory specified by a pair of pointers. If the pointers passed to such a group function are not the same for all work-items in the group, their behavior is undefined. How the elements of a range are processed by the work-items in a group is undefined. |
| 142 | + |
| 143 | +Using functions from the group algorithms library inside of a kernel may introduce additional limits on the resources available to user code inside the same kernel (e.g. private memory, work-group local memory). The behavior of these limits is implementation-defined, but must be reflected by calls to kernel querying functions such as +kernel::get_work_group_info+. |
| 144 | + |
| 145 | +It is undefined behavior for any of these functions to be invoked within a +parallel_for_work_group+ or +parallel_for_work_item+ context, but this restriction may be lifted in a future version of the proposal. |
| 146 | + |
| 147 | +==== Vote |
| 148 | + |
| 149 | +|=== |
| 150 | +|Function|Description |
| 151 | + |
| 152 | +|+template <typename Group> bool any_of(Group g, bool pred);+ |
| 153 | +|Return +true+ if _pred_ is +true+ for any work-item in the group. |
| 154 | + |
| 155 | +|+template <typename Group> bool all_of(Group g, bool pred);+ |
| 156 | +|Return +true+ if _pred_ is +true+ for all work-items in the group. |
| 157 | + |
| 158 | +|+template <typename Group> bool none_of(Group g, bool pred);+ |
| 159 | +|Return +true+ if _pred_ is +true+ for no work-items in the group. |
| 160 | + |
| 161 | +|+template <typename Group, typename T, class Predicate> bool any_of(Group g, T x, Predicate pred);+ |
| 162 | +|Return +true+ if _pred(x)_ is +true+ for any work-item in the group. _pred_ must be the same for all work-items in the group. |
| 163 | + |
| 164 | +|+template <typename Group, typename T, class Predicate> bool all_of(Group g, T x, Predicate pred);+ |
| 165 | +|Return +true+ if _pred(x)_ is +true+ for all work-items in the group. _pred_ must be the same for all work-items in the group. |
| 166 | + |
| 167 | +|+template <typename Group, typename T, class Predicate> bool none_of(Group g, T x, Predicate pred);+ |
| 168 | +|Return +true+ if _pred(x)_ is +true+ for no work-items in the group. _pred_ must be the same for all work-items in the group. |
| 169 | +|=== |
| 170 | + |
| 171 | +|=== |
| 172 | +|Function|Description |
| 173 | + |
| 174 | +|+template <typename Group, typename Ptr, class Predicate> bool any_of(Group g, Ptr first, Ptr last, Predicate pred);+ |
| 175 | +|Return +true+ if _pred_ returns +true+ for any element in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group. |
| 176 | + |
| 177 | +|+template <typename Group, typename Ptr, class Predicate> bool all_of(Group g, Ptr first, Ptr last, Predicate pred);+ |
| 178 | +|Return +true+ if _pred_ returns +true+ for all elements in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group. |
| 179 | + |
| 180 | +|+template <typename Group, typename Ptr, class Predicate> bool none_of(Group g, Ptr first, Ptr last, Predicate pred);+ |
| 181 | +|Return +true+ if _pred_ returns +true+ for no element in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group. |
| 182 | +|=== |
| 183 | + |
| 184 | +==== Collectives |
| 185 | + |
| 186 | +In this section, the meaning of "exclusive scan" and "inclusive scan" are as defined in Sections 29.8.7 and 29.8.8 of the {cpp}17 specification, respectively. |
| 187 | + |
| 188 | +The return types of the collective functions in {cpp}17 are not deduced from the return type of the specified binary operator, but from either the type of the input values or the type of the initialization value (if one is provided). This is error-prone and can lead to unexpected behavior (e.g. specifying an initial value of `0` instead of `0.0f` for a floating-point reduction will cause the results to be accumulated in an integer). To minimize the chances of encountering such errors, the collective functions in the group algorithms library place additional restrictions on type combinations that can be deduced. |
| 189 | + |
| 190 | +|=== |
| 191 | +|Function|Description |
| 192 | + |
| 193 | +|+template <typename Group, typename T> T broadcast(Group g, T x);+ |
| 194 | +|Broadcast the value of _x_ from the work-item with the lowest id to all work-items within the group. |
| 195 | + |
| 196 | +|+template <typename Group, typename T> T broadcast(Group g, T x, Group::linear_id_type local_linear_id);+ |
| 197 | +|Broadcast the value of _x_ from the work-item with the specified linear id to all work-items within the group. The value of _local_linear_id_ must be the same for all work-items in the group. |
| 198 | + |
| 199 | +|+template <typename Group, typename T> T broadcast(Group g, T x, Group::id_type local_id);+ |
| 200 | +|Broadcast the value of _x_ from the work-item with the specified id to all work-items within the group. The value of _local_id_ must be the same for all work-items in the group, and its dimensionality must match the dimensionality of the group. |
| 201 | + |
| 202 | +|+template <typename Group, typename T, class BinaryOperation> T reduce(Group g, T x, BinaryOperation binary_op);+ |
| 203 | +|Combine the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group. _binary_op(x, x)_ must return a value of type _T_. |
| 204 | + |
| 205 | +|+template <typename Group, typename V, typename T, class BinaryOperation> T reduce(Group g, V x, T init, BinaryOperation binary_op);+ |
| 206 | +|Combine the values of _x_ from all work-items in the group using an initial value of _init_ and the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group. _binary_op(init, x)_ must return a value of type _T_. |
| 207 | + |
| 208 | +|+template <typename Group, typename T, class BinaryOperation> T exclusive_scan(Group g, T x, BinaryOperation binary_op);+ |
| 209 | +|Perform an exclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the exclusive scan of the first +i+ work-items in the group and the identity value of _binary_op_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ must be the same for all work-items in the group. _binary_op(x, x)_ must return a value of type _T_. |
| 210 | + |
| 211 | +|+template <typename Group, typename V, typename T, class BinaryOperation> T exclusive_scan(Group g, V x, T init, BinaryOperation binary_op);+ |
| 212 | +|Perform an exclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the exclusive scan of the first +i+ work items in the group and an initial value specified by _init_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _init_ and _binary_op_ must be the same for all work-items in the group. _binary_op(init, x)_ must return a value of type _T_. |
| 213 | + |
| 214 | +|+template <typename Group, typename T, class BinaryOperation> T inclusive_scan(Group g, T x, BinaryOperation binary_op);+ |
| 215 | +|Perform an inclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the inclusive scan of the first +i+ work items in the group. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ must be the same for all work-items in the group. _binary_op(x, x)_ must return a value of type _T_. |
| 216 | + |
| 217 | +|+template <typename Group, typename V, class BinaryOperation, typename T> T inclusive_scan(Group g, V x, BinaryOperation binary_op, T init);+ |
| 218 | +|Perform an inclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the inclusive scan of the first +i+ work items in the group and an initial value specified by _init_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ and _init_ must be the same for all work-items in the group. _binary_op(init, x)_ must return a value of type _T_. |
| 219 | +|=== |
| 220 | + |
| 221 | +|=== |
| 222 | +|Function|Description |
| 223 | + |
| 224 | +|+template <typename Group, typename Ptr, class BinaryOperation> Ptr::element_type reduce(Group g, Ptr first, Ptr last, BinaryOperation binary_op);+ |
| 225 | +|Combine the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. _first_, _last_ and _binary_op_ must be the same for all work-items in the group. _binary_op(*first, *first)_ must return a value of type _Ptr::element_type_. |
| 226 | + |
| 227 | +|+template <typename Group, typename Ptr, typename T, class BinaryOperation> T reduce(Group g, Ptr first, Ptr last, T init, BinaryOperation binary_op);+ |
| 228 | +|Combine the values in the range [_first_, _last_) using an initial value of _init_ and the operator _binary_op_, which must be one of the group algorithms library function objects. _first_, _last_, _init__ and _binary_op_ must be the same for all work-items in the group. _binary_op(init, *first)_ must return a value of type _T_. |
| 229 | + |
| 230 | +|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation> OutPtr exclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op);+ |
| 231 | +|Perform an exclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the exclusive scan of the first +i+ values in the range and the identity value of _binary_op_. Returns a pointer to the end of the output range. _first_, _last_, _result_ and _binary_op_ must be the same for all work-items in the group. _binary_op(*first, *first)_ must return a value of type _OutPtr::element_type_. |
| 232 | + |
| 233 | +|+template <typename Group, typename InPtr, typename OutPtr, typename T, class BinaryOperation> OutPtr exclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, T init, BinaryOperation binary_op);+ |
| 234 | +|Perform an exclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the exclusive scan of the first +i+ values in the range and an initial value specified by _init_. Returns a pointer to the end of the output range. _first_, _last_, _result_, _init_ and _binary_op_ must be the same for all work-items in the group. _binary_op(init, *first)_ must return a value of type _T_. |
| 235 | + |
| 236 | +|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation> OutPtr inclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op);+ |
| 237 | +|Perform an inclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the inclusive scan of the first +i+ values in the range. Returns a pointer to the end of the output range. _first_, _last_, _result_ and _binary_op_ must be the same for all work-items in the group. _binary_op(*first, *first)_ must return a value of type _OutPtr::element_type_. |
| 238 | + |
| 239 | +|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation, typename T> OutrPtr inclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op, T init);+ |
| 240 | +|Perform an inclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the inclusive scan of the first +i+ values in the range and an initial value specified by _init_. Returns a pointer to the end of the output range. _first_, _last_, _result_, _binary_op_ and _init_ must be the same for all work-items in the group. _binary_op(init, *first)_ must return a value of type _T_. |
| 241 | +|=== |
| 242 | + |
| 243 | +== Issues |
| 244 | + |
| 245 | +None. |
| 246 | + |
| 247 | +//. asd |
| 248 | +//+ |
| 249 | +//-- |
| 250 | +//*RESOLUTION*: Not resolved. |
| 251 | +//-- |
| 252 | +
|
| 253 | +== Revision History |
| 254 | +
|
| 255 | +[cols="5,15,15,70"] |
| 256 | +[grid="rows"] |
| 257 | +[options="header"] |
| 258 | +|======================================== |
| 259 | +|Rev|Date|Author|Changes |
| 260 | +|1|2020-01-30|John Pennycook|*Initial public working draft* |
| 261 | +|======================================== |
| 262 | +
|
| 263 | +//************************************************************************ |
| 264 | +//Other formatting suggestions: |
| 265 | +// |
| 266 | +//* Use *bold* text for host APIs, or [source] syntax highlighting. |
| 267 | +//* Use +mono+ text for device APIs, or [source] syntax highlighting. |
| 268 | +//* Use +mono+ text for extension names, types, or enum values. |
| 269 | +//* Use _italics_ for parameters. |
| 270 | +//************************************************************************ |
0 commit comments