|
| 1 | += SYCL_INTEL_group_algorithms |
| 2 | +:source-highlighter: coderay |
| 3 | +:coderay-linenums-mode: table |
| 4 | + |
| 5 | +// This section needs to be after the document title. |
| 6 | +:doctype: book |
| 7 | +:toc2: |
| 8 | +:toc: left |
| 9 | +:encoding: utf-8 |
| 10 | +:lang: en |
| 11 | + |
| 12 | +:blank: pass:[ +] |
| 13 | + |
| 14 | +// Set the default source code type in this document to C++, |
| 15 | +// for syntax highlighting purposes. This is needed because |
| 16 | +// docbook uses c++ and html5 uses cpp. |
| 17 | +:language: {basebackend@docbook:c++:cpp} |
| 18 | + |
| 19 | +== Introduction |
| 20 | +IMPORTANT: This specification is a draft. |
| 21 | + |
| 22 | +NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by permission by Khronos. |
| 23 | + |
| 24 | +NOTE: This document is better viewed when rendered as html with asciidoctor. GitHub does not render image icons. |
| 25 | + |
| 26 | +This document describes an extension which introduces a library of group functions, including common parallel algorithms such as reductions and scans. |
| 27 | + |
| 28 | +== Name Strings |
| 29 | + |
| 30 | ++SYCL_INTEL_group_algorithms+ |
| 31 | + |
| 32 | +== Notice |
| 33 | + |
| 34 | +Copyright (c) 2020 Intel Corporation. All rights reserved. |
| 35 | + |
| 36 | +== Status |
| 37 | + |
| 38 | +Working Draft |
| 39 | + |
| 40 | +This is a preview extension specification, intended to provide early access to a feature for review and community feedback. When the feature matures, this specification may be released as a formal extension. |
| 41 | + |
| 42 | +Because the interfaces defined by this specification are not final and are subject to change they are not intended to be used by shipping software products. |
| 43 | + |
| 44 | +== Version |
| 45 | + |
| 46 | +Built On: {docdate} + |
| 47 | +Revision: 1 |
| 48 | + |
| 49 | +== Contact |
| 50 | +John Pennycook, Intel (john 'dot' pennycook 'at' intel 'dot' com) |
| 51 | + |
| 52 | +== Dependencies |
| 53 | + |
| 54 | +This extension is written against the SYCL 1.2.1 specification, Revision v1.2.1-6. |
| 55 | + |
| 56 | +== Overview |
| 57 | + |
| 58 | +This extension introduces a library of group algorithms, providing a simple way for developers to apply common parallel algorithms across data held by different work-items in the same group, or to use the work-items of a group to apply common parallel algorithms across data stored in memory. |
| 59 | + |
| 60 | +The extension introduces the following functions: |
| 61 | + |
| 62 | +- +any_of+ |
| 63 | +- +all_of+ |
| 64 | +- +none_of+ |
| 65 | +- +broadcast+ |
| 66 | +- +reduce+ |
| 67 | +- +exclusive_scan+ |
| 68 | +- +inclusive_scan+ |
| 69 | + |
| 70 | +=== Alignment with OpenCL vs C++ |
| 71 | + |
| 72 | +Where a feature is common to both OpenCL and {cpp}, this proposal opts for {cpp}-like naming: |
| 73 | + |
| 74 | +- Collective operators are named as in +<functional>+ (e.g. +plus+ instead of +sum+) and to avoid clashes with names in +<algorithm>+ (e.g. +minimum+ instead of +min+). |
| 75 | +- Algorithms are named as in +<algorithm>+ (e.g. +inclusive_scan+ instead of +scan_inclusive+). |
| 76 | + |
| 77 | +=== Towards a Generic Group Interface |
| 78 | + |
| 79 | +This extension adds a number of static members to the +group+ class to simplify the interfaces of the functions in this library: |
| 80 | + |
| 81 | +- +id_type+: The type used to represent work-item IDs within the group |
| 82 | +- +range_type+: The type used to represent the range of the group |
| 83 | +- +linear_id_type+: The type used to represent linear work-item IDs within the group |
| 84 | +- +dimensions+: An integral value representing the dimensionality of the group |
| 85 | + |
| 86 | +[source, c++] |
| 87 | +---- |
| 88 | +template <int Dimensions = 1> |
| 89 | +class group |
| 90 | +{ |
| 91 | +public: |
| 92 | + using id_type = id<Dimensions>; |
| 93 | + using range_type = range<Dimensions>; |
| 94 | + using linear_id_type = size_t; |
| 95 | + static constexpr int dimensions = Dimensions; |
| 96 | +}; |
| 97 | +---- |
| 98 | + |
| 99 | +=== Data Types |
| 100 | + |
| 101 | +All functions are supported for the fundamental scalar types supported by SYCL and instances of the SYCL +vec+ class. The fundamental scalar types (as defined in Section 6.5 of the SYCL 1.2.1 specification) are: +bool+, +char+, +signed char+, +unsigned char+, +short int+, +unsigned short int+, +int+, +unsigned int+, +long int+, +unsigned long int+, +long long int+, +unsigned long long int+, +size_t+, +float+, +double+, +half+. |
| 102 | + |
| 103 | +Functions with arguments of type +vec<T,N>+ are applied component-wise: they are semantically equivalent to N calls to a scalar function of type +T+. |
| 104 | + |
| 105 | +=== Function Objects |
| 106 | + |
| 107 | +A number of function objects are provided in the +cl::sycl::intel+ namespace. These function objects are used for all interfaces requiring an operator to be specified. All function objects obey C++ conversion and promotion rules. |
| 108 | + |
| 109 | +Transparent function objects are provided if using a {cpp}14 compiler -- the parameter types and return type for transparent function objects will be deduced if +T+ is not specified. |
| 110 | + |
| 111 | +The following function objects alias objects in the +<functional>+ header from the {cpp} standard library: |
| 112 | + |
| 113 | +- +cl::sycl::intel::plus+ |
| 114 | +- +cl::sycl::intel::multiplies+ |
| 115 | +- +cl::sycl::intel::bit_and+ |
| 116 | +- +cl::sycl::intel::bit_or+ |
| 117 | +- +cl::sycl::intel::bit_xor+ |
| 118 | +- +cl::sycl::intel::logical_and+ |
| 119 | +- +cl::sycl::intel::logical_or+ |
| 120 | + |
| 121 | +New function objects without {cpp} standard library equivalents are defined in the table below: |
| 122 | + |
| 123 | +|=== |
| 124 | +|Function Object|Description |
| 125 | + |
| 126 | +|+template <typename T=void> struct minimum;+ |
| 127 | +|+T operator(const T&, const T&) const+ applies +std::less+ to its arguments, in the same order, then returns the lesser argument unchanged. |
| 128 | + |
| 129 | +|+template <typename T=void> struct maximum;+ |
| 130 | +|+T operator(const T&, const T&) const+ applies +std::greater+ to its arguments, in the same order, then returns the greater argument unchanged. |
| 131 | +|=== |
| 132 | + |
| 133 | +Function objects supported by the group algorithms library can be identified using the +cl::sycl::intel::is_native_function_object+ and +cl::sycl::intel::is_native_function_object_v+ traits classes. |
| 134 | + |
| 135 | +=== Functions |
| 136 | + |
| 137 | +The group algorithms library is based on the algorithms library described in Section 28 of the {cpp}17 standard. The syntax and restrictions are aligned, with two notable differences: the first argument to each function is a group of work-items, in place of an execution policy; and pointers are accepted in place of iterators in order to guarantee that address space information is visible to the compiler. |
| 138 | + |
| 139 | +Group algorithms are performed collaboratively by the work-items in a group. All functions therefore act as synchronization points and must be encountered in converged control flow by all work-items in the group -- if one work-item in the group reaches the function, then all work-items in the group must reach the function. Additionally, restrictions may be placed on the arguments passed to each function in order to ensure that all work-items in the group agree on the operation that is being performed. Any such restrictions on the arguments passed to a function are defined within the descriptions of those functions. |
| 140 | + |
| 141 | +Many functions provide at least two overloads: one operating directly on data produced by the work-items in the group, and another operating on a range of data in memory specified by a pair of pointers. If the pointers passed to such a group function are not the same for all work-items in the group, their behavior is undefined. How the elements of a range are processed by the work-items in a group is undefined. |
| 142 | + |
| 143 | +Using functions from the group algorithms library inside of a kernel may introduce additional limits on the resources available to user code inside the same kernel (e.g. private memory, work-group local memory). The behavior of these limits is implementation-defined, but must be reflected by calls to kernel querying functions such as +kernel::get_work_group_info+. |
| 144 | + |
| 145 | +It is undefined behavior for any of these functions to be invoked within a +parallel_for_work_group+ or +parallel_for_work_item+ context, but this restriction may be lifted in a future version of the proposal. |
| 146 | + |
| 147 | +==== Vote |
| 148 | + |
| 149 | +|=== |
| 150 | +|Function|Description |
| 151 | + |
| 152 | +|+template <typename Group> bool any_of(Group g, bool pred);+ |
| 153 | +|Return +true+ if _pred_ is +true+ for any work-item in the group. |
| 154 | + |
| 155 | +|+template <typename Group> bool all_of(Group g, bool pred);+ |
| 156 | +|Return +true+ if _pred_ is +true+ for all work-items in the group. |
| 157 | + |
| 158 | +|+template <typename Group> bool none_of(Group g, bool pred);+ |
| 159 | +|Return +true+ if _pred_ is +true+ for no work-items in the group. |
| 160 | + |
| 161 | +|+template <typename Group, typename T, class Predicate> bool any_of(Group g, T x, Predicate pred);+ |
| 162 | +|Return +true+ if _pred(x)_ is +true+ for any work-item in the group. _pred_ must be the same for all work-items in the group. |
| 163 | + |
| 164 | +|+template <typename Group, typename T, class Predicate> bool all_of(Group g, T x, Predicate pred);+ |
| 165 | +|Return +true+ if _pred(x)_ is +true+ for all work-items in the group. _pred_ must be the same for all work-items in the group. |
| 166 | + |
| 167 | +|+template <typename Group, typename T, class Predicate> bool none_of(Group g, T x, Predicate pred);+ |
| 168 | +|Return +true+ if _pred(x)_ is +true+ for no work-items in the group. _pred_ must be the same for all work-items in the group. |
| 169 | +|=== |
| 170 | + |
| 171 | +|=== |
| 172 | +|Function|Description |
| 173 | + |
| 174 | +|+template <typename Group, typename Ptr, class Predicate> bool any_of(Group g, Ptr first, Ptr last, Predicate pred);+ |
| 175 | +|Return +true+ if _pred_ returns +true+ for any element in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group. |
| 176 | + |
| 177 | +|+template <typename Group, typename Ptr, class Predicate> bool all_of(Group g, Ptr first, Ptr last, Predicate pred);+ |
| 178 | +|Return +true+ if _pred_ returns +true+ for all elements in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group. |
| 179 | + |
| 180 | +|+template <typename Group, typename Ptr, class Predicate> bool none_of(Group g, Ptr first, Ptr last, Predicate pred);+ |
| 181 | +|Return +true+ if _pred_ returns +true+ for no element in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group. |
| 182 | +|=== |
| 183 | + |
| 184 | +==== Collectives |
| 185 | + |
| 186 | +In this section, the meaning of "exclusive scan" and "inclusive scan" are as defined in Sections 29.8.7 and 29.8.8 of the {cpp}17 specification, respectively. |
| 187 | + |
| 188 | +|=== |
| 189 | +|Function|Description |
| 190 | + |
| 191 | +|+template <typename Group, typename T> T broadcast(Group g, T x);+ |
| 192 | +|Broadcast the value of _x_ from the work-item with the lowest id to all work-items within the group. |
| 193 | + |
| 194 | +|+template <typename Group, typename T> T broadcast(Group g, T x, Group::linear_id_type local_linear_id);+ |
| 195 | +|Broadcast the value of _x_ from the work-item with the specified linear id to all work-items within the group. The value of _local_linear_id_ must be the same for all work-items in the group. |
| 196 | + |
| 197 | +|+template <typename Group, typename T> T broadcast(Group g, T x, Group::id_type local_id);+ |
| 198 | +|Broadcast the value of _x_ from the work-item with the specified id to all work-items within the group. The value of _local_id_ must be the same for all work-items in the group, and its dimensionality must match the dimensionality of the group. |
| 199 | + |
| 200 | +|+template <typename Group, typename T, class BinaryOperation> T reduce(Group g, T x, BinaryOperation binary_op);+ |
| 201 | +|Combine the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group. |
| 202 | + |
| 203 | +|+template <typename Group, typename V, typename T, class BinaryOperation> T reduce(Group g, V x, T init, BinaryOperation binary_op);+ |
| 204 | +|Combine the values of _x_ from all work-items in the group using an initial value of _init_ and the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group. |
| 205 | + |
| 206 | +|+template <typename Group, typename T, class BinaryOperation> T exclusive_scan(Group g, T x, BinaryOperation binary_op);+ |
| 207 | +|Perform an exclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the exclusive scan of the first +i+ work-items in the group and the identity value of _binary_op_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ must be the same for all work-items in the group. |
| 208 | + |
| 209 | +|+template <typename Group, typename V, typename T, class BinaryOperation> T exclusive_scan(Group g, V x, T init, BinaryOperation binary_op);+ |
| 210 | +|Perform an exclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the exclusive scan of the first +i+ work items in the group and an initial value specified by _init_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _init_ and _binary_op_ must be the same for all work-items in the group. |
| 211 | + |
| 212 | +|+template <typename Group, typename T, class BinaryOperation> T inclusive_scan(Group g, T x, BinaryOperation binary_op);+ |
| 213 | +|Perform an inclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the inclusive scan of the first +i+ work items in the group. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ must be the same for all work-items in the group. |
| 214 | + |
| 215 | +|+template <typename Group, typename V, class BinaryOperation, typename T> T inclusive_scan(Group g, V x, BinaryOperation binary_op, T init);+ |
| 216 | +|Perform an inclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the inclusive scan of the first +i+ work items in the group and an initial value specified by _init_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ and _init_ must be the same for all work-items in the group. |
| 217 | +|=== |
| 218 | + |
| 219 | +|=== |
| 220 | +|Function|Description |
| 221 | + |
| 222 | +|+template <typename Group, typename Ptr, class BinaryOperation> Ptr::element_type reduce(Group g, Ptr first, Ptr last, BinaryOperation binary_op);+ |
| 223 | +|Combine the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. _first_, _last_ and _binary_op_ must be the same for all work-items in the group. |
| 224 | + |
| 225 | +|+template <typename Group, typename Ptr, typename T, class BinaryOperation> T reduce(Group g, Ptr first, Ptr last, T init, BinaryOperation binary_op);+ |
| 226 | +|Combine the values in the range [_first_, _last_) using an initial value of _init_ and the operator _binary_op_, which must be one of the group algorithms library function objects. _first_, _last_, _init__ and _binary_op_ must be the same for all work-items in the group. |
| 227 | + |
| 228 | +|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation> OutPtr exclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op);+ |
| 229 | +|Perform an exclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the exclusive scan of the first +i+ values in the range and the identity value of _binary_op_. Returns a pointer to the end of the output range. _first_, _last_, _result_ and _binary_op_ must be the same for all work-items in the group. |
| 230 | + |
| 231 | +|+template <typename Group, typename InPtr, typename OutPtr, typename T, class BinaryOperation> OutPtr exclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, T init, BinaryOperation binary_op);+ |
| 232 | +|Perform an exclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the exclusive scan of the first +i+ values in the range and an initial value specified by _init_. Returns a pointer to the end of the output range. _first_, _last_, _result_, _init_ and _binary_op_ must be the same for all work-items in the group. |
| 233 | + |
| 234 | +|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation> OutPtr inclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op);+ |
| 235 | +|Perform an inclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the inclusive scan of the first +i+ values in the range. Returns a pointer to the end of the output range. _first_, _last_, _result_ and _binary_op_ must be the same for all work-items in the group. |
| 236 | + |
| 237 | +|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation, typename T> OutrPtr inclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op, T init);+ |
| 238 | +|Perform an inclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the inclusive scan of the first +i+ values in the range and an initial value specified by _init_. Returns a pointer to the end of the output range. _first_, _last_, _result_, _binary_op_ and _init_ must be the same for all work-items in the group. |
| 239 | +|=== |
| 240 | + |
| 241 | +== Issues |
| 242 | + |
| 243 | +None. |
| 244 | + |
| 245 | +//. asd |
| 246 | +//+ |
| 247 | +//-- |
| 248 | +//*RESOLUTION*: Not resolved. |
| 249 | +//-- |
| 250 | +
|
| 251 | +== Revision History |
| 252 | +
|
| 253 | +[cols="5,15,15,70"] |
| 254 | +[grid="rows"] |
| 255 | +[options="header"] |
| 256 | +|======================================== |
| 257 | +|Rev|Date|Author|Changes |
| 258 | +|1|2020-01-30|John Pennycook|*Initial public working draft* |
| 259 | +|======================================== |
| 260 | +
|
| 261 | +//************************************************************************ |
| 262 | +//Other formatting suggestions: |
| 263 | +// |
| 264 | +//* Use *bold* text for host APIs, or [source] syntax highlighting. |
| 265 | +//* Use +mono+ text for device APIs, or [source] syntax highlighting. |
| 266 | +//* Use +mono+ text for extension names, types, or enum values. |
| 267 | +//* Use _italics_ for parameters. |
| 268 | +//************************************************************************ |
0 commit comments