diff --git a/sycl/doc/extensions/GroupAlgorithms/README.md b/sycl/doc/extensions/GroupAlgorithms/README.md new file mode 100644 index 0000000000000..19ebb9c4a92ad --- /dev/null +++ b/sycl/doc/extensions/GroupAlgorithms/README.md @@ -0,0 +1,3 @@ +# SYCL_INTEL_group_algorithms + +A library of group functions, including common parallel algorithms such as reductions and scans. diff --git a/sycl/doc/extensions/GroupAlgorithms/SYCL_INTEL_group_algorithms.asciidoc b/sycl/doc/extensions/GroupAlgorithms/SYCL_INTEL_group_algorithms.asciidoc new file mode 100755 index 0000000000000..8e0f68b001e39 --- /dev/null +++ b/sycl/doc/extensions/GroupAlgorithms/SYCL_INTEL_group_algorithms.asciidoc @@ -0,0 +1,270 @@ += SYCL_INTEL_group_algorithms +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en + +:blank: pass:[ +] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + +== Introduction +IMPORTANT: This specification is a draft. + +NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by permission by Khronos. + +NOTE: This document is better viewed when rendered as html with asciidoctor. GitHub does not render image icons. + +This document describes an extension which introduces a library of group functions, including common parallel algorithms such as reductions and scans. + +== Name Strings + ++SYCL_INTEL_group_algorithms+ + +== Notice + +Copyright (c) 2020 Intel Corporation. All rights reserved. + +== Status + +Working Draft + +This is a preview extension specification, intended to provide early access to a feature for review and community feedback. When the feature matures, this specification may be released as a formal extension. + +Because the interfaces defined by this specification are not final and are subject to change they are not intended to be used by shipping software products. + +== Version + +Built On: {docdate} + +Revision: 1 + +== Contact +John Pennycook, Intel (john 'dot' pennycook 'at' intel 'dot' com) + +== Dependencies + +This extension is written against the SYCL 1.2.1 specification, Revision v1.2.1-6. + +== Overview + +This extension introduces a library of group algorithms, providing a simple way for developers to apply common parallel algorithms across data held by different work-items in the same group, or to use the work-items of a group to apply common parallel algorithms across data stored in memory. + +The extension introduces the following functions: + +- +any_of+ +- +all_of+ +- +none_of+ +- +broadcast+ +- +reduce+ +- +exclusive_scan+ +- +inclusive_scan+ + +=== Alignment with OpenCL vs C++ + +Where a feature is common to both OpenCL and {cpp}, this proposal opts for {cpp}-like naming: + +- Collective operators are named as in ++ (e.g. +plus+ instead of +sum+) and to avoid clashes with names in ++ (e.g. +minimum+ instead of +min+). +- Algorithms are named as in ++ (e.g. +inclusive_scan+ instead of +scan_inclusive+). + +=== Towards a Generic Group Interface + +This extension adds a number of static members to the +group+ class to simplify the interfaces of the functions in this library: + +- +id_type+: The type used to represent work-item IDs within the group +- +range_type+: The type used to represent the range of the group +- +linear_id_type+: The type used to represent linear work-item IDs within the group +- +dimensions+: An integral value representing the dimensionality of the group + +[source, c++] +---- +template +class group +{ +public: + using id_type = id; + using range_type = range; + using linear_id_type = size_t; + static constexpr int dimensions = Dimensions; +}; +---- + +=== Data Types + +All functions are supported for the fundamental scalar types supported by SYCL and instances of the SYCL +vec+ class. The fundamental scalar types (as defined in Section 6.5 of the SYCL 1.2.1 specification) are: +bool+, +char+, +signed char+, +unsigned char+, +short int+, +unsigned short int+, +int+, +unsigned int+, +long int+, +unsigned long int+, +long long int+, +unsigned long long int+, +size_t+, +float+, +double+, +half+. + +Functions with arguments of type +vec+ are applied component-wise: they are semantically equivalent to N calls to a scalar function of type +T+. + +=== Function Objects + +A number of function objects are provided in the +cl::sycl::intel+ namespace. These function objects are used for all interfaces requiring an operator to be specified. All function objects obey C++ conversion and promotion rules. + +Transparent function objects are provided if using a {cpp}14 compiler -- the parameter types and return type for transparent function objects will be deduced if +T+ is not specified. + +The following function objects alias objects in the ++ header from the {cpp} standard library: + +- +cl::sycl::intel::plus+ +- +cl::sycl::intel::multiplies+ +- +cl::sycl::intel::bit_and+ +- +cl::sycl::intel::bit_or+ +- +cl::sycl::intel::bit_xor+ +- +cl::sycl::intel::logical_and+ +- +cl::sycl::intel::logical_or+ + +New function objects without {cpp} standard library equivalents are defined in the table below: + +|=== +|Function Object|Description + +|+template struct minimum;+ +|+T operator(const T&, const T&) const+ applies +std::less+ to its arguments, in the same order, then returns the lesser argument unchanged. + +|+template struct maximum;+ +|+T operator(const T&, const T&) const+ applies +std::greater+ to its arguments, in the same order, then returns the greater argument unchanged. +|=== + +Function objects supported by the group algorithms library can be identified using the +cl::sycl::intel::is_native_function_object+ and +cl::sycl::intel::is_native_function_object_v+ traits classes. + +=== Functions + +The group algorithms library is based on the algorithms library described in Section 28 of the {cpp}17 standard. The syntax and restrictions are aligned, with two notable differences: the first argument to each function is a group of work-items, in place of an execution policy; and pointers are accepted in place of iterators in order to guarantee that address space information is visible to the compiler. + +Group algorithms are performed collaboratively by the work-items in a group. All functions therefore act as synchronization points and must be encountered in converged control flow by all work-items in the group -- if one work-item in the group reaches the function, then all work-items in the group must reach the function. Additionally, restrictions may be placed on the arguments passed to each function in order to ensure that all work-items in the group agree on the operation that is being performed. Any such restrictions on the arguments passed to a function are defined within the descriptions of those functions. + +Many functions provide at least two overloads: one operating directly on data produced by the work-items in the group, and another operating on a range of data in memory specified by a pair of pointers. If the pointers passed to such a group function are not the same for all work-items in the group, their behavior is undefined. How the elements of a range are processed by the work-items in a group is undefined. + +Using functions from the group algorithms library inside of a kernel may introduce additional limits on the resources available to user code inside the same kernel (e.g. private memory, work-group local memory). The behavior of these limits is implementation-defined, but must be reflected by calls to kernel querying functions such as +kernel::get_work_group_info+. + +It is undefined behavior for any of these functions to be invoked within a +parallel_for_work_group+ or +parallel_for_work_item+ context, but this restriction may be lifted in a future version of the proposal. + +==== Vote + +|=== +|Function|Description + +|+template bool any_of(Group g, bool pred);+ +|Return +true+ if _pred_ is +true+ for any work-item in the group. + +|+template bool all_of(Group g, bool pred);+ +|Return +true+ if _pred_ is +true+ for all work-items in the group. + +|+template bool none_of(Group g, bool pred);+ +|Return +true+ if _pred_ is +true+ for no work-items in the group. + +|+template bool any_of(Group g, T x, Predicate pred);+ +|Return +true+ if _pred(x)_ is +true+ for any work-item in the group. _pred_ must be the same for all work-items in the group. + +|+template bool all_of(Group g, T x, Predicate pred);+ +|Return +true+ if _pred(x)_ is +true+ for all work-items in the group. _pred_ must be the same for all work-items in the group. + +|+template bool none_of(Group g, T x, Predicate pred);+ +|Return +true+ if _pred(x)_ is +true+ for no work-items in the group. _pred_ must be the same for all work-items in the group. +|=== + +|=== +|Function|Description + +|+template bool any_of(Group g, Ptr first, Ptr last, Predicate pred);+ +|Return +true+ if _pred_ returns +true+ for any element in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group. + +|+template bool all_of(Group g, Ptr first, Ptr last, Predicate pred);+ +|Return +true+ if _pred_ returns +true+ for all elements in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group. + +|+template bool none_of(Group g, Ptr first, Ptr last, Predicate pred);+ +|Return +true+ if _pred_ returns +true+ for no element in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group. +|=== + +==== Collectives + +In this section, the meaning of "exclusive scan" and "inclusive scan" are as defined in Sections 29.8.7 and 29.8.8 of the {cpp}17 specification, respectively. + +The return types of the collective functions in {cpp}17 are not deduced from the return type of the specified binary operator, but from either the type of the input values or the type of the initialization value (if one is provided). This is error-prone and can lead to unexpected behavior (e.g. specifying an initial value of `0` instead of `0.0f` for a floating-point reduction will cause the results to be accumulated in an integer). To minimize the chances of encountering such errors, the collective functions in the group algorithms library place additional restrictions on type combinations that can be deduced. + +|=== +|Function|Description + +|+template T broadcast(Group g, T x);+ +|Broadcast the value of _x_ from the work-item with the lowest id to all work-items within the group. + +|+template T broadcast(Group g, T x, Group::linear_id_type local_linear_id);+ +|Broadcast the value of _x_ from the work-item with the specified linear id to all work-items within the group. The value of _local_linear_id_ must be the same for all work-items in the group. + +|+template T broadcast(Group g, T x, Group::id_type local_id);+ +|Broadcast the value of _x_ from the work-item with the specified id to all work-items within the group. The value of _local_id_ must be the same for all work-items in the group, and its dimensionality must match the dimensionality of the group. + +|+template T reduce(Group g, T x, BinaryOperation binary_op);+ +|Combine the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group. _binary_op(x, x)_ must return a value of type _T_. + +|+template T reduce(Group g, V x, T init, BinaryOperation binary_op);+ +|Combine the values of _x_ from all work-items in the group using an initial value of _init_ and the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group. _binary_op(init, x)_ must return a value of type _T_. + +|+template T exclusive_scan(Group g, T x, BinaryOperation binary_op);+ +|Perform an exclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the exclusive scan of the first +i+ work-items in the group and the identity value of _binary_op_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ must be the same for all work-items in the group. _binary_op(x, x)_ must return a value of type _T_. + +|+template T exclusive_scan(Group g, V x, T init, BinaryOperation binary_op);+ +|Perform an exclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the exclusive scan of the first +i+ work items in the group and an initial value specified by _init_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _init_ and _binary_op_ must be the same for all work-items in the group. _binary_op(init, x)_ must return a value of type _T_. + +|+template T inclusive_scan(Group g, T x, BinaryOperation binary_op);+ +|Perform an inclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the inclusive scan of the first +i+ work items in the group. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ must be the same for all work-items in the group. _binary_op(x, x)_ must return a value of type _T_. + +|+template T inclusive_scan(Group g, V x, BinaryOperation binary_op, T init);+ +|Perform an inclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the inclusive scan of the first +i+ work items in the group and an initial value specified by _init_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ and _init_ must be the same for all work-items in the group. _binary_op(init, x)_ must return a value of type _T_. +|=== + +|=== +|Function|Description + +|+template Ptr::element_type reduce(Group g, Ptr first, Ptr last, BinaryOperation binary_op);+ +|Combine the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. _first_, _last_ and _binary_op_ must be the same for all work-items in the group. _binary_op(*first, *first)_ must return a value of type _Ptr::element_type_. + +|+template T reduce(Group g, Ptr first, Ptr last, T init, BinaryOperation binary_op);+ +|Combine the values in the range [_first_, _last_) using an initial value of _init_ and the operator _binary_op_, which must be one of the group algorithms library function objects. _first_, _last_, _init__ and _binary_op_ must be the same for all work-items in the group. _binary_op(init, *first)_ must return a value of type _T_. + +|+template OutPtr exclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op);+ +|Perform an exclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the exclusive scan of the first +i+ values in the range and the identity value of _binary_op_. Returns a pointer to the end of the output range. _first_, _last_, _result_ and _binary_op_ must be the same for all work-items in the group. _binary_op(*first, *first)_ must return a value of type _OutPtr::element_type_. + +|+template OutPtr exclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, T init, BinaryOperation binary_op);+ +|Perform an exclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the exclusive scan of the first +i+ values in the range and an initial value specified by _init_. Returns a pointer to the end of the output range. _first_, _last_, _result_, _init_ and _binary_op_ must be the same for all work-items in the group. _binary_op(init, *first)_ must return a value of type _T_. + +|+template OutPtr inclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op);+ +|Perform an inclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the inclusive scan of the first +i+ values in the range. Returns a pointer to the end of the output range. _first_, _last_, _result_ and _binary_op_ must be the same for all work-items in the group. _binary_op(*first, *first)_ must return a value of type _OutPtr::element_type_. + +|+template OutrPtr inclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op, T init);+ +|Perform an inclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the inclusive scan of the first +i+ values in the range and an initial value specified by _init_. Returns a pointer to the end of the output range. _first_, _last_, _result_, _binary_op_ and _init_ must be the same for all work-items in the group. _binary_op(init, *first)_ must return a value of type _T_. +|=== + +== Issues + +None. + +//. asd +//+ +//-- +//*RESOLUTION*: Not resolved. +//-- + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Rev|Date|Author|Changes +|1|2020-01-30|John Pennycook|*Initial public working draft* +|======================================== + +//************************************************************************ +//Other formatting suggestions: +// +//* Use *bold* text for host APIs, or [source] syntax highlighting. +//* Use +mono+ text for device APIs, or [source] syntax highlighting. +//* Use +mono+ text for extension names, types, or enum values. +//* Use _italics_ for parameters. +//************************************************************************ diff --git a/sycl/doc/extensions/GroupCollectives/GroupCollectives.md b/sycl/doc/extensions/GroupCollectives/GroupCollectives.md deleted file mode 100644 index e6a27ad2056ef..0000000000000 --- a/sycl/doc/extensions/GroupCollectives/GroupCollectives.md +++ /dev/null @@ -1,56 +0,0 @@ -# SYCL(TM) Proposals: Group Collectives for NDRange Parallelism - -**IMPORTANT**: This specification is a draft. - -**NOTE**: Khronos(R) is a registered trademark and SYCL(TM) is a trademark of the Khronos Group, Inc. OpenCL(TM) is a trademark of Apple Inc. used by permission by Khronos. - -This proposal exposes the work-group functions from OpenCL 2.0 (any, all, broadcast, reductions and scans) to the NDRange variant of `parallel_for`, and does not address hierarchical parallelism. - -The new functions are added to the `cl::sycl::group` class, and guarded by the `__SYCL_INTEL_GROUP_COLLECTIVES__` macro. - -## Alignment with OpenCL vs C++ - -Where a feature is common to both OpenCL and C++, this proposal opts for C++-like naming: -- Collective operators are named as in `` (e.g. `plus` instead of `sum`) and to avoid clashes with names in `` (e.g. `minimum` instead of `min`). -- Scan operations are named as in `` (e.g. `inclusive_scan` instead of `scan_inclusive`). - -## Data Types - -All functions are supported for the fundamental scalar types supported by SYCL and instances of the SYCL `vec` class. The fundamental scalar types (as defined in Section 6.5 of the SYCL 1.2.1 specification) are: `bool`, `char`, `signed char`, `unsigned char`, `short int`, `unsigned short int`, `int`, `unsigned int`, `long int`, `unsigned long int`, `long long int`, `unsigned long long int`, `size_t`, `float`, `double`, `half`. - -Functions with arguments of type `vec` are applied component-wise: they are semantically equivalent to `N` calls to a scalar function of type `T`. - -## Function Objects - -A number of function objects are provided in the `cl::sycl::intel` namespace that are equivalent to those found in the `` header from the C++ standard library. These function objects are used for all interfaces requiring an operator to be specified. - -The parameter types and return type for all function objects will be deduced if `T` is not specified. - -|Function object|Description| -|----------------|-----------| -|`template struct plus;`|`T operator(const T&, const T&) const` calls `operator+` on its arguments.| -|`template struct minimum;`|`T operator(const T&, const T&) const` applies `std::less` to its arguments, in the same order, then returns the lesser argument unchanged.| -|`template struct maximum;`|`T operator(const T&, const T&) const` applies `std::greater` to its arguments, in the same order, then returns the greater argument unchanged.| - -# Functions - -The member functions of the `group` class described in this section act as a work-group barrier, and it is undefined behavior for these functions to be invoked within a `parallel_for_work_group` or `parallel_for_work_item` context. - -## Vote / Ballot - -|Member functions|Description| -|----------------|-----------| -| `bool any(bool predicate) const` | Return `true` if `predicate` evaluates to `true` for any work-item in the work-group.| -| `bool all(bool predicate) const` | Return `true` if `predicate` evaluates to `true` for all work-items in the work-group.| - -## Collectives - -|Member functions|Description| -|----------------|-----------| -|`template T broadcast(T x, id<1> local_id) const` | Broadcast the value of `x` from the work-item with the specified id to all work-items within the work-group. The value of `local_id` must be the same for all work-items in the work-group.| -|`template T reduce(T x, BinaryOp binary_op) const;`|Combine the values of `x` from all work-items in the work-group using the specified operator, which must be one of `plus`, `minimum` or `maximum`.| -|`template T reduce(T x, T init, BinaryOp binary_op) const;`|Combine the values of `x` from all work-items in the work-group using an initial value of `init` and the specified operator, which must be one of `plus`, `minimum` or `maximum`.| -|`template T exclusive_scan(T x, BinaryOp binary_op) const;`|Perform an exclusive scan over the values of `x` from all work-items in the work-group using the specified operator, which must be one of: `plus`, `minimum` or `maximum`. The value returned on work-item `i` is the exclusive scan of the first `i` work items in the work-group and the `init` value. For multi-dimensional work-groups, the order of work-items in the group is determined by their linear id. The initial value is the identity value of the operator.| -|`template T exclusive_scan(T x, T init, BinaryOp binary_op) const;`|Perform an exclusive scan over the values of `x` from all work-items in the work-group using the specified operator, which must be one of: `plus`, `minimum` or `maximum`. The value returned on work-item `i` is the exclusive scan of the first `i` work items in the work-group and the `init` value. For multi-dimensional work-groups, the order of work-items in the group is determined by their linear id. The initial value is specified by `init`.| -|`template T inclusive_scan(T x, BinaryOp binary_op) const;`|Perform an inclusive scan over the values of `x` from all work-items in the work-group using the specified operator, which must be one of: `plus`, `minimum` or `maximum`. The value returned on work-item `i` is the inclusive scan of the first `i` work items in the work-group. For multi-dimensional work-groups, the order of work-items in the group is determined by their linear id.| -|`template T inclusive_scan(T x, BinaryOp binary_op, T init) const;`|Perform an inclusive scan over the values of `x` from all work-items in the work-group using the specified operator, which must be one of: `plus`, `minimum` or `maximum`. The value returned on work-item `i` is the inclusive scan of the first `i` work items in the work-group and the `init` value. For multi-dimensional work-groups, the order of work-items in the group is determined by their linear id.|