Skip to content

Commit 4ee810e

Browse files
committed
[SYCL][Doc] Add GroupAlgorithms extension
Replaces GroupCollectives extension with a library of free functions: - any_of - all_of - none_of - broadcast - reduce - exclusive_scan - inclusive_scan Signed-off-by: John Pennycook <john.pennycook@intel.com>
1 parent 50e7abd commit 4ee810e

File tree

3 files changed

+271
-56
lines changed

3 files changed

+271
-56
lines changed
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
# SYCL_INTEL_group_algorithms
2+
3+
A library of group functions, including common parallel algorithms such as reductions and scans.
Lines changed: 268 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,268 @@
1+
= SYCL_INTEL_group_algorithms
2+
:source-highlighter: coderay
3+
:coderay-linenums-mode: table
4+
5+
// This section needs to be after the document title.
6+
:doctype: book
7+
:toc2:
8+
:toc: left
9+
:encoding: utf-8
10+
:lang: en
11+
12+
:blank: pass:[ +]
13+
14+
// Set the default source code type in this document to C++,
15+
// for syntax highlighting purposes. This is needed because
16+
// docbook uses c++ and html5 uses cpp.
17+
:language: {basebackend@docbook:c++:cpp}
18+
19+
== Introduction
20+
IMPORTANT: This specification is a draft.
21+
22+
NOTE: Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by permission by Khronos.
23+
24+
NOTE: This document is better viewed when rendered as html with asciidoctor. GitHub does not render image icons.
25+
26+
This document describes an extension which introduces a library of group functions, including common parallel algorithms such as reductions and scans.
27+
28+
== Name Strings
29+
30+
+SYCL_INTEL_group_algorithms+
31+
32+
== Notice
33+
34+
Copyright (c) 2020 Intel Corporation. All rights reserved.
35+
36+
== Status
37+
38+
Working Draft
39+
40+
This is a preview extension specification, intended to provide early access to a feature for review and community feedback. When the feature matures, this specification may be released as a formal extension.
41+
42+
Because the interfaces defined by this specification are not final and are subject to change they are not intended to be used by shipping software products.
43+
44+
== Version
45+
46+
Built On: {docdate} +
47+
Revision: 1
48+
49+
== Contact
50+
John Pennycook, Intel (john 'dot' pennycook 'at' intel 'dot' com)
51+
52+
== Dependencies
53+
54+
This extension is written against the SYCL 1.2.1 specification, Revision v1.2.1-6.
55+
56+
== Overview
57+
58+
This extension introduces a library of group algorithms, providing a simple way for developers to apply common parallel algorithms across data held by different work-items in the same group, or to use the work-items of a group to apply common parallel algorithms across data stored in memory.
59+
60+
The extension introduces the following functions:
61+
62+
- +any_of+
63+
- +all_of+
64+
- +none_of+
65+
- +broadcast+
66+
- +reduce+
67+
- +exclusive_scan+
68+
- +inclusive_scan+
69+
70+
=== Alignment with OpenCL vs C++
71+
72+
Where a feature is common to both OpenCL and {cpp}, this proposal opts for {cpp}-like naming:
73+
74+
- Collective operators are named as in +<functional>+ (e.g. +plus+ instead of +sum+) and to avoid clashes with names in +<algorithm>+ (e.g. +minimum+ instead of +min+).
75+
- Algorithms are named as in +<algorithm>+ (e.g. +inclusive_scan+ instead of +scan_inclusive+).
76+
77+
=== Towards a Generic Group Interface
78+
79+
This extension adds a number of static members to the +group+ class to simplify the interfaces of the functions in this library:
80+
81+
- +id_type+: The type used to represent work-item IDs within the group
82+
- +range_type+: The type used to represent the range of the group
83+
- +linear_id_type+: The type used to represent linear work-item IDs within the group
84+
- +dimensions+: An integral value representing the dimensionality of the group
85+
86+
[source, c++]
87+
----
88+
template <int Dimensions = 1>
89+
class group
90+
{
91+
public:
92+
using id_type = id<Dimensions>;
93+
using range_type = range<Dimensions>;
94+
using linear_id_type = size_t;
95+
static constexpr int dimensions = Dimensions;
96+
};
97+
----
98+
99+
=== Data Types
100+
101+
All functions are supported for the fundamental scalar types supported by SYCL and instances of the SYCL +vec+ class. The fundamental scalar types (as defined in Section 6.5 of the SYCL 1.2.1 specification) are: +bool+, +char+, +signed char+, +unsigned char+, +short int+, +unsigned short int+, +int+, +unsigned int+, +long int+, +unsigned long int+, +long long int+, +unsigned long long int+, +size_t+, +float+, +double+, +half+.
102+
103+
Functions with arguments of type +vec<T,N>+ are applied component-wise: they are semantically equivalent to N calls to a scalar function of type +T+.
104+
105+
=== Function Objects
106+
107+
A number of function objects are provided in the +cl::sycl::intel+ namespace. These function objects are used for all interfaces requiring an operator to be specified. All function objects obey C++ conversion and promotion rules.
108+
109+
Transparent function objects are provided if using a {cpp}14 compiler -- the parameter types and return type for transparent function objects will be deduced if +T+ is not specified.
110+
111+
The following function objects alias objects in the +<functional>+ header from the {cpp} standard library:
112+
113+
- +cl::sycl::intel::plus+
114+
- +cl::sycl::intel::multiplies+
115+
- +cl::sycl::intel::bit_and+
116+
- +cl::sycl::intel::bit_or+
117+
- +cl::sycl::intel::bit_xor+
118+
- +cl::sycl::intel::logical_and+
119+
- +cl::sycl::intel::logical_or+
120+
121+
New function objects without {cpp} standard library equivalents are defined in the table below:
122+
123+
|===
124+
|Function Object|Description
125+
126+
|+template <typename T=void> struct minimum;+
127+
|+T operator(const T&, const T&) const+ applies +std::less+ to its arguments, in the same order, then returns the lesser argument unchanged.
128+
129+
|+template <typename T=void> struct maximum;+
130+
|+T operator(const T&, const T&) const+ applies +std::greater+ to its arguments, in the same order, then returns the greater argument unchanged.
131+
|===
132+
133+
Function objects supported by the group algorithms library can be identified using the +cl::sycl::intel::is_native_function_object+ and +cl::sycl::intel::is_native_function_object_v+ traits classes.
134+
135+
=== Functions
136+
137+
The group algorithms library is based on the algorithms library described in Section 28 of the {cpp}17 standard. The syntax and restrictions are aligned, with two notable differences: the first argument to each function is a group of work-items, in place of an execution policy; and pointers are accepted in place of iterators in order to guarantee that address space information is visible to the compiler.
138+
139+
Group algorithms are performed collaboratively by the work-items in a group. All functions therefore act as synchronization points and must be encountered in converged control flow by all work-items in the group -- if one work-item in the group reaches the function, then all work-items in the group must reach the function. Additionally, restrictions may be placed on the arguments passed to each function in order to ensure that all work-items in the group agree on the operation that is being performed. Any such restrictions on the arguments passed to a function are defined within the descriptions of those functions.
140+
141+
Many functions provide at least two overloads: one operating directly on data produced by the work-items in the group, and another operating on a range of data in memory specified by a pair of pointers. If the pointers passed to such a group function are not the same for all work-items in the group, their behavior is undefined. How the elements of a range are processed by the work-items in a group is undefined.
142+
143+
Using functions from the group algorithms library inside of a kernel may introduce additional limits on the resources available to user code inside the same kernel (e.g. private memory, work-group local memory). The behavior of these limits is implementation-defined, but must be reflected by calls to kernel querying functions such as +kernel::get_work_group_info+.
144+
145+
It is undefined behavior for any of these functions to be invoked within a +parallel_for_work_group+ or +parallel_for_work_item+ context, but this restriction may be lifted in a future version of the proposal.
146+
147+
==== Vote
148+
149+
|===
150+
|Function|Description
151+
152+
|+template <typename Group> bool any_of(Group g, bool pred);+
153+
|Return +true+ if _pred_ is +true+ for any work-item in the group.
154+
155+
|+template <typename Group> bool all_of(Group g, bool pred);+
156+
|Return +true+ if _pred_ is +true+ for all work-items in the group.
157+
158+
|+template <typename Group> bool none_of(Group g, bool pred);+
159+
|Return +true+ if _pred_ is +true+ for no work-items in the group.
160+
161+
|+template <typename Group, typename T, class Predicate> bool any_of(Group g, T x, Predicate pred);+
162+
|Return +true+ if _pred(x)_ is +true+ for any work-item in the group. _pred_ must be the same for all work-items in the group.
163+
164+
|+template <typename Group, typename T, class Predicate> bool all_of(Group g, T x, Predicate pred);+
165+
|Return +true+ if _pred(x)_ is +true+ for all work-items in the group. _pred_ must be the same for all work-items in the group.
166+
167+
|+template <typename Group, typename T, class Predicate> bool none_of(Group g, T x, Predicate pred);+
168+
|Return +true+ if _pred(x)_ is +true+ for no work-items in the group. _pred_ must be the same for all work-items in the group.
169+
|===
170+
171+
|===
172+
|Function|Description
173+
174+
|+template <typename Group, typename Ptr, class Predicate> bool any_of(Group g, Ptr first, Ptr last, Predicate pred);+
175+
|Return +true+ if _pred_ returns +true+ for any element in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group.
176+
177+
|+template <typename Group, typename Ptr, class Predicate> bool all_of(Group g, Ptr first, Ptr last, Predicate pred);+
178+
|Return +true+ if _pred_ returns +true+ for all elements in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group.
179+
180+
|+template <typename Group, typename Ptr, class Predicate> bool none_of(Group g, Ptr first, Ptr last, Predicate pred);+
181+
|Return +true+ if _pred_ returns +true+ for no element in the range [_first_, _last_). _first_, _last_ and _pred_ must be the same for all work-items in the group.
182+
|===
183+
184+
==== Collectives
185+
186+
In this section, the meaning of "exclusive scan" and "inclusive scan" are as defined in Sections 29.8.7 and 29.8.8 of the {cpp}17 specification, respectively.
187+
188+
|===
189+
|Function|Description
190+
191+
|+template <typename Group, typename T> T broadcast(Group g, T x);+
192+
|Broadcast the value of _x_ from the work-item with the lowest id to all work-items within the group.
193+
194+
|+template <typename Group, typename T> T broadcast(Group g, T x, Group::linear_id_type local_linear_id);+
195+
|Broadcast the value of _x_ from the work-item with the specified linear id to all work-items within the group. The value of _local_linear_id_ must be the same for all work-items in the group.
196+
197+
|+template <typename Group, typename T> T broadcast(Group g, T x, Group::id_type local_id);+
198+
|Broadcast the value of _x_ from the work-item with the specified id to all work-items within the group. The value of _local_id_ must be the same for all work-items in the group, and its dimensionality must match the dimensionality of the group.
199+
200+
|+template <typename Group, typename T, class BinaryOperation> T reduce(Group g, T x, BinaryOperation binary_op);+
201+
|Combine the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group.
202+
203+
|+template <typename Group, typename V, typename T, class BinaryOperation> T reduce(Group g, V x, T init, BinaryOperation binary_op);+
204+
|Combine the values of _x_ from all work-items in the group using an initial value of _init_ and the operator _binary_op_, which must be one of the group algorithms library function objects. _binary_op_ must be the same for all work-items in the group.
205+
206+
|+template <typename Group, typename T, class BinaryOperation> T exclusive_scan(Group g, T x, BinaryOperation binary_op);+
207+
|Perform an exclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the exclusive scan of the first +i+ work-items in the group and the identity value of _binary_op_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ must be the same for all work-items in the group.
208+
209+
|+template <typename Group, typename V, typename T, class BinaryOperation> T exclusive_scan(Group g, V x, T init, BinaryOperation binary_op);+
210+
|Perform an exclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the exclusive scan of the first +i+ work items in the group and an initial value specified by _init_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _init_ and _binary_op_ must be the same for all work-items in the group.
211+
212+
|+template <typename Group, typename T, class BinaryOperation> T inclusive_scan(Group g, T x, BinaryOperation binary_op);+
213+
|Perform an inclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the inclusive scan of the first +i+ work items in the group. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ must be the same for all work-items in the group.
214+
215+
|+template <typename Group, typename V, class BinaryOperation, typename T> T inclusive_scan(Group g, V x, BinaryOperation binary_op, T init);+
216+
|Perform an inclusive scan over the values of _x_ from all work-items in the group using the operator _binary_op_, which must be one of the group algorithms library function objects. The value returned on work-item +i+ is the inclusive scan of the first +i+ work items in the group and an initial value specified by _init_. For multi-dimensional groups, the order of work-items in the group is determined by their linear id. _binary_op_ and _init_ must be the same for all work-items in the group.
217+
|===
218+
219+
|===
220+
|Function|Description
221+
222+
|+template <typename Group, typename Ptr, class BinaryOperation> Ptr::element_type reduce(Group g, Ptr first, Ptr last, BinaryOperation binary_op);+
223+
|Combine the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. _first_, _last_ and _binary_op_ must be the same for all work-items in the group.
224+
225+
|+template <typename Group, typename Ptr, typename T, class BinaryOperation> T reduce(Group g, Ptr first, Ptr last, T init, BinaryOperation binary_op);+
226+
|Combine the values in the range [_first_, _last_) using an initial value of _init_ and the operator _binary_op_, which must be one of the group algorithms library function objects. _first_, _last_, _init__ and _binary_op_ must be the same for all work-items in the group.
227+
228+
|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation> OutPtr exclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op);+
229+
|Perform an exclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the exclusive scan of the first +i+ values in the range and the identity value of _binary_op_. Returns a pointer to the end of the output range. _first_, _last_, _result_ and _binary_op_ must be the same for all work-items in the group.
230+
231+
|+template <typename Group, typename InPtr, typename OutPtr, typename T, class BinaryOperation> OutPtr exclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, T init, BinaryOperation binary_op);+
232+
|Perform an exclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the exclusive scan of the first +i+ values in the range and an initial value specified by _init_. Returns a pointer to the end of the output range. _first_, _last_, _result_, _init_ and _binary_op_ must be the same for all work-items in the group.
233+
234+
|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation> OutPtr inclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op);+
235+
|Perform an inclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the inclusive scan of the first +i+ values in the range. Returns a pointer to the end of the output range. _first_, _last_, _result_ and _binary_op_ must be the same for all work-items in the group.
236+
237+
|+template <typename Group, typename InPtr, typename OutPtr, class BinaryOperation, typename T> OutrPtr inclusive_scan(Group g, InPtr first, InPtr last, OutPtr result, BinaryOperation binary_op, T init);+
238+
|Perform an inclusive scan over the values in the range [_first_, _last_) using the operator _binary_op_, which must be one of the group algorithms library function objects. The value written to +result + i+ is the inclusive scan of the first +i+ values in the range and an initial value specified by _init_. Returns a pointer to the end of the output range. _first_, _last_, _result_, _binary_op_ and _init_ must be the same for all work-items in the group.
239+
|===
240+
241+
== Issues
242+
243+
None.
244+
245+
//. asd
246+
//+
247+
//--
248+
//*RESOLUTION*: Not resolved.
249+
//--
250+
251+
== Revision History
252+
253+
[cols="5,15,15,70"]
254+
[grid="rows"]
255+
[options="header"]
256+
|========================================
257+
|Rev|Date|Author|Changes
258+
|1|2020-01-30|John Pennycook|*Initial public working draft*
259+
|========================================
260+
261+
//************************************************************************
262+
//Other formatting suggestions:
263+
//
264+
//* Use *bold* text for host APIs, or [source] syntax highlighting.
265+
//* Use +mono+ text for device APIs, or [source] syntax highlighting.
266+
//* Use +mono+ text for extension names, types, or enum values.
267+
//* Use _italics_ for parameters.
268+
//************************************************************************

0 commit comments

Comments
 (0)