Commit 33ee738

[SYCL][Doc] Add num_compute_units extension (#16293)
Defines a new num_compute_units device query to address several issues with the existing max_compute_units device query. --------- Signed-off-by: John Pennycook <[email protected]>
1 parent eb4b933 commit 33ee738

1 file changed: +173 -0 lines changed
@@ -0,0 +1,173 @@
= sycl_ext_oneapi_num_compute_units

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]
:endnote: &#8212;{nbsp}end{nbsp}note

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}

== Notice

[%hardbreaks]
Copyright (C) 2024 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 9 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.


== Status

This is a proposed extension specification, intended to gather community
feedback. Interfaces defined in this specification may not be implemented yet
or may be in a preliminary state. The specification itself may also change in
incompatible ways before it is finalized. *Shipping software products should
not rely on APIs defined in this specification.*

== Overview

SYCL 2020 allows developers to query the maximum number of compute units in a
device via the `info::device::max_compute_units` query.
There are two issues with this existing query: first, it refers to a
"maximum", despite the number of compute units being a fixed property of a
device; and second, the definition of "compute units" is vague. Different
implementations and backends do not provide consistent interpretations of this
query, which makes it difficult for developers to use the number of compute
units in a portable way.

This extension provides a new query, `info::device::num_compute_units`, which
aims to clarify the meaning of "compute units" in SYCL and to drive consistency
across implementations.

== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro `SYCL_EXT_ONEAPI_NUM_COMPUTE_UNITS` to one of the values defined in
the table below. Applications can test for the existence of this macro to
determine if the implementation supports this feature, or applications can test
the macro's value to determine which of the extension's features the
implementation supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|Initial version of this extension.
|===

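
The macro test described above can be sketched as follows. This is an
illustrative sketch only: the helper names `num_compute_units_ext_version` and
`has_num_compute_units_ext` are hypothetical and not part of the extension, and
the `__has_include` guard is an assumption that lets the snippet also compile
with a non-SYCL toolchain, in which case the macro is simply undefined.

```cpp
// Include the SYCL header when one is available. Depending on the
// implementation, the feature-test macro may only become visible after this.
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper: the extension version reported by the implementation,
// or 0 when SYCL_EXT_ONEAPI_NUM_COMPUTE_UNITS is not defined.
constexpr int num_compute_units_ext_version() {
#ifdef SYCL_EXT_ONEAPI_NUM_COMPUTE_UNITS
  return SYCL_EXT_ONEAPI_NUM_COMPUTE_UNITS;
#else
  return 0;
#endif
}

// Hypothetical helper: true when at least the initial version (1) of the
// extension is supported.
constexpr bool has_num_compute_units_ext() {
  return num_compute_units_ext_version() >= 1;
}
```

Code that depends on the new query can branch on `has_num_compute_units_ext()`
and fall back to `max_compute_units` otherwise.
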
=== Compute units

A SYCL device is divided into one or more compute units, which are each divided
into one or more processing elements.

All work-items in a given work-group must execute on exactly one compute unit.
The mapping of work-groups to compute units is not guaranteed: work-groups may
be dispatched to compute units in any order, and this order may be different
for every kernel launch.

An implementation may execute multiple work-groups on a single compute unit
simultaneously, subject to the resource constraints described by other device
and kernel queries.

The representation of specific hardware architectures in terms of compute units
is tied to the execution model exposed by an implementation and is thus
implementation-defined.

[_Note_: To improve the portability of SYCL programs, implementations are
encouraged to define compute units such that it is possible to fully utilize
the hardware resources of a device by launching one work-group of
size `max_work_group_size` on each compute unit.{endnote}]

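
The note above suggests a simple launch-size calculation: one work-group of
size `max_work_group_size` per compute unit. The sketch below shows only that
arithmetic, assuming both values have already been queried from the device. The
names `LaunchShape` and `one_group_per_compute_unit` are hypothetical and used
purely for illustration. Whether such a launch actually saturates a device
depends on how the implementation defines compute units.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: a one-dimensional launch configuration.
struct LaunchShape {
  std::size_t global_size;  // total number of work-items
  std::size_t local_size;   // work-items per work-group
};

// Hypothetical helper: one work-group of max_work_group_size per compute unit.
// Both arguments are assumed to come from device queries and to be >= 1.
inline LaunchShape one_group_per_compute_unit(std::size_t num_compute_units,
                                              std::size_t max_work_group_size) {
  assert(num_compute_units >= 1);
  assert(max_work_group_size >= 1);
  return LaunchShape{num_compute_units * max_work_group_size,
                     max_work_group_size};
}
```

The resulting sizes could then be used to build a one-dimensional `nd_range`
for a kernel launch.
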
=== Device queries

[source, c++]
----
namespace sycl::ext::oneapi::info::device {

struct num_compute_units;

}
----

[%header,cols="1,5,5"]
|===
|Device Descriptor
|Return Type
|Description

|`ext::oneapi::info::device::num_compute_units`
|`size_t`
|Returns the number of compute units in the device.
The minimum value is 1.

[_Note_: The value is not required to be equal to the value returned by
`max_compute_units`.{endnote}]
|===

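
A minimal sketch of how an application might issue this query through
`device::get_info`, guarded by the feature-test macro. The function name
`query_num_compute_units_or_zero` is hypothetical, and returning 0 when the
extension is unavailable is a convention chosen for this example (0 is
unambiguous because the minimum valid value is 1). The `__has_include` guard is
likewise an assumption so that the snippet still compiles without a SYCL
toolchain.

```cpp
#include <cassert>
#include <cstddef>

#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper: returns the number of compute units of the
// default-selected device, or 0 when the extension is not available.
inline std::size_t query_num_compute_units_or_zero() {
#ifdef SYCL_EXT_ONEAPI_NUM_COMPUTE_UNITS
  sycl::device dev;  // default-constructed: the default-selected device
  const std::size_t n =
      dev.get_info<sycl::ext::oneapi::info::device::num_compute_units>();
  assert(n >= 1);  // the extension states that the minimum value is 1
  return n;
#else
  return 0;  // extension not supported by this toolchain
#endif
}
```

Because the extension is still a proposal, callers should be prepared for the
fallback path and use `max_compute_units` or another heuristic in that case.
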
== Implementation in {dpcpp}

This section is non-normative and applies only to the {dpcpp} implementation.

The table below explains how {dpcpp} calculates the number of compute units for
different combinations of device and backend.

[%header,cols="1,5,10"]
|===
|Device Type
|Backend(s)
|Number of Compute Units

|CPU
|OpenCL
|Number of logical cores.

|Intel GPU
|Any
|Number of Xe cores.

|NVIDIA GPU
|Any
|Number of streaming multiprocessors (SMs).
|===

== Issues

None.