-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][Doc] Add proposed range_type extension #15962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Pennycook
wants to merge
4
commits into
intel:sycl
Choose a base branch
from
Pennycook:sycl_ext_oneapi_range_type
base: sycl
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
186 changes: 186 additions & 0 deletions
186
sycl/doc/extensions/proposed/sycl_ext_oneapi_range_type.asciidoc
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,186 @@ | ||
= sycl_ext_oneapi_range_type | ||
|
||
:source-highlighter: coderay | ||
:coderay-linenums-mode: table | ||
|
||
// This section needs to be after the document title. | ||
:doctype: book | ||
:toc2: | ||
:toc: left | ||
:encoding: utf-8 | ||
:lang: en | ||
:dpcpp: pass:[DPC++] | ||
:endnote: —{nbsp}end{nbsp}note | ||
|
||
// Set the default source code type in this document to C++, | ||
// for syntax highlighting purposes. This is needed because | ||
// docbook uses c++ and html5 uses cpp. | ||
:language: {basebackend@docbook:c++:cpp} | ||
|
||
|
||
== Notice | ||
|
||
[%hardbreaks] | ||
Copyright (C) 2024 Intel Corporation. All rights reserved. | ||
|
||
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks | ||
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by | ||
permission by Khronos. | ||
|
||
|
||
== Contact | ||
|
||
To report problems with this extension, please open a new issue at: | ||
|
||
https://github.com/intel/llvm/issues | ||
|
||
|
||
== Dependencies | ||
|
||
This extension is written against the SYCL 2020 revision 9 specification. All | ||
references below to the "core SYCL specification" or to section numbers in the | ||
SYCL specification refer to that revision. | ||
|
||
This extension also depends on the following other SYCL extensions: | ||
|
||
* link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[ | ||
sycl_ext_oneapi_kernel_properties] | ||
|
||
|
||
== Status | ||
|
||
This is a proposed extension specification, intended to gather community | ||
feedback. Interfaces defined in this specification may not be implemented yet | ||
or may be in a preliminary state. The specification itself may also change in | ||
incompatible ways before it is finalized. *Shipping software products should | ||
not rely on APIs defined in this specification.* | ||
|
||
|
||
== Overview | ||
|
||
The maximum number of work-items that can be launched in a single kernel | ||
depends on multiple factors. | ||
SYCL 2020 says that the total number of work-items must be representable as a | ||
`size_t`, but several implementations (including {dpcpp}) provide optimization | ||
options to assert that kernels will not require the full range of a `size_t`. | ||
|
||
This extension proposes a new kernel property that allows developers to declare | ||
the range requirements of individual kernels, providing more fine-grained | ||
control than existing compiler options and improved error behavior. | ||
|
||
The property described in this extension is an advanced feature that most | ||
applications should not need to use. | ||
In most cases, applications get the best performance without using this | ||
property. | ||
|
||
|
||
== Specification | ||
|
||
=== Feature test macro | ||
|
||
This extension provides a feature-test macro as described in the core SYCL | ||
specification. An implementation supporting this extension must predefine the | ||
macro `SYCL_EXT_ONEAPI_RANGE_TYPE` to one of the values defined in the table | ||
below. Applications can test for the existence of this macro to determine if | ||
the implementation supports this feature, or applications can test the macro's | ||
value to determine which of the extension's features the implementation | ||
supports. | ||
|
||
[%header,cols="1,5"] | ||
|=== | ||
|Value | ||
|Description | ||
|
||
|1 | ||
|The APIs of this experimental extension are not versioned, so the | ||
feature-test macro always has this value. | ||
|=== | ||
|
||
=== New property | ||
|
||
```c++ | ||
namespace sycl::ext::oneapi::experimental { | ||
|
||
struct range_type_key { | ||
template <typename T> | ||
using value_t = property_value<range_type_key, T>; | ||
}; | ||
|
||
template <typename T> | ||
inline constexpr range_type_key::value_t<T> range_type; | ||
|
||
} // namespace sycl::ext::oneapi::experimental | ||
``` | ||
|
||
|=== | ||
|Property|Description | ||
|
||
|`range_type` | ||
|The `range_type` property is an assertion by the application that the kernel | ||
will never be launched with more than `std::numeric_limits<T>::max()` | ||
work-items. | ||
If the kernel is launched with more than this many work-items, the | ||
implementation must throw a synchronous `exception` with the `errc::nd_range` | ||
error code. | ||
|
||
`T` must be an integral type. | ||
|
||
|=== | ||
|
||
This property can also be associated with a device function using the | ||
`SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` macro. | ||
|
||
There are special requirements whenever a device function defined in one | ||
translation unit makes a call to a device function that is defined in a second | ||
translation unit. | ||
In such a case, the second device function is always declared using | ||
`SYCL_EXTERNAL`. | ||
If the kernel calling these device functions is defined using a `range_type` | ||
property, the functions declared using `SYCL_EXTERNAL` must be similarly | ||
decorated to ensure that a compatible `range_type` is used. | ||
This decoration must exist in both the translation unit making the call and | ||
also in the translation unit that defines the function. | ||
If the `range_type` property is missing in the translation unit that makes the | ||
call, or if the `range_type` of the called function is not compatible with the | ||
`range_type` of the calling function, the program is ill-formed and the | ||
compiler must raise a diagnostic. | ||
Two `range_type` properties are considered compatible if all values that can be | ||
represented by the `range_type` of the caller function can be represented by | ||
the `range_type` of the called function. | ||
|
||
== Usage example | ||
|
||
```c++ | ||
namespace syclex = sycl::ext::oneapi::experimental; | ||
|
||
struct SmallKernel | ||
{ | ||
// Declare that this kernel supports at most 2^31-1 work-items. | ||
auto get(syclex::properties_tag) const { | ||
return syclex::properties{syclex::range_type<int>}; | ||
} | ||
}; | ||
|
||
... | ||
|
||
// Throws an exception with errc::nd_range error code. | ||
// (because 2147483648 > 2147483647) | ||
q.parallel_for(2147483648, SmallKernel()); | ||
``` | ||
|
||
== Interaction with the {dpcpp} "-fsycl-id-queries-fit-in-int" option | ||
|
||
The `-fsycl-id-queries-fit-in-int` option is specific to the {dpcpp} | ||
implementation. | ||
Therefore, this section that describes the interaction between this extension | ||
and that option is non-normative and does not apply to other SYCL | ||
implementations that may support this extension. | ||
|
||
If a translation unit is compiled with the `-fsycl-id-queries-fit-in-int` | ||
option, all kernels and `SYCL_EXTERNAL` functions without an explicitly | ||
specified `range_type` property are compiled as-if `range_type<int>` was | ||
specified as a property of that kernel or function. | ||
|
||
== Issues | ||
|
||
None. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you decorate a non-kernel function with
range_type
? If so, you should add that to the specification above. Until you write this, I assumed this option could only be used to decorate a kernel.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might need to support in both places, because of things like function pointers and non-inlined functions. Otherwise, a compiler (like DPC++) might compile a function that assumes 32-bit ranges, and try to call it from a kernel that supports 64-bit ranges.
Borrowing again from the default sub-group size stuff, we should probably add wording like this:
The last sentence is new, and the intent is to allow
range_type<int>
kernels to callrange_type<size_t>
functions, but not vice versa.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that looks good. I do wonder, though, if we need this generality. Would it be easier to require the caller and called functions to have the same
range_type
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. It means the compatibility check becomes something more like a >= than a ==, which doesn't seem like a big implementation change to me. It might have wider implications for bundling optional features, but I don't know a lot about that. @AlexeySachkov, do you think the behavior I've sketched above is implementable?
Assuming that it's implementable, I think the generality is preferable. If a library wants to ship a device function that supports 64-bit indices, it'll mark that function with
range_type<size_t>
; if doing so prevents user kernels from calling it, that's a big usability issue.What happens today if we have a kernel in a translation unit compiled with
-fsycl-id-queries-fit-in-int
and it calls a function in a translation unit compiled with-fno-sycl-id-queries-fit-in-int
? If that works, that would be another reason to try and mimic that behavior here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-fsycl-id-queries-fit-in-int
does not produce any optional kernel features. All it does is defining the__SYCL_ID_QUERIES_FIT_IN_INT__
macro which is then used in headers and you can trace all its uses.The few key uses I'm aware of:
detail/id_queries_fit_in_int
header that is used fromhandler.hpp
and it defines helper functions to emit exceptions if user-defined range is too hugedefines.hpp
uses the macro to do#define __SYCL_ASSUME_INT(x) __builtin_assume((x) <= INT_MAX)
. The latter is in turn used by files likend_item.hpp
,id.hpp
and others to put that assumption into every ID query (get_global_id
,operator[]
, conversion operators, etc.)Therefore, I think that it should be possible today to perform cross-translation unit calls where translation units are compiled with different value of the aforementioned flag. I'm not entirely sure of what the behavior would be of optimizations which rely on that assumption, because there are many factors which contribute to that (like how exactly and when exactly and which exactly other optimizations have been performed on those translation units and the final linked device code).
Does it mean that we should emit an error if forward-declaration of
foo
ina.cpp
differs (range_type
property-wise) from its actual definition inb.cpp
? If so, I'm not entirely sure if we can catch this.Simple mismatches like
foo
calls incompatiblebar
should be detectable, because it sounds very similar to what we already do for named sub-group sizes