Skip to content

[SYCL][Doc] Add proposed range_type extension #15962

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: sycl
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
186 changes: 186 additions & 0 deletions sycl/doc/extensions/proposed/sycl_ext_oneapi_range_type.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,186 @@
= sycl_ext_oneapi_range_type

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]
:endnote: —{nbsp}end{nbsp}note

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2024 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 9 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.

This extension also depends on the following other SYCL extensions:

* link:../experimental/sycl_ext_oneapi_kernel_properties.asciidoc[
sycl_ext_oneapi_kernel_properties]


== Status

This is a proposed extension specification, intended to gather community
feedback. Interfaces defined in this specification may not be implemented yet
or may be in a preliminary state. The specification itself may also change in
incompatible ways before it is finalized. *Shipping software products should
not rely on APIs defined in this specification.*


== Overview

The maximum number of work-items that can be launched in a single kernel
depends on multiple factors.
SYCL 2020 says that the total number of work-items must be representable as a
`size_t`, but several implementations (including {dpcpp}) provide optimization
options to assert that kernels will not require the full range of a `size_t`.

This extension proposes a new kernel property that allows developers to declare
the range requirements of individual kernels, providing more fine-grained
control than existing compiler options and improved error behavior.

The property described in this extension is an advanced feature that most
applications should not need to use.
In most cases, applications get the best performance without using this
property.


== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro `SYCL_EXT_ONEAPI_RANGE_TYPE` to one of the values defined in the table
below. Applications can test for the existence of this macro to determine if
the implementation supports this feature, or applications can test the macro's
value to determine which of the extension's features the implementation
supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|The APIs of this experimental extension are not versioned, so the
feature-test macro always has this value.
|===

=== New property

```c++
namespace sycl::ext::oneapi::experimental {

struct range_type_key {
template <typename T>
using value_t = property_value<range_type_key, T>;
};

template <typename T>
inline constexpr range_type_key::value_t<T> range_type;

} // namespace sycl::ext::oneapi::experimental
```

|===
|Property|Description

|`range_type`
|The `range_type` property is an assertion by the application that the kernel
will never be launched with more than `std::numeric_limits<T>::max()`
work-items.
If the kernel is launched with more than this many work-items, the
implementation must throw a synchronous `exception` with the `errc::nd_range`
error code.

`T` must be an integral type.

|===

This property can also be associated with a device function using the
`SYCL_EXT_ONEAPI_FUNCTION_PROPERTY` macro.

There are special requirements whenever a device function defined in one
translation unit makes a call to a device function that is defined in a second
translation unit.
In such a case, the second device function is always declared using
`SYCL_EXTERNAL`.
If the kernel calling these device functions is defined using a `range_type`
property, the functions declared using `SYCL_EXTERNAL` must be similarly
decorated to ensure that a compatible `range_type` is used.
This decoration must exist in both the translation unit making the call and
also in the translation unit that defines the function.
If the `range_type` property is missing in the translation unit that makes the
call, or if the `range_type` of the called function is not compatible with the
`range_type` of the calling function, the program is ill-formed and the
compiler must raise a diagnostic.
Two `range_type` properties are considered compatible if all values that can be
represented by the `range_type` of the caller function can be represented by
the `range_type` of the called function.

== Usage example

```c++
namespace syclex = sycl::ext::oneapi::experimental;

struct SmallKernel
{
// Declare that this kernel supports at most 2^31-1 work-items.
auto get(syclex::properties_tag) const {
return syclex::properties{syclex::range_type<int>};
}
};

...

// Throws an exception with errc::nd_range error code.
// (because 2147483648 > 2147483647)
q.parallel_for(2147483648, SmallKernel());
```

== Interaction with the {dpcpp} "-fsycl-id-queries-fit-in-int" option

The `-fsycl-id-queries-fit-in-int` option is specific to the {dpcpp}
implementation.
Therefore, this section that describes the interaction between this extension
and that option is non-normative and does not apply to other SYCL
implementations that may support this extension.

If a translation unit is compiled with the `-fsycl-id-queries-fit-in-int`
option, all kernels and `SYCL_EXTERNAL` functions without an explicitly
specified `range_type` property are compiled as-if `range_type<int>` was
specified as a property of that kernel or function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you decorate a non-kernel function with range_type? If so, you should add that to the specification above. Until you write this, I assumed this option could only be used to decorate a kernel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to support in both places, because of things like function pointers and non-inlined functions. Otherwise, a compiler (like DPC++) might compile a function that assumes 32-bit ranges, and try to call it from a kernel that supports 64-bit ranges.

Borrowing again from the default sub-group size stuff, we should probably add wording like this:

This property can also be associated with a device function using the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro.

There are special requirements whenever a device function defined in one translation unit makes a call to a device function that is defined in a second translation unit. In such a case, the second device function is always declared using SYCL_EXTERNAL. If the kernel calling these device functions is defined using a range_type property, the functions declared using SYCL_EXTERNAL must be similarly decorated to ensure that a compatible range_type is used. This decoration must exist in both the translation unit making the call and also in the translation unit that defines the function. If the range_type property is missing in the translation unit that makes the call, or if the range_type of the called function is not compatible with the range_type of the calling function, the program is ill-formed and the compiler must raise a diagnostic. Two range_type properties are considered compatible if all values that can be represented by the range_type of the caller function can be represented by the range_type of the called function.

The last sentence is new, and the intent is to allow range_type<int> kernels to call range_type<size_t> functions, but not vice versa.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that looks good. I do wonder, though, if we need this generality. Would it be easier to require the caller and called functions to have the same range_type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. It means the compatibility check becomes something more like a >= than a ==, which doesn't seem like a big implementation change to me. It might have wider implications for bundling optional features, but I don't know a lot about that. @AlexeySachkov, do you think the behavior I've sketched above is implementable?

Assuming that it's implementable, I think the generality is preferable. If a library wants to ship a device function that supports 64-bit indices, it'll mark that function with range_type<size_t>; if doing so prevents user kernels from calling it, that's a big usability issue.

What happens today if we have a kernel in a translation unit compiled with -fsycl-id-queries-fit-in-int and it calls a function in a translation unit compiled with -fno-sycl-id-queries-fit-in-int? If that works, that would be another reason to try and mimic that behavior here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-fsycl-id-queries-fit-in-int does not produce any optional kernel features. All it does is defining the __SYCL_ID_QUERIES_FIT_IN_INT__ macro which is then used in headers and you can trace all its uses.

The few key uses I'm aware of:

  • we have detail/id_queries_fit_in_int header that is used from handler.hpp and it defines helper functions to emit exceptions if user-defined range is too huge
  • defines.hpp uses the macro to do #define __SYCL_ASSUME_INT(x) __builtin_assume((x) <= INT_MAX). The latter is in turn used by files like nd_item.hpp, id.hpp and others to put that assumption into every ID query (get_global_id, operator[], conversion operators, etc.)

Therefore, I think that it should be possible today to perform cross-translation unit calls where translation units are compiled with different value of the aforementioned flag. I'm not entirely sure of what the behavior would be of optimizations which rely on that assumption, because there are many factors which contribute to that (like how exactly and when exactly and which exactly other optimizations have been performed on those translation units and the final linked device code).

@AlexeySachkov, do you think the behavior I've sketched above is implementable?

If the range_type property is missing in the translation unit that makes the call

Does it mean that we should emit an error if forward-declaration of foo in a.cpp differs (range_type property-wise) from its actual definition in b.cpp? If so, I'm not entirely sure if we can catch this.

Simple mismatches like foo calls incompatible bar should be detectable, because it sounds very similar to what we already do for named sub-group sizes


== Issues

None.