[SYCL][Doc] Add proposed range_type extension #15962

Open
wants to merge 4 commits into base: sycl

Conversation

Pennycook
Contributor

This extension proposes a new kernel property that allows developers to declare the range requirements of individual kernels, providing more fine-grained control than existing compiler options and improved error behavior.

Signed-off-by: John Pennycook <[email protected]>
@Pennycook Pennycook added the spec extension All issues/PRs related to extensions specifications label Nov 1, 2024
@Pennycook Pennycook requested a review from a team as a code owner November 1, 2024 14:57
If a translation unit is compiled with the `-fsycl-id-queries-fit-in-int`
option, all kernels and `SYCL_EXTERNAL` functions without an explicitly
specified `range_type` property are compiled as if `range_type<int>` were
specified as a property of that kernel or function.
Contributor

Can you decorate a non-kernel function with range_type? If so, you should add that to the specification above. Until you wrote this, I had assumed this property could only be used to decorate a kernel.

Contributor Author

I think we might need to support it in both places, because of things like function pointers and non-inlined functions. Otherwise, a compiler (like DPC++) might compile a function that assumes 32-bit ranges, and try to call it from a kernel that supports 64-bit ranges.

Borrowing again from the default sub-group size stuff, we should probably add wording like this:

This property can also be associated with a device function using the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro.

There are special requirements whenever a device function defined in one translation unit makes a call to a device function that is defined in a second translation unit. In such a case, the second device function is always declared using SYCL_EXTERNAL. If the kernel calling these device functions is defined using a range_type property, the functions declared using SYCL_EXTERNAL must be similarly decorated to ensure that a compatible range_type is used. This decoration must exist in both the translation unit making the call and also in the translation unit that defines the function. If the range_type property is missing in the translation unit that makes the call, or if the range_type of the called function is not compatible with the range_type of the calling function, the program is ill-formed and the compiler must raise a diagnostic. Two range_type properties are considered compatible if all values that can be represented by the range_type of the caller function can be represented by the range_type of the called function.

The last sentence is new, and the intent is to allow range_type<int> kernels to call range_type<size_t> functions, but not vice versa.
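The proposed compatibility rule can be sketched as a compile-time predicate over the underlying integer types. A rough illustration only (all names here are mine, not part of the proposed spec):

```cpp
#include <cstddef>
#include <limits>

// Rough sketch of the proposed rule: a caller with range_type<From> may call
// a function with range_type<To> iff every value representable by From is
// also representable by To. Names are illustrative, not from the extension.
template <typename From, typename To>
constexpr bool range_type_compatible() {
  return static_cast<unsigned long long>(std::numeric_limits<From>::max()) <=
         static_cast<unsigned long long>(std::numeric_limits<To>::max());
}
```

Under this rule, `range_type_compatible<int, size_t>()` holds but `range_type_compatible<size_t, int>()` does not, matching the intent that range_type<int> kernels may call range_type<size_t> functions but not vice versa.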

Contributor

I think that looks good. I do wonder, though, if we need this generality. Would it be easier to require the caller and called functions to have the same range_type?

Contributor Author

I'm not sure. It means the compatibility check becomes something more like a >= than a ==, which doesn't seem like a big implementation change to me. It might have wider implications for bundling optional features, but I don't know a lot about that. @AlexeySachkov, do you think the behavior I've sketched above is implementable?

Assuming that it's implementable, I think the generality is preferable. If a library wants to ship a device function that supports 64-bit indices, it'll mark that function with range_type<size_t>; if doing so prevents user kernels from calling it, that's a big usability issue.

What happens today if we have a kernel in a translation unit compiled with -fsycl-id-queries-fit-in-int and it calls a function in a translation unit compiled with -fno-sycl-id-queries-fit-in-int? If that works, that would be another reason to try and mimic that behavior here.

Contributor

-fsycl-id-queries-fit-in-int does not produce any optional kernel features. All it does is define the __SYCL_ID_QUERIES_FIT_IN_INT__ macro, which is then used in the headers, so you can trace all its uses.

The key uses I'm aware of:

  • We have a detail/id_queries_fit_in_int header, used from handler.hpp, that defines helper functions to throw exceptions if a user-provided range is too large.
  • defines.hpp uses the macro to do #define __SYCL_ASSUME_INT(x) __builtin_assume((x) <= INT_MAX). The latter is in turn used by files like nd_item.hpp, id.hpp and others to put that assumption into every ID query (get_global_id, operator[], conversion operators, etc.).

Therefore, I think it should be possible today to perform cross-translation-unit calls where the translation units are compiled with different values of the aforementioned flag. I'm not entirely sure what the behavior of optimizations relying on that assumption would be, because many factors contribute to it (like how, when, and which other optimizations have been performed on those translation units and on the final linked device code).
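The exception-emitting helpers described above amount to a launch-time bounds check. A minimal sketch of the idea (the function name is illustrative, not DPC++'s actual API):

```cpp
#include <climits>
#include <cstddef>

// Minimal sketch of the kind of check the detail/id_queries_fit_in_int
// helpers perform: a launch is rejected if any dimension, or the linearized
// range, exceeds INT_MAX.
inline bool range_fits_in_int(std::size_t r0, std::size_t r1, std::size_t r2) {
  const std::size_t limit = static_cast<std::size_t>(INT_MAX);
  if (r0 > limit || r1 > limit || r2 > limit)
    return false;
  // Check the products without overflowing: a * b <= limit iff a <= limit / b.
  if (r1 != 0 && r0 > limit / r1)
    return false;
  const std::size_t r01 = r0 * r1;
  if (r2 != 0 && r01 > limit / r2)
    return false;
  return true;
}
```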

@AlexeySachkov, do you think the behavior I've sketched above is implementable?

If the range_type property is missing in the translation unit that makes the call

Does it mean that we should emit an error if the forward declaration of foo in a.cpp differs (range_type-property-wise) from its actual definition in b.cpp? If so, I'm not entirely sure we can catch this.

Simple mismatches like "foo calls incompatible bar" should be detectable, because this sounds very similar to what we already do for named sub-group sizes.

Contributor

github-actions bot commented May 6, 2025

This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

@github-actions github-actions bot added the Stale label May 6, 2025
@Pennycook
Contributor Author

@gmlueck, @AlexeySachkov - Are either of you aware of anything that's preventing this from being merged?

After working on #18296, I'm even more convinced this is a good idea. Currently the SYCL runtime is responsible for checking whether the range provided to a kernel is compatible with the -fsycl-id-queries-fit-in-int optimization, so we do extra checks on every kernel launch. I tried my best to optimize them, but it would be better to move these checks somewhere lower in the stack.

If we had a property for decorating kernels that only support 32-bit ranges, we could push that down into the SPIR-V. UR/L0 could then check for that property as part of the existing "Is this a valid ND-range" checks, and the SYCL runtime could be simplified.

@gmlueck
Contributor

gmlueck commented May 6, 2025

Are either of you aware of anything that's preventing this from being merged?

It seems like the conversation above is unresolved. That conversation implies that this new range_type property can be used to decorate non-kernel device functions, but the spec doesn't say that.

If we had a property for decorating kernels that only support 32-bit ranges, we could push that down into the SPIR-V. UR/L0 could then check for that property as part of the existing "Is this a valid ND-range" checks, and the SYCL runtime could be simplified.

To do this, you would need to create a SPIR-V extension that somehow encodes the range limit. I think the check would have to be done in Level Zero (not the UR) because the UR doesn't read the contents of the SPIR-V module.

Alternatively, you might be able to implement this in the SYCL runtime with no extra overhead for non-decorated kernels by doing something like:

if constexpr (kernel has range_type_key property) {
  if (std::numeric_limits<range_type_key::value_t>::max() < range) {
    throw ...;
  }
}

The outer if can be constexpr because we know the kernel's properties at compile time. However, I think this cannot work when a kernel is launched via a sycl::kernel handle because we don't know the properties at compile time in this case.
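The if-constexpr idea can be illustrated outside of SYCL with a compile-time flag standing in for the property query. This is only a sketch of the dispatch pattern, not the real kernel_properties API:

```cpp
#include <cstdint>
#include <limits>

// Sketch: HasRangeType stands in for "kernel has range_type_key property",
// and RangeT for range_type_key::value_t. When the property is absent, the
// check is discarded at compile time, so non-decorated kernels pay nothing.
template <typename RangeT, bool HasRangeType>
bool launch_range_is_valid(std::uint64_t linear_range) {
  if constexpr (HasRangeType) {
    return linear_range <=
           static_cast<std::uint64_t>(std::numeric_limits<RangeT>::max());
  } else {
    return true; // non-decorated kernel: no extra runtime work
  }
}
```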

@Pennycook
Contributor Author

It seems like the conversation above is unresolved. That conversation implies that this new range_type property can be used to decorate non-kernel device functions, but the spec doesn't say that.

Ah, yes. Ok, I'll add that.

To do this, you would need to create a SPIR-V extension that somehow encodes the range limit. I think the check would have to be done in Level Zero (not the UR) because the UR doesn't read the contents of the SPIR-V module.

Since it would just be an upper bound, wouldn't we only need to add a kernel decoration that stores a number? We would set it to 2^31-1 when the property is present, and leave it unset otherwise. The codegen would only change in a minor way: instead of having an assume that says the IDs are less than 2^31-1, we'd say they are less than the value reported by the decoration.

Alternatively, You might be able to implement this in the SYCL runtime with no extra overhead for non-decorated kernels by doing something like: ...

This would end up being quite similar to what we do now: we run the checks under an #ifdef, so in the 64-bit case we don't pay anything. The issue is that the 32-bit case is the compiler's default -- and that would continue to be the case after adding this property -- and so moving this to a place where we're already checking the launch bounds would be beneficial.

@gmlueck
Contributor

gmlueck commented May 6, 2025

The codegen would only change in a minor way: instead of having an assume that says the IDs are less than 2^31-1, we'd say they are less than the value reported by the decoration.

This got me thinking ... do you have a plan for how this extension would be implemented? Currently, the headers use __SYCL_ASSUME_INT in key places which conditionally expands to a compiler directive depending on whether -fsycl-id-queries-fit-in-int is passed. After that, normal compiler optimizations generate better code because the compiler knows that certain values will always be less than 2^32.

You won't be able to do this if a kernel definition has the proposed range_type property. For example, the function nd_item::get_global_id currently has __SYCL_ASSUME_INT in its body. There is no way to know in the body of nd_item::get_global_id whether it is being called from a kernel with the range_type property, so we can't conditionally call __builtin_assume.

It seems like the implementation of this extension would require a special LLVM IR pass that recognizes functions that compute an ID (like nd_item::get_global_id), checks whether the static call tree comes from a kernel decorated with range_type, and then conditionally inserts the equivalent of __builtin_assume.

Alternatively, maybe the DPC++ compiler doesn't generate any special code, and the SPIR-V is decorated in some special way whenever there is a function call to get an ID. It would then be IGC's responsibility to notice that the kernel is decorated with range_type and then treat all the ID's as 32-bit.

Either way, it does not seem like a trivial implementation.

@Pennycook
Contributor Author

This got me thinking ... do you have a plan for how this extension would be implemented? Currently, the headers use __SYCL_ASSUME_INT in key places which conditionally expands to a compiler directive depending on whether -fsycl-id-queries-fit-in-int is passed. After that, normal compiler optimizations generate better code because the compiler knows that certain values will always be less than 2^32.

Maybe I'm oversimplifying it, but I don't think this has to be very complicated.

__SYCL_ASSUME_INT is just a macro defined like this:

#define __SYCL_ASSUME_INT(x) __builtin_assume((x) <= INT_MAX)

So my proposal would be to implement it something like this, instead (using MaxRange as a placeholder for whatever this built-in would end up being called at the SPIR-V level):

#define __SYCL_ASSUME_FIT(x) __builtin_assume((x) <= __spirv_MaxRange())

Then the compilers would do the following, if the range_type is present:

  • DPC++ would convert a range_type<T> decoration into something like a [[MaxRange(std::numeric_limits<T>::max())]]
  • DPC++ would convert the attribute into MaxRange MaxValue in the SPIR-V (e.g., MaxRange 2147483647)
  • IGC would see __builtin_assume((x) <= MaxRange) and substitute the appropriate value of MaxRange at each callsite

In the case that range_type isn't present, we'd just do the above but with range_type<size_t>. We'd generate an assume, but it wouldn't do anything, because we'd just be asserting that everything fits in a size_t.
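The scheme above boils down to parameterizing the existing INT_MAX bound by a per-kernel value. A rough model (MaxRange and all names here are placeholders from the discussion, not real SPIR-V):

```cpp
#include <cstdint>
#include <limits>

// Rough model: the decoration carries an upper bound derived from
// range_type<T>. Without the property, the bound defaults to the maximum of
// a 64-bit size_t, so the generated assume is vacuously true.
template <typename RangeT>
constexpr std::uint64_t max_range_bound() {
  return static_cast<std::uint64_t>(std::numeric_limits<RangeT>::max());
}

// What the assume at each ID query reduces to once the bound is substituted:
constexpr bool id_within_bound(std::uint64_t id, std::uint64_t bound) {
  return id <= bound;
}
```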

@gmlueck
Contributor

gmlueck commented May 6, 2025

That sounds roughly like my "alternatively" proposal. Some brief thoughts:

  • I think you cannot change the macro to __builtin_assume((x) <= __spirv_MaxRange()) because the DPC++ compiler passes expect the argument to __builtin_assume to be a constant, and __spirv_MaxRange is not a constant. You would have to invent some new SPIRV intrinsic instead.

  • This strategy shifts the optimization from DPC++ compiler passes to IGC compiler passes. It will be interesting to see if the optimization is as effective there.

  • It would be better if we could arrange for the SPIR-V module to not use the __spirv_MaxRange extension in the case when the kernel is not compiled with range_type. If we always generate SPIR-V modules that require __spirv_MaxRange, then this means that users must have an IGC that understands that extension even if they don't use the range_type extension.

@github-actions github-actions bot removed the Stale label May 7, 2025