[SYCL][Doc] Add proposed range_type extension #15962

Open
wants to merge 4 commits into base: sycl

Conversation

Pennycook
Contributor

This extension proposes a new kernel property that allows developers to declare the range requirements of individual kernels, providing more fine-grained control than existing compiler options and improved error behavior.

Signed-off-by: John Pennycook <[email protected]>
@Pennycook Pennycook added the spec extension All issues/PRs related to extensions specifications label Nov 1, 2024
@Pennycook Pennycook requested a review from a team as a code owner November 1, 2024 14:57
If a translation unit is compiled with the `-fsycl-id-queries-fit-in-int`
option, all kernels and `SYCL_EXTERNAL` functions without an explicitly
specified `range_type` property are compiled as if `range_type<int>` were
specified as a property of that kernel or function.
Contributor

Can you decorate a non-kernel function with range_type? If so, you should add that to the specification above. Until you wrote this, I had assumed this property could only be used to decorate a kernel.

Contributor Author

I think we might need to support it in both places, because of things like function pointers and non-inlined functions. Otherwise, a compiler (like DPC++) might compile a function that assumes 32-bit ranges, and try to call it from a kernel that supports 64-bit ranges.

Borrowing again from the default sub-group size stuff, we should probably add wording like this:

This property can also be associated with a device function using the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro.

There are special requirements whenever a device function defined in one translation unit makes a call to a device function that is defined in a second translation unit. In such a case, the second device function is always declared using SYCL_EXTERNAL. If the kernel calling these device functions is defined using a range_type property, the functions declared using SYCL_EXTERNAL must be similarly decorated to ensure that a compatible range_type is used. This decoration must exist in both the translation unit making the call and also in the translation unit that defines the function. If the range_type property is missing in the translation unit that makes the call, or if the range_type of the called function is not compatible with the range_type of the calling function, the program is ill-formed and the compiler must raise a diagnostic. Two range_type properties are considered compatible if all values that can be represented by the range_type of the caller function can be represented by the range_type of the called function.

The last sentence is new, and the intent is to allow range_type<int> kernels to call range_type<size_t> functions, but not vice versa.
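The proposed compatibility rule can be sketched as a compile-time predicate over the underlying integer types. A rough illustration only (all names here are mine, not part of the proposed spec):

```cpp
#include <cstddef>
#include <limits>

// Rough sketch of the proposed rule: a caller with range_type<From> may call
// a function with range_type<To> iff every value representable by From is
// also representable by To. Names are illustrative, not from the extension.
template <typename From, typename To>
constexpr bool range_type_compatible() {
  return static_cast<unsigned long long>(std::numeric_limits<From>::max()) <=
         static_cast<unsigned long long>(std::numeric_limits<To>::max());
}
```

Under this rule, `range_type_compatible<int, size_t>()` holds but `range_type_compatible<size_t, int>()` does not, matching the intent that range_type<int> kernels may call range_type<size_t> functions but not vice versa.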

Contributor

I think that looks good. I do wonder, though, if we need this generality. Would it be easier to require the caller and called functions to have the same range_type?

Contributor Author

I'm not sure. It means the compatibility check becomes something more like a >= than a ==, which doesn't seem like a big implementation change to me. It might have wider implications for bundling optional features, but I don't know a lot about that. @AlexeySachkov, do you think the behavior I've sketched above is implementable?

Assuming that it's implementable, I think the generality is preferable. If a library wants to ship a device function that supports 64-bit indices, it'll mark that function with range_type<size_t>; if doing so prevents user kernels from calling it, that's a big usability issue.

What happens today if we have a kernel in a translation unit compiled with -fsycl-id-queries-fit-in-int and it calls a function in a translation unit compiled with -fno-sycl-id-queries-fit-in-int? If that works, that would be another reason to try and mimic that behavior here.

Contributor

-fsycl-id-queries-fit-in-int does not produce any optional kernel features. All it does is define the __SYCL_ID_QUERIES_FIT_IN_INT__ macro, which is then used in the headers, so you can trace all its uses.

The key uses I'm aware of:

  • We have a detail/id_queries_fit_in_int header, used from handler.hpp, that defines helper functions to throw exceptions if a user-provided range is too large.
  • defines.hpp uses the macro to do #define __SYCL_ASSUME_INT(x) __builtin_assume((x) <= INT_MAX). The latter is in turn used by files like nd_item.hpp, id.hpp and others to put that assumption into every ID query (get_global_id, operator[], conversion operators, etc.).

Therefore, I think it should be possible today to perform cross-translation-unit calls where the translation units are compiled with different values of the aforementioned flag. I'm not entirely sure what the behavior of optimizations relying on that assumption would be, because many factors contribute to it (like how, when, and which other optimizations have been performed on those translation units and on the final linked device code).
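The exception-emitting helpers described above amount to a launch-time bounds check. A minimal sketch of the idea (the function name is illustrative, not DPC++'s actual API):

```cpp
#include <climits>
#include <cstddef>

// Minimal sketch of the kind of check the detail/id_queries_fit_in_int
// helpers perform: a launch is rejected if any dimension, or the linearized
// range, exceeds INT_MAX.
inline bool range_fits_in_int(std::size_t r0, std::size_t r1, std::size_t r2) {
  const std::size_t limit = static_cast<std::size_t>(INT_MAX);
  if (r0 > limit || r1 > limit || r2 > limit)
    return false;
  // Check the products without overflowing: a * b <= limit iff a <= limit / b.
  if (r1 != 0 && r0 > limit / r1)
    return false;
  const std::size_t r01 = r0 * r1;
  if (r2 != 0 && r01 > limit / r2)
    return false;
  return true;
}
```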

@AlexeySachkov, do you think the behavior I've sketched above is implementable?

If the range_type property is missing in the translation unit that makes the call

Does it mean that we should emit an error if the forward declaration of foo in a.cpp differs (range_type-property-wise) from its actual definition in b.cpp? If so, I'm not entirely sure we can catch this.

Simple mismatches like "foo calls incompatible bar" should be detectable, because this sounds very similar to what we already do for named sub-group sizes.

Contributor

github-actions bot commented May 6, 2025

This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.

@github-actions github-actions bot added the Stale label May 6, 2025
@Pennycook
Contributor Author

@gmlueck, @AlexeySachkov - Are either of you aware of anything that's preventing this from being merged?

After working on #18296, I'm even more convinced this is a good idea. Currently the SYCL runtime is responsible for checking whether the range provided to a kernel is compatible with the -fsycl-id-queries-fit-in-int optimization, so we do extra checks on every kernel launch. I tried my best to optimize them, but it would be better to move these checks somewhere lower in the stack.

If we had a property for decorating kernels that only support 32-bit ranges, we could push that down into the SPIR-V. UR/L0 could then check for that property as part of the existing "Is this a valid ND-range" checks, and the SYCL runtime could be simplified.

@gmlueck
Contributor

gmlueck commented May 6, 2025

Are either of you aware of anything that's preventing this from being merged?

It seems like the conversation above is unresolved. That conversation implies that this new range_type property can be used to decorate non-kernel device functions, but the spec doesn't say that.

If we had a property for decorating kernels that only support 32-bit ranges, we could push that down into the SPIR-V. UR/L0 could then check for that property as part of the existing "Is this a valid ND-range" checks, and the SYCL runtime could be simplified.

To do this, you would need to create a SPIR-V extension that somehow encodes the range limit. I think the check would have to be done in Level Zero (not the UR) because the UR doesn't read the contents of the SPIR-V module.

Alternatively, you might be able to implement this in the SYCL runtime with no extra overhead for non-decorated kernels by doing something like:

if constexpr (kernel has range_type_key property) {
  if (std::numeric_limits<range_type_key::value_t>::max() < range) {
    throw ...;
  }
}

The outer if can be constexpr because we know the kernel's properties at compile time. However, I think this cannot work when a kernel is launched via a sycl::kernel handle because we don't know the properties at compile time in this case.
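The if-constexpr idea can be illustrated outside of SYCL with a compile-time flag standing in for the property query. This is only a sketch of the dispatch pattern, not the real kernel_properties API:

```cpp
#include <cstdint>
#include <limits>

// Sketch: HasRangeType stands in for "kernel has range_type_key property",
// and RangeT for range_type_key::value_t. When the property is absent, the
// check is discarded at compile time, so non-decorated kernels pay nothing.
template <typename RangeT, bool HasRangeType>
bool launch_range_is_valid(std::uint64_t linear_range) {
  if constexpr (HasRangeType) {
    return linear_range <=
           static_cast<std::uint64_t>(std::numeric_limits<RangeT>::max());
  } else {
    return true; // non-decorated kernel: no extra runtime work
  }
}
```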

@Pennycook
Contributor Author

It seems like the conversation above is unresolved. That conversation implies that this new range_type property can be used to decorate non-kernel device functions, but the spec doesn't say that.

Ah, yes. Ok, I'll add that.

To do this, you would need to create a SPIR-V extension that somehow encodes the range limit. I think the check would have to be done in Level Zero (not the UR) because the UR doesn't read the contents of the SPIR-V module.

Since it would just be an upper bound, wouldn't we only need to add a kernel decoration that stores a number? We would set it to 2^31-1 when the property is present, and leave it unset otherwise. The codegen would only change in a minor way: instead of having an assume that says the IDs are less than 2^31-1, we'd say they are less than the value reported by the decoration.

Alternatively, You might be able to implement this in the SYCL runtime with no extra overhead for non-decorated kernels by doing something like: ...

This would end up being quite similar to what we do now: we run the checks under an #ifdef, so in the 64-bit case we don't pay anything. The issue is that the 32-bit case is the compiler's default -- and that would continue to be the case after adding this property -- and so moving this to a place where we're already checking the launch bounds would be beneficial.

@gmlueck
Contributor

gmlueck commented May 6, 2025

The codegen would only change in a minor way: instead of having an assume that says the IDs are less than 2^31-1, we'd say they are less than the value reported by the decoration.

This got me thinking ... do you have a plan for how this extension would be implemented? Currently, the headers use __SYCL_ASSUME_INT in key places which conditionally expands to a compiler directive depending on whether -fsycl-id-queries-fit-in-int is passed. After that, normal compiler optimizations generate better code because the compiler knows that certain values will always be less than 2^32.

You won't be able to do this if a kernel definition has the proposed range_type property. For example, the function nd_item::get_global_id currently has __SYCL_ASSUME_INT in its body. There is no way to know in the body of nd_item::get_global_id whether it is being called from a kernel with the range_type property, so we can't conditionally call __builtin_assume.

It seems like the implementation of this extension would require a special LLVM IR pass that recognizes functions that compute an ID (like nd_item::get_global_id), checks whether the static call tree comes from a kernel decorated with range_type, and then conditionally inserts the equivalent of __builtin_assume.

Alternatively, maybe the DPC++ compiler doesn't generate any special code, and the SPIR-V is decorated in some special way whenever there is a function call to get an ID. It would then be IGC's responsibility to notice that the kernel is decorated with range_type and then treat all the ID's as 32-bit.

Either way, it does not seem like a trivial implementation.

@Pennycook
Contributor Author

This got me thinking ... do you have a plan for how this extension would be implemented? Currently, the headers use __SYCL_ASSUME_INT in key places which conditionally expands to a compiler directive depending on whether -fsycl-id-queries-fit-in-int is passed. After that, normal compiler optimizations generate better code because the compiler knows that certain values will always be less than 2^32.

Maybe I'm oversimplifying it, but I don't think this has to be very complicated.

__SYCL_ASSUME_INT is just a macro defined like this:

#define __SYCL_ASSUME_INT(x) __builtin_assume((x) <= INT_MAX)

So my proposal would be to implement it something like this, instead (using MaxRange as a placeholder for whatever this built-in would end up being called at the SPIR-V level):

#define __SYCL_ASSUME_FIT(x) __builtin_assume((x) <= __spirv_MaxRange())

Then the compilers would do the following, if the range_type is present:

  • DPC++ would convert a range_type<T> decoration into something like a [[MaxRange(std::numeric_limits<T>::max())]]
  • DPC++ would convert the attribute into MaxRange MaxValue in the SPIR-V (e.g., MaxRange 2147483647)
  • IGC would see __builtin_assume((x) <= MaxRange) and substitute the appropriate value of MaxRange at each callsite

In the case that range_type isn't present, we'd just do the above but with range_type<size_t>. We'd generate an assume, but it wouldn't do anything, because we'd just be asserting that everything fits in a size_t.
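The scheme above boils down to parameterizing the existing INT_MAX bound by a per-kernel value. A rough model (MaxRange and all names here are placeholders from the discussion, not real SPIR-V):

```cpp
#include <cstdint>
#include <limits>

// Rough model: the decoration carries an upper bound derived from
// range_type<T>. Without the property, the bound defaults to the maximum of
// a 64-bit size_t, so the generated assume is vacuously true.
template <typename RangeT>
constexpr std::uint64_t max_range_bound() {
  return static_cast<std::uint64_t>(std::numeric_limits<RangeT>::max());
}

// What the assume at each ID query reduces to once the bound is substituted:
constexpr bool id_within_bound(std::uint64_t id, std::uint64_t bound) {
  return id <= bound;
}
```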

@gmlueck
Contributor

gmlueck commented May 6, 2025

That sounds roughly like my "alternatively" proposal. Some brief thoughts:

  • I think you cannot change the macro to __builtin_assume((x) <= __spirv_MaxRange()) because the DPC++ compiler passes expect the argument to __builtin_assume to be a constant, and __spirv_MaxRange is not a constant. You would have to invent some new SPIRV intrinsic instead.

  • This strategy shifts the optimization from DPC++ compiler passes to IGC compiler passes. It will be interesting to see if the optimization is as effective there.

  • It would be better if we could arrange for the SPIR-V module to not use the __spirv_MaxRange extension in the case when the kernel is not compiled with range_type. If we always generate SPIR-V modules that require __spirv_MaxRange, then this means that users must have an IGC that understands that extension even if they don't use the range_type extension.

@github-actions github-actions bot removed the Stale label May 7, 2025