Skip to content

[SYCL][Graph] Support for native-command #16871

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 17, 2025

Conversation

EwanC
Copy link
Contributor

@EwanC EwanC commented Feb 3, 2025

Support sycl_ext_codeplay_enqueue_native_command with SYCL-Graph for all of L0, CUDA, HIP, and OpenCL backends.

Introduces interop_handle::ext_oneapi_get_native_graph<backend>() to give the user access to the native graph object which native commands can be appended to. Implemented using new UR command-buffer entry-points urCommandBufferAppendNativeCommandExp and urCommandBufferGetNativeHandleExp.

To use CUDA as an example, code using ext_codeplay_enqueue_native_command eagerly can be updated from:

 CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) {
       auto NativeStream = IH.get_native_queue<cuda>();
       myNativeLibraryCall(NativeStream);
}

To

CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) {
     if (IH.ext_oneapi_has_graph())  {
       auto NativeGraph = IH.ext_oneapi_get_native_graph<cuda>();
       auto NativeStream = IH.get_native_queue<cuda>();

       // Start capture stream calls into graph
       cuStreamBeginCaptureToGraph(NativeStream, NativeGraph, nullptr,
                                             nullptr, 0,
                                             CU_STREAM_CAPTURE_MODE_GLOBAL);

       myNativeLibraryCall(NativeStream);

       // Stop capturing stream calls into graph
       cuStreamEndCapture(NativeStream, &NativeGraph);
     } else {
       auto NativeStream = IH.get_native_queue<cuda>();
       myNativeLibraryCall(NativeStream );
    }
}

Example of how this integration could work in GROMACS https://gitlab.com/gromacs/gromacs/-/merge_requests/4954

@EwanC EwanC force-pushed the graph_native_enqueue branch from 4fa2a2f to 2ead142 Compare February 3, 2025 16:07
@EwanC EwanC force-pushed the graph_native_enqueue branch from 2ead142 to 80269a8 Compare February 4, 2025 16:49
@EwanC EwanC force-pushed the graph_native_enqueue branch from 80269a8 to d40e0d1 Compare February 4, 2025 17:30
@EwanC EwanC force-pushed the graph_native_enqueue branch from d40e0d1 to 4aeba98 Compare February 4, 2025 21:17
@EwanC EwanC force-pushed the graph_native_enqueue branch 2 times, most recently from e4498ce to 7d12878 Compare February 5, 2025 09:46
@EwanC EwanC force-pushed the graph_native_enqueue branch from 7d12878 to 764aebc Compare February 5, 2025 13:21
@EwanC EwanC force-pushed the graph_native_enqueue branch from 764aebc to 4e62db9 Compare February 5, 2025 14:13
@EwanC EwanC force-pushed the graph_native_enqueue branch 3 times, most recently from 047eb0e to c07c67a Compare February 12, 2025 09:45
@EwanC EwanC changed the title DRAFT - [SYCL][Graph] Support for native-command [SYCL][Graph] Support for native-command Feb 12, 2025
@EwanC EwanC force-pushed the graph_native_enqueue branch 2 times, most recently from f499268 to 19bb3dc Compare February 19, 2025 09:53
Copy link
Contributor

@AerialMantis AerialMantis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

EwanC added 6 commits March 14, 2025 20:06
Support [sycl_ext_codeplay_enqueue_native_command](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_codeplay_enqueue_native_command.asciidoc) with SYCL-Graph.

Introduces `interop_handle::ext_codeplay_get_native_graph<backend>()` to
give the user access to the native graph object which native commands
can be appended to.

To use CUDA as an example, code using `ext_codeplay_enqueue_native_command`
eagerly can be updated from:

```cpp
 CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) {
       auto NativeStream = IH.get_native_queue<cuda>();
       myNativeLibraryCall(NativeStream);
}
```

To

```cpp
CGH.ext_codeplay_enqueue_native_command([=](interop_handle IH) {
     if (IH.ext_codeplay_has_graph())  {
       auto NativeGraph = IH.ext_codeplay_get_native_graph<cuda>();
       auto NativeStream = IH.get_native_queue<cuda>();

       // Start capture stream calls into graph
       cuStreamBeginCaptureToGraph(NativeStream, NativeGraph, nullptr,
                                             nullptr, 0,
                                             CU_STREAM_CAPTURE_MODE_GLOBAL);

       myNativeLibraryCall(NativeStream);

       // Stop capturing stream calls into graph
       cuStreamEndCapture(NativeStream, &NativeGraph);
     } else {
       auto NativeStream = IH.get_native_queue<cuda>();
       myNativeLibraryCall(NativeStream );
    }
}
```

Example of how this integration could work in GROMACS https://gitlab.com/gromacs/gromacs/-/merge_requests/4954
@EwanC EwanC temporarily deployed to WindowsCILock March 14, 2025 20:18 — with GitHub Actions Inactive
@EwanC EwanC temporarily deployed to WindowsCILock March 14, 2025 20:34 — with GitHub Actions Inactive
@EwanC EwanC temporarily deployed to WindowsCILock March 14, 2025 20:34 — with GitHub Actions Inactive
@EwanC
Copy link
Contributor Author

EwanC commented Mar 17, 2025

@intel/llvm-gatekeepers This is ready to merge, thanks

@sommerlukas sommerlukas merged commit f1992a0 into intel:sycl Mar 17, 2025
30 of 32 checks passed
sommerlukas pushed a commit that referenced this pull request Mar 21, 2025
The merged PR #16871 introduced new UR
entry-points `urCommandBufferAppendNativeCommandExp` and
`urCommandBufferGetNativeHandleExp`, however there was no UR CTS
coverage for `urCommandBufferAppendNativeCommandExp` included, as
verification was done via SYCL E2E testing.

This PR adds UR CTS testing for `urCommandBufferAppendNativeCommandExp`
for all of the L0, CUDA, HIP, and OpenCL adapters.

Resolves #17448
kbenzie pushed a commit to oneapi-src/unified-runtime that referenced this pull request Mar 21, 2025
The merged PR intel/llvm#16871 introduced new UR
entry-points `urCommandBufferAppendNativeCommandExp` and
`urCommandBufferGetNativeHandleExp`, however there was no UR CTS
coverage for `urCommandBufferAppendNativeCommandExp` included, as
verification was done via SYCL E2E testing.

This PR adds UR CTS testing for `urCommandBufferAppendNativeCommandExp`
for all of the L0, CUDA, HIP, and OpenCL adapters.

Resolves intel/llvm#17448
EwanC added a commit to reble/llvm that referenced this pull request May 5, 2025
In intel#16871 the
`sycl_ext_codeplay_enqueue_native_command` extensions was extended to
add new `interop_handler` APIs for working with SYCL-Graph.

However, the extension macro was not bumped. This is problematic for
users that want to use the extension with graph support, but also
use older oneAPI releases.

```cpp
CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
    if (IH.ext_codeplay_has_graph() /* is this defined?*/) {
      // Graph path
    } else {
      // Eager path
    }
CGH.host_task(...)
```

By bumping the feature test macro, users can write code that supports
old DPC++ versions.
```cpp
CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {

    if (IH.ext_codeplay_has_graph() /* is this defined?*/) {
      // Graph path
    } else
    {
      // Eager path
    }
CGH.host_task(...)
```
EwanC added a commit to reble/llvm that referenced this pull request May 13, 2025
In intel#16871 the
`sycl_ext_codeplay_enqueue_native_command` extensions was extended to
add new `interop_handler` APIs for working with SYCL-Graph.

However, the extension macro was not bumped. This is problematic for
users that want to use the extension with graph support, but also
use older oneAPI releases.

```cpp
CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
    if (IH.ext_codeplay_has_graph() /* is this defined?*/) {
      // Graph path
    } else {
      // Eager path
    }
CGH.host_task(...)
```

By bumping the feature test macro, users can write code that supports
old DPC++ versions.
```cpp
CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {

    if (IH.ext_codeplay_has_graph() /* is this defined?*/) {
      // Graph path
    } else
    {
      // Eager path
    }
CGH.host_task(...)
```
EwanC added a commit to reble/llvm that referenced this pull request May 13, 2025
In intel#16871 the
`sycl_ext_codeplay_enqueue_native_command` extensions was extended to
add new `interop_handler` APIs for working with SYCL-Graph.

However, the extension macro was not bumped. This is problematic for
users that want to use the extension with graph support, but also
use older oneAPI releases.

```cpp
CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
    if (IH.ext_codeplay_has_graph() /* is this defined?*/) {
      // Graph path
    } else {
      // Eager path
    }
CGH.host_task(...)
```

By bumping the feature test macro, users can write code that supports
old DPC++ versions.
```cpp
CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {

    if (IH.ext_codeplay_has_graph() /* is this defined?*/) {
      // Graph path
    } else
    {
      // Eager path
    }
CGH.host_task(...)
```
martygrant pushed a commit that referenced this pull request May 14, 2025
In #16871 the
`sycl_ext_codeplay_enqueue_native_command` extensions was extended to
add new `interop_handler` APIs for working with SYCL-Graph.

However the extension macro was not bumped, which I think was an
oversight. This is problematic for users that want to use the extension
with graph support, but also use older oneAPI releases.

```cpp
#ifdef SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND 
CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
    if (IH.ext_codeplay_has_graph()) {  // Is this API defined?
      // Graph path
    } else {
      // Eager path
    }
#else
CGH.host_task(...)
#endif
```

By bumping the feature test macro users can write code that supports old
DPC++ versions.
```cpp
#ifdef SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND 
CGH.ext_codeplay_enqueue_native_command([=](sycl::interop_handle IH) {
#if SYCL_EXT_ONEAPI_ENQUEUE_NATIVE_COMMAND > 1
    if (IH.ext_codeplay_has_graph()) {
      // Graph path
    } else
#endif
    {
      // Eager path
    }
#else
CGH.host_task(...)
#endif
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.