sycl-rel_5_2_0: [CUDA][LIBCLC] Implement RC11 seq_cst for PTX6.0 (#12516) #13403

kbenzie · 2024-04-15T14:31:41Z

Cherry-pick of #12516 for sycl-rel_5_2_0. Depends on #13401.

Implement `seq_cst` RC11/PTX 6.0 memory consistency for the CUDA backend.

See https://dl.acm.org/doi/pdf/10.1145/3297858.3304043 and
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model
for full details. Requires sm_70 or above. With this PR there is now a
complete mapping between the SYCL memory consistency model capabilities
and the official CUDA model, so CUDA capabilities are fully exploited
wherever the target architecture supports them.

With this change the SYCL-CTS atomic_ref tests fully pass for sm_70 on the
CUDA backend.
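
As a rough illustration of what `seq_cst` ordering guarantees, the sketch below runs the classic store-buffering litmus test on the host with standard C++ `std::atomic`. It is not the libclc implementation from this PR and does not use SYCL or PTX; the function name `count_forbidden_sb_outcomes` is hypothetical and exists only for this example. In SYCL code the analogous operations would go through `sycl::atomic_ref` with `sycl::memory_order::seq_cst`. Under sequential consistency the outcome where both threads read 0 is forbidden, so the function is expected to return 0.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not part of the PR): runs the store-buffering litmus
// test `iterations` times with seq_cst atomics and returns how many times the
// forbidden outcome (both threads read 0) was observed.
int count_forbidden_sb_outcomes(int iterations) {
  int forbidden = 0;
  for (int i = 0; i < iterations; ++i) {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    // Thread 1: store to x, then load y.
    std::thread t1([&] {
      x.store(1, std::memory_order_seq_cst);
      r1 = y.load(std::memory_order_seq_cst);
    });
    // Thread 2: store to y, then load x.
    std::thread t2([&] {
      y.store(1, std::memory_order_seq_cst);
      r2 = x.load(std::memory_order_seq_cst);
    });

    // join() synchronizes with thread completion, so r1 and r2 are safe to read.
    t1.join();
    t2.join();

    // With a single total order over all seq_cst operations, at least one
    // thread must observe the other's store.
    if (r1 == 0 && r2 == 0) {
      ++forbidden;
    }
  }
  return forbidden;
}
```

Calling `count_forbidden_sb_outcomes(1000)` from a test harness should yield 0. If both orderings were weakened to `std::memory_order_relaxed`, the language would allow the both-zero outcome, which is the kind of difference the `seq_cst` mapping has to rule out.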

Fixes #11208

Depends on #12907

Signed-off-by: JackAKirk [email protected]



nrspruit and others added 2 commits April 15, 2024 05:37

[UR][L0] Support for urUsmP2PPeerAccessGetInfoExp to query p2p access… (

22e9785

intel#12983) … info pre-commit PR for oneapi-src/unified-runtime#1429 --------- Signed-off-by: Neil R. Spruit <[email protected]> Co-authored-by: Kenneth Benzie (Benie) <[email protected]>

kbenzie changed the title ~~cherry pick 12516~~ sycl-rel_5_2_0: [CUDA][LIBCLC] Implement RC11 seq_cst for PTX6.0 (#12516) Apr 15, 2024

kbenzie mentioned this pull request Apr 15, 2024

sycl-rel_5_2_0: [UR] Add urProgramGetGlobalVariablePointer entrypoint (#12496) #13404

Closed

kbenzie closed this May 10, 2024

kbenzie deleted the cherry-pick-12516 branch December 18, 2024 13:29
