[UR][CUDA] Add tensor map APIs #1811


Merged: 12 commits merged into oneapi-src:main on Dec 5, 2024

Conversation


@hdelan hdelan commented Jul 2, 2024

Intended to target the APIs here

@hdelan hdelan requested a review from a team as a code owner July 2, 2024 17:04
@hdelan hdelan marked this pull request as draft July 2, 2024 17:06
@github-actions github-actions bot added the loader (Loader related feature/bug), specification (Changes or additions to the specification), and experimental (Experimental feature additions/changes/specification) labels on Jul 2, 2024
@hdelan hdelan force-pushed the tensormap-exp-api branch 3 times, most recently from cc67afb to 221e4db on July 3, 2024 09:17
@github-actions github-actions bot added the cuda (CUDA adapter specific issues) label on Jul 3, 2024
@hdelan hdelan force-pushed the tensormap-exp-api branch 2 times, most recently from 8f63fb0 to ae22a1d on July 3, 2024 11:54
@hdelan hdelan marked this pull request as ready for review July 3, 2024 11:55
@hdelan hdelan requested a review from a team as a code owner July 3, 2024 11:55
@hdelan hdelan requested a review from frasercrmck July 3, 2024 11:55
@hdelan hdelan changed the title from "[draft] Add tensor map APIs" to "[UR][SYCL] Add tensor map APIs" on Jul 3, 2024
@hdelan hdelan changed the title from "[UR][SYCL] Add tensor map APIs" to "[UR][CUDA] Add tensor map APIs" on Jul 3, 2024
@hdelan hdelan force-pushed the tensormap-exp-api branch from ae22a1d to 72935f1 on July 4, 2024 09:28
@hdelan hdelan commented Jul 4, 2024

@frasercrmck I have responded to all of your comments, I think.

@hdelan hdelan force-pushed the tensormap-exp-api branch 2 times, most recently from 573ba28 to ba8a391 on July 4, 2024 16:01
@hdelan hdelan force-pushed the tensormap-exp-api branch from ba8a391 to 309e02f on July 15, 2024 10:57
@hdelan hdelan requested a review from frasercrmck July 15, 2024 11:53
@hdelan hdelan requested review from a team as code owners July 15, 2024 14:11
@github-actions github-actions bot added the level-zero (L0 adapter specific issues) label on Jul 15, 2024
@hdelan hdelan commented Oct 25, 2024

Looks like this is causing some build issues in the L0 static build

Should be fixed. Will keep an eye on CI.

@npmiller npmiller requested review from kbenzie and pbalcer November 5, 2024 15:06
@npmiller npmiller commented Nov 5, 2024

ping @oneapi-src/unified-runtime-native-cpu-write @oneapi-src/unified-runtime-opencl-write @oneapi-src/unified-runtime-hip-write @oneapi-src/unified-runtime-level-zero-write

This should be ready to review

@kbenzie kbenzie left a comment

There is a general lack of detail in the docs and descriptions in this PR. I honestly couldn't tell you what it's for.

The fact this only adds support for CUDA may also be a sticking point.

Comment on lines +114 to +155
- type: $x_exp_tensor_map_data_type_flags_t
name: TensorMapType
desc: "[in] Data type of the tensor object."
- type: uint32_t
name: TensorRank
desc: "[in] Dimensionality of tensor; must be at least 3."
- type: void*
name: GlobalAddress
desc: "[in] Starting address of memory region described by tensor."
- type: const uint64_t*
name: GlobalDim
desc: "[in] Array containing tensor size (number of elements) along each of the TensorRank dimensions."
- type: const uint64_t*
name: GlobalStrides
desc: "[in] Array containing stride size (in bytes) along each of the TensorRank - 1 dimensions."
- type: const int*
name: PixelBoxLowerCorner
desc: "[in] Array containing DHW dimensions of lower box corner."
- type: const int*
name: PixelBoxUpperCorner
desc: "[in] Array containing DHW dimensions of upper box corner."
- type: uint32_t
name: ChannelsPerPixel
desc: "[in] Number of channels per pixel."
- type: uint32_t
name: PixelsPerColumn
desc: "[in] Number of pixels per column."
- type: const uint32_t*
name: ElementStrides
desc: "[in] Array containing traversal stride in each of the TensorRank dimensions."
- type: $x_exp_tensor_map_interleave_flags_t
name: Interleave
desc: "[in] Type of interleaved layout the tensor addresses"
- type: $x_exp_tensor_map_swizzle_flags_t
name: Swizzle
desc: "[in] Bank swizzling pattern inside shared memory"
- type: $x_exp_tensor_map_l2_promotion_flags_t
name: L2Promotion
desc: "[in] L2 promotion size."
- type: $x_exp_tensor_map_oob_fill_flags_t
name: OobFill
desc: "[in] Indicates whether zero or special NaN constant will be used to fill out-of-bounds elements."

This is a lot of arguments and it's not extensible. I think some or all of these should move into a properties struct.


That would make sense; however, this extension currently exists only to match the similar CUDA API, so I think it makes more sense to keep it close to the original one.

Currently no other hardware has a need for this. That may change in the future, but without knowing the specifics of how other hardware would handle this we can't define a generic interface, so we'd like to keep this as an experimental CUDA-specific interface, to be revisited if and when other hardware also needs it.
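For context, the CUDA Driver API call this im2col entry point mirrors appears to be cuTensorMapEncodeIm2col. The prototype below is reproduced from the CUDA 12 driver API as a reference sketch; the comments map its parameters onto the YAML above, and cuda.h remains the normative declaration.

```c
#include <cuda.h>

/* CUDA Driver API prototype (CUDA 12.x) that the im2col entry point above
 * appears to mirror parameter-for-parameter. Shown for reference only. */
CUresult cuTensorMapEncodeIm2col(
    CUtensorMap *tensorMap,             /* out: opaque tensor map descriptor */
    CUtensorMapDataType tensorDataType, /* TensorMapType                     */
    cuuint32_t tensorRank,              /* TensorRank (>= 3 for im2col)      */
    void *globalAddress,                /* GlobalAddress                     */
    const cuuint64_t *globalDim,        /* GlobalDim                         */
    const cuuint64_t *globalStrides,    /* GlobalStrides (TensorRank - 1)    */
    const int *pixelBoxLowerCorner,     /* PixelBoxLowerCorner               */
    const int *pixelBoxUpperCorner,     /* PixelBoxUpperCorner               */
    cuuint32_t channelsPerPixel,        /* ChannelsPerPixel                  */
    cuuint32_t pixelsPerColumn,         /* PixelsPerColumn                   */
    const cuuint32_t *elementStrides,   /* ElementStrides                    */
    CUtensorMapInterleave interleave,   /* Interleave                        */
    CUtensorMapSwizzle swizzle,         /* Swizzle                           */
    CUtensorMapL2promotion l2Promotion, /* L2Promotion                       */
    CUtensorMapFloatOOBfill oobFill);   /* OobFill                           */
```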

Comment on lines +173 to +206
desc: "[in] Handle of the device object."
- type: $x_exp_tensor_map_data_type_flags_t
name: TensorMapType
desc: "[in] Data type of the tensor object."
- type: uint32_t
name: TensorRank
desc: "[in] Dimensionality of tensor; must be at least 3."
- type: void*
name: GlobalAddress
desc: "[in] Starting address of memory region described by tensor."
- type: const uint64_t*
name: GlobalDim
desc: "[in] Array containing tensor size (number of elements) along each of the TensorRank dimensions."
- type: const uint64_t*
name: GlobalStrides
desc: "[in] Array containing stride size (in bytes) along each of the TensorRank - 1 dimensions."
- type: const uint32_t*
name: BoxDim
desc: "[in] Array containing traversal box size (number of elements) along each of the TensorRank dimensions. Specifies how many elements are to be traversed along each tensor dimension."
- type: const uint32_t*
name: ElementStrides
desc: "[in] Array containing traversal stride in each of the TensorRank dimensions."
- type: $x_exp_tensor_map_interleave_flags_t
name: Interleave
desc: "[in] Type of interleaved layout the tensor addresses"
- type: $x_exp_tensor_map_swizzle_flags_t
name: Swizzle
desc: "[in] Bank swizzling pattern inside shared memory"
- type: $x_exp_tensor_map_l2_promotion_flags_t
name: L2Promotion
desc: "[in] L2 promotion size."
- type: $x_exp_tensor_map_oob_fill_flags_t
name: OobFill
desc: "[in] Indicates whether zero or special NaN constant will be used to fill out-of-bounds elements."

This is a lot of arguments and it's not extensible. I think some or all of these should move into a properties struct.

Also, if using properties structs, would it be possible to expose this functionality in a single entry point with a pNext chain for the differences?
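A rough sketch of that suggestion, purely for illustration: the flag types come from the YAML above (with $x_ expanded to ur_), while the descriptor, handle, and function names are made up here and are not part of this PR or the UR spec.

```c
#include <stdint.h>
#include <ur_api.h>

/* Hypothetical descriptor -- NOT part of this PR. Folds the arguments shared
 * by the tiled and im2col variants into one extensible struct; the fields
 * that differ (BoxDim vs. the pixel-box parameters) would be supplied via
 * structs chained through pNext. */
typedef struct ur_exp_tensor_map_desc_t {
    ur_structure_type_t stype;  /* identifies this struct (UR convention) */
    const void *pNext;          /* chain for variant-specific extensions  */
    ur_exp_tensor_map_data_type_flags_t TensorMapType;
    uint32_t TensorRank;
    void *GlobalAddress;
    const uint64_t *GlobalDim;
    const uint64_t *GlobalStrides;
    const uint32_t *ElementStrides;
    ur_exp_tensor_map_interleave_flags_t Interleave;
    ur_exp_tensor_map_swizzle_flags_t Swizzle;
    ur_exp_tensor_map_l2_promotion_flags_t L2Promotion;
    ur_exp_tensor_map_oob_fill_flags_t OobFill;
} ur_exp_tensor_map_desc_t;

/* Hypothetical single entry point taking the descriptor; the output handle
 * type is likewise assumed for the sake of the sketch. */
ur_result_t urTensorMapEncodeExp(ur_device_handle_t hDevice,
                                 const ur_exp_tensor_map_desc_t *pDesc,
                                 ur_exp_tensor_map_handle_t *phTensorMap);
```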

Support
--------------------------------------------------------------------------------

This is only supported in the CUDA adapter.

Could add an explicit reference to UR_PLATFORM_BACKEND_CUDA here for a bit of extra formality.
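As a rough illustration of gating usage on that backend: urPlatformGetInfo, UR_PLATFORM_INFO_BACKEND, and UR_PLATFORM_BACKEND_CUDA are existing UR API, while the helper function itself is made up for this sketch.

```c
#include <stdbool.h>
#include <ur_api.h>

/* Sketch: only use the tensor map extension when the platform reports the
 * CUDA backend, since the spec text above says the feature is CUDA-only. */
static bool platform_supports_tensor_maps(ur_platform_handle_t hPlatform) {
    ur_platform_backend_t backend;
    ur_result_t res = urPlatformGetInfo(hPlatform, UR_PLATFORM_INFO_BACKEND,
                                        sizeof(backend), &backend, NULL);
    return res == UR_RESULT_SUCCESS && backend == UR_PLATFORM_BACKEND_CUDA;
}
```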

@npmiller

> The fact this only adds support for CUDA may also be a sticking point.

Currently, as far as I'm aware, no other target has this, so it is purely CUDA-specific. That's mostly why it isn't implemented for other targets, and also why it aligns closely with the matching CUDA API.

@alycm alycm added the v0.11.x (Include in the v0.11.x release) label on Nov 26, 2024
@github-actions github-actions bot added the opencl (OpenCL adapter specific issues) label on Dec 3, 2024
@npmiller npmiller commented Dec 4, 2024

The only test failures seem to be a problem with the runner:

 urDevicesGet() failed to get number of devices.

@martygrant martygrant merged commit 1851eff into oneapi-src:main Dec 5, 2024
70 of 73 checks passed
@martygrant

this was pulled into intel/llvm in intel/llvm#15911

npmiller added a commit to npmiller/unified-runtime that referenced this pull request Jan 23, 2025
…api"

This reverts commit 1851eff, reversing
changes made to 5f4a5a2.
Labels
cuda (CUDA adapter specific issues), experimental (Experimental feature additions/changes/specification), hip (HIP adapter specific issues), level-zero (L0 adapter specific issues), loader (Loader related feature/bug), native-cpu (Native CPU adapter specific issues), opencl (OpenCL adapter specific issues), specification (Changes or additions to the specification), v0.11.x (Include in the v0.11.x release)