[UR][CUDA] Add tensor map APIs #1811
@frasercrmck I think I have responded to all of your comments.
Should be fixed. Will keep an eye on CI.
ping @oneapi-src/unified-runtime-native-cpu-write @oneapi-src/unified-runtime-opencl-write @oneapi-src/unified-runtime-hip-write @oneapi-src/unified-runtime-level-zero-write This should be ready for review.
There is a general lack of detail in the docs and descriptions in this PR. I honestly couldn't tell you what it's for.
The fact this only adds support for CUDA may also be a sticking point.
- type: $x_exp_tensor_map_data_type_flags_t
  name: TensorMapType
  desc: "[in] Data type of the tensor object."
- type: uint32_t
  name: TensorRank
  desc: "[in] Dimensionality of tensor; must be at least 3."
- type: void*
  name: GlobalAddress
  desc: "[in] Starting address of memory region described by tensor."
- type: const uint64_t*
  name: GlobalDim
  desc: "[in] Array containing tensor size (number of elements) along each of the TensorRank dimensions."
- type: const uint64_t*
  name: GlobalStrides
  desc: "[in] Array containing stride size (in bytes) along each of the TensorRank - 1 dimensions."
- type: const int*
  name: PixelBoxLowerCorner
  desc: "[in] Array containing DHW dimensions of lower box corner."
- type: const int*
  name: PixelBoxUpperCorner
  desc: "[in] Array containing DHW dimensions of upper box corner."
- type: uint32_t
  name: ChannelsPerPixel
  desc: "[in] Number of channels per pixel."
- type: uint32_t
  name: PixelsPerColumn
  desc: "[in] Number of pixels per column."
- type: const uint32_t*
  name: ElementStrides
  desc: "[in] Array containing traversal stride in each of the TensorRank dimensions."
- type: $x_exp_tensor_map_interleave_flags_t
  name: Interleave
  desc: "[in] Type of interleaved layout the tensor addresses."
- type: $x_exp_tensor_map_swizzle_flags_t
  name: Swizzle
  desc: "[in] Bank swizzling pattern inside shared memory."
- type: $x_exp_tensor_map_l2_promotion_flags_t
  name: L2Promotion
  desc: "[in] L2 promotion size."
- type: $x_exp_tensor_map_oob_fill_flags_t
  name: OobFill
  desc: "[in] Indicates whether zero or special NaN constant will be used to fill out-of-bounds elements."
This is a lot of arguments and it's not extensible. I think some or all of these should move into a properties struct.
That would make sense; however, this extension currently exists only to match the equivalent CUDA API, so I think it makes more sense to keep it close to the original.
Currently no other hardware needs this. That may change in the future, but without knowing the specifics of how other hardware will handle it we can't define a generic interface, so we'd like to keep this as an experimental CUDA-specific interface, to be revisited if and when other hardware also needs it.
  desc: "[in] Handle of the device object."
- type: $x_exp_tensor_map_data_type_flags_t
  name: TensorMapType
  desc: "[in] Data type of the tensor object."
- type: uint32_t
  name: TensorRank
  desc: "[in] Dimensionality of tensor; must be at least 3."
- type: void*
  name: GlobalAddress
  desc: "[in] Starting address of memory region described by tensor."
- type: const uint64_t*
  name: GlobalDim
  desc: "[in] Array containing tensor size (number of elements) along each of the TensorRank dimensions."
- type: const uint64_t*
  name: GlobalStrides
  desc: "[in] Array containing stride size (in bytes) along each of the TensorRank - 1 dimensions."
- type: const uint32_t*
  name: BoxDim
  desc: "[in] Array containing traversal box size (number of elements) along each of the TensorRank dimensions. Specifies how many elements are traversed along each tensor dimension."
- type: const uint32_t*
  name: ElementStrides
  desc: "[in] Array containing traversal stride in each of the TensorRank dimensions."
- type: $x_exp_tensor_map_interleave_flags_t
  name: Interleave
  desc: "[in] Type of interleaved layout the tensor addresses."
- type: $x_exp_tensor_map_swizzle_flags_t
  name: Swizzle
  desc: "[in] Bank swizzling pattern inside shared memory."
- type: $x_exp_tensor_map_l2_promotion_flags_t
  name: L2Promotion
  desc: "[in] L2 promotion size."
- type: $x_exp_tensor_map_oob_fill_flags_t
  name: OobFill
  desc: "[in] Indicates whether zero or special NaN constant will be used to fill out-of-bounds elements."
This is a lot of arguments and it's not extensible. I think some or all of these should move into a properties struct.
Also, if using properties structs, would it be possible to expose this functionality in a single entry point with pNext chain for the differences?
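To make the properties-struct suggestion concrete, here is a minimal C sketch of a single entry point whose common arguments live in a descriptor struct, with the tiled-only and im2col-only arguments supplied as chained structs identified by an stype tag, in the style of Vulkan's pNext chains. Every type, enum value and function name here is hypothetical and not part of the Unified Runtime API; the sketch only shows how the mode would be selected from the chain, not the actual encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names illustrating the suggestion; not part of UR. */
typedef enum {
  TENSOR_MAP_STYPE_DESC,
  TENSOR_MAP_STYPE_TILED_PROPERTIES,
  TENSOR_MAP_STYPE_IM2COL_PROPERTIES,
} tensor_map_stype_t;

/* Common header that every chained struct starts with. */
typedef struct {
  tensor_map_stype_t stype;
  const void *pNext;
} tensor_map_base_t;

/* Arguments shared by both encoding modes. */
typedef struct {
  tensor_map_stype_t stype; /* TENSOR_MAP_STYPE_DESC */
  const void *pNext;        /* chain of mode-specific properties */
  uint32_t TensorRank;
  const uint64_t *GlobalDim;
  const uint64_t *GlobalStrides;
  const uint32_t *ElementStrides;
} tensor_map_desc_t;

typedef struct {
  tensor_map_stype_t stype; /* TENSOR_MAP_STYPE_TILED_PROPERTIES */
  const void *pNext;
  const uint32_t *BoxDim;
} tensor_map_tiled_properties_t;

typedef struct {
  tensor_map_stype_t stype; /* TENSOR_MAP_STYPE_IM2COL_PROPERTIES */
  const void *pNext;
  const int *PixelBoxLowerCorner;
  const int *PixelBoxUpperCorner;
  uint32_t ChannelsPerPixel;
  uint32_t PixelsPerColumn;
} tensor_map_im2col_properties_t;

/* Walks the pNext chain; returns the first struct with the given stype, or NULL. */
static const void *find_in_chain(const void *pNext, tensor_map_stype_t stype) {
  for (const tensor_map_base_t *p = pNext; p != NULL; p = p->pNext)
    if (p->stype == stype)
      return p;
  return NULL;
}

/* Selects the encoding mode: 1 = tiled, 2 = im2col, 0 = invalid description. */
static int tensor_map_encode_mode(const tensor_map_desc_t *desc) {
  assert(desc != NULL);
  if (desc->TensorRank < 3)
    return 0;
  const void *tiled = find_in_chain(desc->pNext, TENSOR_MAP_STYPE_TILED_PROPERTIES);
  const void *im2col = find_in_chain(desc->pNext, TENSOR_MAP_STYPE_IM2COL_PROPERTIES);
  /* Exactly one mode-specific struct must be present. */
  if ((tiled != NULL) == (im2col != NULL))
    return 0;
  return tiled ? 1 : 2;
}
```

The design benefit is that a future mode or optional parameter becomes a new chained struct rather than a new entry point or a breaking signature change.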
scripts/core/EXP-TENSOR-MAP.rst
Support
--------------------------------------------------------------------------------

This is only supported in the CUDA adapter.
Could add an explicit reference to UR_PLATFORM_BACKEND_CUDA here for a bit of extra formality.
As far as I'm aware, no other target currently has this, so it is purely CUDA-specific. That's mostly why it isn't implemented for other targets, and also why it aligns closely with the matching CUDA API.
- Check that TensorDim < 3 using yaml "returns:"
- Rename some things and remove copypasta
Fixes missing symbol at linking for static build of L0 adapter.
The only test failures seem to be a problem with the runner:
This was pulled into intel/llvm in intel/llvm#15911.
Intended to target the APIs here