[UR][CUDA] Add tensor map APIs #1811


Merged: 12 commits merged into oneapi-src:main on Dec 5, 2024

Conversation


@hdelan hdelan commented Jul 2, 2024

Intended to target the APIs here

@hdelan hdelan requested a review from a team as a code owner July 2, 2024 17:04
@hdelan hdelan marked this pull request as draft July 2, 2024 17:06
@github-actions github-actions bot added the loader (Loader related feature/bug), specification (Changes or additions to the specification), and experimental (Experimental feature additions/changes/specification) labels on Jul 2, 2024
@hdelan hdelan force-pushed the tensormap-exp-api branch 3 times, most recently from cc67afb to 221e4db on July 3, 2024 09:17
@github-actions github-actions bot added the cuda (CUDA adapter specific issues) label on Jul 3, 2024
@hdelan hdelan force-pushed the tensormap-exp-api branch 2 times, most recently from 8f63fb0 to ae22a1d on July 3, 2024 11:54
@hdelan hdelan marked this pull request as ready for review July 3, 2024 11:55
@hdelan hdelan requested a review from a team as a code owner July 3, 2024 11:55
@hdelan hdelan requested a review from frasercrmck July 3, 2024 11:55
@hdelan hdelan changed the title from "[draft] Add tensor map APIs" to "[UR][SYCL] Add tensor map APIs" on Jul 3, 2024
@hdelan hdelan changed the title from "[UR][SYCL] Add tensor map APIs" to "[UR][CUDA] Add tensor map APIs" on Jul 3, 2024
@hdelan hdelan force-pushed the tensormap-exp-api branch from ae22a1d to 72935f1 on July 4, 2024 09:28
@hdelan hdelan commented Jul 4, 2024

@frasercrmck I have responded to all of your comments, I think.

@hdelan hdelan force-pushed the tensormap-exp-api branch 2 times, most recently from 573ba28 to ba8a391 on July 4, 2024 16:01
@hdelan hdelan force-pushed the tensormap-exp-api branch from ba8a391 to 309e02f on July 15, 2024 10:57
@hdelan hdelan requested a review from frasercrmck July 15, 2024 11:53
@hdelan hdelan requested review from a team as code owners July 15, 2024 14:11
@github-actions github-actions bot added the level-zero (L0 adapter specific issues) label on Jul 15, 2024
@hdelan hdelan commented Oct 25, 2024

Looks like this is causing some build issues in the L0 static build

Should be fixed. Will keep an eye on CI.

@npmiller npmiller requested review from kbenzie and pbalcer November 5, 2024 15:06
@npmiller npmiller commented Nov 5, 2024

ping @oneapi-src/unified-runtime-native-cpu-write @oneapi-src/unified-runtime-opencl-write @oneapi-src/unified-runtime-hip-write @oneapi-src/unified-runtime-level-zero-write

This should be ready to review

@kbenzie kbenzie left a comment

There is a general lack of detail in the docs and descriptions in this PR. I honestly couldn't tell you what it's for.

The fact this only adds support for CUDA may also be a sticking point.

Comment on lines +114 to +155
- type: $x_exp_tensor_map_data_type_flags_t
name: TensorMapType
desc: "[in] Data type of the tensor object."
- type: uint32_t
name: TensorRank
desc: "[in] Dimensionality of tensor; must be at least 3."
- type: void*
name: GlobalAddress
desc: "[in] Starting address of memory region described by tensor."
- type: const uint64_t*
name: GlobalDim
desc: "[in] Array containing tensor size (number of elements) along each of the TensorRank dimensions."
- type: const uint64_t*
name: GlobalStrides
desc: "[in] Array containing stride size (in bytes) along each of the TensorRank - 1 dimensions."
- type: const int*
name: PixelBoxLowerCorner
desc: "[in] Array containing DHW dimensions of lower box corner."
- type: const int*
name: PixelBoxUpperCorner
desc: "[in] Array containing DHW dimensions of upper box corner."
- type: uint32_t
name: ChannelsPerPixel
desc: "[in] Number of channels per pixel."
- type: uint32_t
name: PixelsPerColumn
desc: "[in] Number of pixels per column."
- type: const uint32_t*
name: ElementStrides
desc: "[in] Array containing traversal stride in each of the TensorRank dimensions."
- type: $x_exp_tensor_map_interleave_flags_t
name: Interleave
desc: "[in] Type of interleaved layout the tensor addresses"
- type: $x_exp_tensor_map_swizzle_flags_t
name: Swizzle
desc: "[in] Bank swizzling pattern inside shared memory"
- type: $x_exp_tensor_map_l2_promotion_flags_t
name: L2Promotion
desc: "[in] L2 promotion size."
- type: $x_exp_tensor_map_oob_fill_flags_t
name: OobFill
desc: "[in] Indicates whether zero or special NaN constant will be used to fill out-of-bounds elements."

This is a lot of arguments and it's not extensible. I think some or all of these should move into a properties struct.


That would make sense; however, this extension currently exists only to match the similar CUDA API, so I think it makes more sense to keep it close to the original one.

Currently no other hardware has a need for this. That may change in the future, but without knowing the specifics of how other hardware would handle this we can't define a generic interface, so we'd like to keep this as an experimental CUDA-specific interface, to be revisited if and when other hardware also needs it.
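For context, the CUDA Driver API call this im2col entry point mirrors appears to be cuTensorMapEncodeIm2col. The prototype below is reproduced from the CUDA 12 driver API as a reference sketch; the comments map its parameters onto the YAML above, and cuda.h remains the normative declaration.

```c
#include <cuda.h>

/* CUDA Driver API prototype (CUDA 12.x) that the im2col entry point above
 * appears to mirror parameter-for-parameter. Shown for reference only. */
CUresult cuTensorMapEncodeIm2col(
    CUtensorMap *tensorMap,             /* out: opaque tensor map descriptor */
    CUtensorMapDataType tensorDataType, /* TensorMapType                     */
    cuuint32_t tensorRank,              /* TensorRank (>= 3 for im2col)      */
    void *globalAddress,                /* GlobalAddress                     */
    const cuuint64_t *globalDim,        /* GlobalDim                         */
    const cuuint64_t *globalStrides,    /* GlobalStrides (TensorRank - 1)    */
    const int *pixelBoxLowerCorner,     /* PixelBoxLowerCorner               */
    const int *pixelBoxUpperCorner,     /* PixelBoxUpperCorner               */
    cuuint32_t channelsPerPixel,        /* ChannelsPerPixel                  */
    cuuint32_t pixelsPerColumn,         /* PixelsPerColumn                   */
    const cuuint32_t *elementStrides,   /* ElementStrides                    */
    CUtensorMapInterleave interleave,   /* Interleave                        */
    CUtensorMapSwizzle swizzle,         /* Swizzle                           */
    CUtensorMapL2promotion l2Promotion, /* L2Promotion                       */
    CUtensorMapFloatOOBfill oobFill);   /* OobFill                           */
```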

Comment on lines +173 to +206
desc: "[in] Handle of the device object."
- type: $x_exp_tensor_map_data_type_flags_t
name: TensorMapType
desc: "[in] Data type of the tensor object."
- type: uint32_t
name: TensorRank
desc: "[in] Dimensionality of tensor; must be at least 3."
- type: void*
name: GlobalAddress
desc: "[in] Starting address of memory region described by tensor."
- type: const uint64_t*
name: GlobalDim
desc: "[in] Array containing tensor size (number of elements) along each of the TensorRank dimensions."
- type: const uint64_t*
name: GlobalStrides
desc: "[in] Array containing stride size (in bytes) along each of the TensorRank - 1 dimensions."
- type: const uint32_t*
name: BoxDim
desc: "[in] Array containing traversal box size (number of elements) along each of the TensorRank dimensions. Specifies how many elements are to be traversed along each tensor dimension."
- type: const uint32_t*
name: ElementStrides
desc: "[in] Array containing traversal stride in each of the TensorRank dimensions."
- type: $x_exp_tensor_map_interleave_flags_t
name: Interleave
desc: "[in] Type of interleaved layout the tensor addresses"
- type: $x_exp_tensor_map_swizzle_flags_t
name: Swizzle
desc: "[in] Bank swizzling pattern inside shared memory"
- type: $x_exp_tensor_map_l2_promotion_flags_t
name: L2Promotion
desc: "[in] L2 promotion size."
- type: $x_exp_tensor_map_oob_fill_flags_t
name: OobFill
desc: "[in] Indicates whether zero or special NaN constant will be used to fill out-of-bounds elements."

This is a lot of arguments and it's not extensible. I think some or all of these should move into a properties struct.

Also, if using properties structs, would it be possible to expose this functionality in a single entry point with a pNext chain for the differences?
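A rough sketch of that suggestion, purely for illustration: the flag types come from the YAML above (with $x_ expanded to ur_), while the descriptor, handle, and function names are made up here and are not part of this PR or the UR spec.

```c
#include <stdint.h>
#include <ur_api.h>

/* Hypothetical descriptor -- NOT part of this PR. Folds the arguments shared
 * by the tiled and im2col variants into one extensible struct; the fields
 * that differ (BoxDim vs. the pixel-box parameters) would be supplied via
 * structs chained through pNext. */
typedef struct ur_exp_tensor_map_desc_t {
    ur_structure_type_t stype;  /* identifies this struct (UR convention) */
    const void *pNext;          /* chain for variant-specific extensions  */
    ur_exp_tensor_map_data_type_flags_t TensorMapType;
    uint32_t TensorRank;
    void *GlobalAddress;
    const uint64_t *GlobalDim;
    const uint64_t *GlobalStrides;
    const uint32_t *ElementStrides;
    ur_exp_tensor_map_interleave_flags_t Interleave;
    ur_exp_tensor_map_swizzle_flags_t Swizzle;
    ur_exp_tensor_map_l2_promotion_flags_t L2Promotion;
    ur_exp_tensor_map_oob_fill_flags_t OobFill;
} ur_exp_tensor_map_desc_t;

/* Hypothetical single entry point taking the descriptor; the output handle
 * type is likewise assumed for the sake of the sketch. */
ur_result_t urTensorMapEncodeExp(ur_device_handle_t hDevice,
                                 const ur_exp_tensor_map_desc_t *pDesc,
                                 ur_exp_tensor_map_handle_t *phTensorMap);
```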

Support
--------------------------------------------------------------------------------

This is only supported in the CUDA adapter.

Could add an explicit reference to UR_PLATFORM_BACKEND_CUDA here for a bit of extra formality.
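As a rough illustration of gating usage on that backend: urPlatformGetInfo, UR_PLATFORM_INFO_BACKEND, and UR_PLATFORM_BACKEND_CUDA are existing UR API, while the helper function itself is made up for this sketch.

```c
#include <stdbool.h>
#include <ur_api.h>

/* Sketch: only use the tensor map extension when the platform reports the
 * CUDA backend, since the spec text above says the feature is CUDA-only. */
static bool platform_supports_tensor_maps(ur_platform_handle_t hPlatform) {
    ur_platform_backend_t backend;
    ur_result_t res = urPlatformGetInfo(hPlatform, UR_PLATFORM_INFO_BACKEND,
                                        sizeof(backend), &backend, NULL);
    return res == UR_RESULT_SUCCESS && backend == UR_PLATFORM_BACKEND_CUDA;
}
```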

@npmiller

> The fact this only adds support for CUDA may also be a sticking point.

Currently, as far as I'm aware, no other target has this, so it is purely CUDA-specific. That's mostly why it isn't implemented for other targets, and also why it aligns closely with the matching CUDA API.

@alycm alycm added the v0.11.x (Include in the v0.11.x release) label on Nov 26, 2024
@github-actions github-actions bot added the opencl (OpenCL adapter specific issues) label on Dec 3, 2024
@npmiller npmiller commented Dec 4, 2024

The only test failures seem to be a problem with the runner:

 urDevicesGet() failed to get number of devices.

@martygrant martygrant merged commit 1851eff into oneapi-src:main Dec 5, 2024
70 of 73 checks passed
@martygrant

this was pulled into intel/llvm in intel/llvm#15911

npmiller added a commit to npmiller/unified-runtime that referenced this pull request Jan 23, 2025
…api"

This reverts commit 1851eff, reversing
changes made to 5f4a5a2.
Labels
cuda (CUDA adapter specific issues), experimental (Experimental feature additions/changes/specification), hip (HIP adapter specific issues), level-zero (L0 adapter specific issues), loader (Loader related feature/bug), native-cpu (Native CPU adapter specific issues), opencl (OpenCL adapter specific issues), specification (Changes or additions to the specification), v0.11.x (Include in the v0.11.x release)