Skip to content

[SYCL][ESIMD] Add support for lsc mem access APIs #5512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Mar 2, 2022

Conversation

sndmitriev
Copy link
Contributor

Signed-off-by: Sergey Dmitriev [email protected]

@sndmitriev sndmitriev requested a review from a team as a code owner February 8, 2022 15:03
kbobrovs
kbobrovs previously approved these changes Feb 8, 2022
Copy link
Contributor

@kbobrovs kbobrovs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one is good to go, given that it has been internally reviewed as well.
But please follow up with another PR addressing some documentation-related comments.

@@ -176,11 +176,279 @@ enum class atomic_op : uint8_t {
/// Compare and exchange (floating point).
/// <code>if (*addr == src0) *addr = src1;</code>
fcmpwr = 0x12,
fadd = 0x13,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow up with a PR adding comments to these new operations.

/// Data size or format to read or store
enum class lsc_data_size : uint8_t {
default_size = 0,
u8 = 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add documentation comments to undocumented elements in the follow-up PR.

vals.data());
}

/// 2D flat-address block load.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some usage examples are highly desirable for this and few other lsc APIs. Please add a TODO marker in the follow-up PR.

@kbobrovs
Copy link
Contributor

kbobrovs commented Feb 9, 2022

@bader, @vladimirlaz - CUDA testing stalled. This patch does can't affect CUDA - could you please merge?

check_lsc_data_size<T, DS>();
if (DS != lsc_data_size::default_size)
return DS;
else if (sizeof(T) == 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually U8 and U16 data size values are only supported by 2d block messages. D32U8 and D32U16 should be used instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this has been fixed in f922b01

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably the fixed code will not work. When sizeof(T) is 1 or 2, values should be also bit-casted to uint32_t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review the updated patch. I think this problem has been fixed.

/// @tparam BlockHeight is the block height in number of elements.
/// @tparam NBlocks is the number of blocks.
/// @tparam Transposed is the transposed version or not.
/// @tparam Transformed is apply VNNI transform or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VNNI transform is only supported when data size is U8 or U16

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added static asserts to check supported data sizes in e9e39f7

/// @tparam BlockWidth is the block width in number of elements.
/// @tparam BlockHeight is the block height in number of elements.
/// @tparam NBlocks is the number of blocks.
/// @tparam Transposed is the transposed version or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transposed messages are only supported for U32 and U64 (with some constraints)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added static asserts to check supported data sizes when Transposed is true in e9e39f7

@kbobrovs
Copy link
Contributor

kbobrovs commented Feb 9, 2022

@bader, @vladimirlaz - CUDA testing stalled. This patch does can't affect CUDA - could you please merge?

I need to take this request back - few more comments arrived.

///
/// @tparam T is element type.
/// @tparam NElts is the number of elements to load per address.
/// @tparam DS is the data size.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the Transposed (block) messages support only u32 and u64 values for DS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added static assert to all Transposed functions in f922b01

@sndmitriev sndmitriev marked this pull request as draft February 17, 2022 09:22
@sndmitriev sndmitriev marked this pull request as ready for review February 25, 2022 15:22
@kbobrovs
Copy link
Contributor

kbobrovs commented Mar 2, 2022

The precommit failures is unrelated - timeouts on two tests:
AtomicRef/sub.cpp
AtomicRef/add.cpp

Probably caused by enormous amount of warning messages.
So, I'm merging the patch.

@kbobrovs kbobrovs merged commit 4bd50e7 into intel:sycl Mar 2, 2022
@sndmitriev sndmitriev deleted the sndmitriev/esimd-lsc-api branch March 2, 2022 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants