-
Notifications
You must be signed in to change notification settings - Fork 769
[SYCL][ESIMD] Add support for lsc mem access APIs #5512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Sergey Dmitriev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one is good to go, given that it has been internally reviewed as well.
But please follow up with another PR addressing some documentation-related comments.
@@ -176,11 +176,279 @@ enum class atomic_op : uint8_t { | |||
/// Compare and exchange (floating point). | |||
/// <code>if (*addr == src0) *addr = src1;</code> | |||
fcmpwr = 0x12, | |||
fadd = 0x13, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow up with a PR adding comments to these new operations.
/// Data size or format to read or store | ||
enum class lsc_data_size : uint8_t { | ||
default_size = 0, | ||
u8 = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add documentation comments to undocumented elements in the follow-up PR.
vals.data()); | ||
} | ||
|
||
/// 2D flat-address block load. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some usage examples are highly desirable for this and few other lsc APIs. Please add a TODO marker in the follow-up PR.
@bader, @vladimirlaz - CUDA testing stalled. This patch does can't affect CUDA - could you please merge? |
check_lsc_data_size<T, DS>(); | ||
if (DS != lsc_data_size::default_size) | ||
return DS; | ||
else if (sizeof(T) == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually U8 and U16 data size values are only supported by 2d block messages. D32U8 and D32U16 should be used instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has been fixed in f922b01
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably the fixed code will not work. When sizeof(T)
is 1 or 2, values should be also bit-casted to uint32_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review the updated patch. I think this problem has been fixed.
/// @tparam BlockHeight is the block height in number of elements. | ||
/// @tparam NBlocks is the number of blocks. | ||
/// @tparam Transposed is the transposed version or not. | ||
/// @tparam Transformed is apply VNNI transform or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VNNI transform is only supported when data size is U8 or U16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added static asserts to check supported data sizes in e9e39f7
/// @tparam BlockWidth is the block width in number of elements. | ||
/// @tparam BlockHeight is the block height in number of elements. | ||
/// @tparam NBlocks is the number of blocks. | ||
/// @tparam Transposed is the transposed version or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transposed messages are only supported for U32 and U64 (with some constraints)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added static asserts to check supported data sizes when Transposed is true in e9e39f7
I need to take this request back - few more comments arrived. |
… with regular atomics
/// | ||
/// @tparam T is element type. | ||
/// @tparam NElts is the number of elements to load per address. | ||
/// @tparam DS is the data size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the Transposed (block) messages support only u32
and u64
values for DS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added static assert to all Transposed functions in f922b01
The precommit failures is unrelated - timeouts on two tests: Probably caused by enormous amount of warning messages. |
Signed-off-by: Sergey Dmitriev [email protected]