[SYCL][CUDA] tf32 matrix MAD impl using uint32_t #5709
Signed-off-by: jack.kirk <[email protected]>
buffer<uint32_t, 1> bufB(B, range<1>(K * N));
buffer<float, 1> bufC(C, range<1>(M * N));
buffer<float, 1> bufD(D, range<1>(M * N));
Can you add a complete example in test/matrix where you show the necessary "manual" conversion function from float to fp19(uint32) during initialization and then from fp19 to float during accumulation and verification?
Yeah it's here: intel/llvm-test-suite#881
For the float to fp19:
uint32_t make_tf32(float const &x);
For the fp19 to float:
float tf32_to_fp32(uint32_t x);
(I'll rename both to e.g. make_fp19)
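For readers without the test-suite PR at hand, the two declarations above could be implemented roughly as follows. This is only a sketch and not the code from intel/llvm-test-suite#881: it assumes the tf32/fp19 value is kept in the upper 19 bits of the 32-bit word (1 sign, 8 exponent, 10 mantissa bits) and that the discarded 13 bits are rounded to nearest even, which is my choice of rounding mode rather than something stated in this thread.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical sketch: float -> tf32 stored in a uint32_t.
// Assumes round-to-nearest-even on the 13 low mantissa bits that tf32 drops.
inline uint32_t make_tf32(float const &x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits)); // well-defined bit copy
  // Leave Inf/NaN (exponent all ones) untouched so a NaN payload is not
  // rounded into Inf.
  if ((bits & 0x7f800000u) == 0x7f800000u)
    return bits;
  uint32_t lsb = (bits >> 13) & 1u; // lowest mantissa bit that is kept
  bits += 0x0fffu + lsb;            // ties go to the even neighbour
  return bits & ~0x1fffu;           // clear the 13 dropped bits
}

// Hypothetical sketch: tf32 stored in a uint32_t -> float.
// The storage already has the float bit layout, so this is a plain bit copy.
inline float tf32_to_fp32(uint32_t x) {
  float f;
  std::memcpy(&f, &x, sizeof(f));
  return f;
}
```

With these, host arrays would be filled with `make_tf32(value)` during initialization and read back with `tf32_to_fp32` when accumulating the reference result.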
// number of rows of a.
constexpr int K = 8; // number of cols of a/number of rows of b.

uint32_t A[M * K];
add a comment that uint32 is used here as a storage for fp19
OK
Thanks for the comments, I've updated both tests now.
LGTM, but we need to start adopting the name tf32 instead of fp19.
This PR is no longer necessary: the complete tf32 implementation is now ready and can replace it: #5870
CUDA backend implementation of tf32 MAD using the underlying 32-bit type, fully consistent with the existing matrix extension.
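To make the "underlying 32-bit type" concrete, a host-side reference for D = A * B + C that a test could compare device results against might look like the sketch below. The function name `matrix_ref_mad`, the row-major layout, and the helper `decode_tf32` are assumptions for illustration; they are not taken from this PR or from the integration test.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret tf32 storage (a uint32_t) as a float.
inline float decode_tf32(uint32_t x) {
  float f;
  std::memcpy(&f, &x, sizeof(f));
  return f;
}

// Hypothetical host reference: D = A * B + C.
// A is M x K and B is K x N, both row-major, holding tf32 values in uint32_t
// storage. C and D are M x N, row-major, plain float.
inline void matrix_ref_mad(const uint32_t *A, const uint32_t *B,
                           const float *C, float *D, int M, int N, int K) {
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      float acc = C[m * N + n]; // accumulate in float, like the C/D buffers
      for (int k = 0; k < K; ++k)
        acc += decode_tf32(A[m * K + k]) * decode_tf32(B[k * N + n]);
      D[m * N + n] = acc;
    }
  }
}
```

Because device-side accumulation order can differ from this loop, a comparison against such a reference would normally use a small tolerance rather than exact equality.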
Integration test added here: intel/llvm-test-suite#881