-
Notifications
You must be signed in to change notification settings - Fork 131
[SYCL][CUDA][Matrix] Adding test case for tf32 #963
Conversation
@@ -163,6 +163,14 @@ void test() { | |||
accB.get_pointer() + (k * K * Big_N) + (n * N), | |||
Big_N); | |||
|
|||
// Convert values if using tf32 | |||
if constexpr (std::is_same<T3, precision::tf32>::value) { | |||
for (auto i = 0; i < 4; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment on where 4 comes from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be OK to merge this test as it is: then I can merge it into #975 and use the element wise interface to address your comment: adding a comment on where 4 comes from would then not be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test will have to change a lot after intel/llvm#5964
you are not using data anymore.
So you want to merge it first as it is and then modify it in 975?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically it will be more simple if intel/llvm#5870 (and this corresponding test PR) is merged before intel/llvm#5964 instead of vice versa: Then the tf32 stuff can be updated in intel/llvm#5964: as you say a fair amount will change but it is straightforward for me to do it. @hdelan is on holiday atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What bothers me is that this PR is what triggered work on wi_marray and slicing additions. So it does not seem right to merge it before the solution.
Is there a reason why you want to merge it before intel/llvm#5964?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR doesn't change the way that element wise ops could be performed in current sycl tip, it is just adding a new free tf32 rounding function and tf32 joint_matrix implementation. I don't think there is anything implicitly wrong with it being merged before #5964. I think It is easier to merge this one first because @hdelan is currently on holiday anyway and I know how to adjust #5964 to take account if these tf32 changes easily. Probably intel/llvm#5964 will be ready to be merged then as well anyway by the sound of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the PR that adds the actual tf32 class merged already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No but intel/llvm#5870 is approved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, we agreed on addressing the spec changes in a separate PR (intel/llvm#5870 (review))
In this case, this test can also be approved and merged. Thank you for the clarifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
) Changing joint_matrix impl to use float as storage type instead of uint32_t for tf32. Test: intel/llvm-test-suite#963
Test for intel/llvm#5870