[SYCL][Matrix] Add support for tf32 type #5920
Conversation
I think that this implementation is generally fine, and we will change our CUDA backend implementation to use the same interface, e.g.:

```cpp
static constexpr size_t MATRIX_M = TM * 2;
static constexpr size_t MATRIX_N = TN * 2;
static constexpr size_t MATRIX_K = TK * 2;
precision::tf32 A[MATRIX_M][MATRIX_K];
```

etc., as discussed.
Hi @dkhaldi. I'm a little bit confused by this test case. In the other file you say:

> Users can't construct a tf32

But in this file you construct them and convert floats to be stored as tf32s. An empty class has size 1 byte, so it cannot contain a tf32. Is there something I'm missing here? Thanks
In https://github.com/intel/llvm/pull/5870/files#diff-34520e7c212ec666342a649b7448a9841062a5e93eca4958990ac1653568f0d5R82
I am using the free functions defined in https://github.com/intel/llvm/pull/5870/files#diff-f71a436bdeda598b29caad471fa637a2844a12f38fe4e85b15b2ccb37bd09833R607 which accept and return floats, since floats are to be used as the fragment type for tf32.
Perhaps this is suitable instead of making the function return the empty class.
@hdelan, thanks, I will correct that. I made it this way so I don't change the load API for this draft PR. I will change the load API and keep the buffers as floats.
Yes, the expectation is to use the free conversion function, with element indexing as the argument, to perform the conversion. The only issue is that we should be using the get_wi_data function to get the work-item portion, since it has clearer semantics than the raw "data" member, which can leave the user unsure whether it refers to the sub-group matrix or the work-item portion.
Interesting.
You can simplify the code a little bit.

```cpp
// Differentiating between the "element type" and the "storage element type"
template <typename T> struct helper_traits {
  typedef T element_type;
```
Use a modern `using` alias instead.
```cpp
@@ -277,12 +306,17 @@ class wi_element {
  }

#if __SYCL_DEVICE_ONLY__
  // TODO: __spirv_VectorInsertDynamic should take storage element type as
  // argument
#define OP(op)                                                                 \
```
Perhaps it is a good opportunity to remove all these macros?
@yubingex007-a11y, I remember you changed this code to use macros and make it more compact.
The code was previously expanded for each of the ops. Bing changed it to remove the redundancy.
@keryell, what do you suggest we use instead?
Sometimes macros are the best or only reasonable solution.
In that case, use protected names like `__DPC_SYCL_OP` or whatever, to avoid the case where a user decides to write in her program:

```cpp
#define OP something
```

:-)
```cpp
@@ -0,0 +1,165 @@
// RUN: %clangxx -fsycl -O2 %s -o %t.out

#include <CL/sycl.hpp>
```
Suggested change:

```diff
-#include <CL/sycl.hpp>
+#include <sycl/sycl.hpp>
```
```cpp
using namespace sycl;
using namespace sycl::ext::oneapi::experimental::matrix;

#define SG_SZ 8
```
I have macro indigestion. :-) Please use `auto constexpr`, for example.
Some of these tests are used for performance evaluation. If we use constexpr, the SG size and the other parameters cannot be tuned at compile time using -D.
That is a good use case. But I guess in that case you would have an `#ifndef SG_SZ` guard around this.
```cpp
    cgh.parallel_for<class imatrix>(
        nd_range<2>({NDRangeM, NDRangeN * SG_SZ}, {1, 1 * SG_SZ}),
        [accA, accB, accC, M, N, K](nd_item<2> spmd_item)
```
Too verbose. In the end you need to show that SYCL is simpler than CUDA, not the opposite. ;-)

Suggested change:

```diff
-        [accA, accB, accC, M, N, K](nd_item<2> spmd_item)
+        [=](nd_item<2> spmd_item)
```
```cpp
for (int m = 0; m < M; m++)
  for (int n = 0; n < N; n++) {
    for (int k = 0; k < K; k++) {
      float va = *(float *)(A_mem + m * K + k);
```
Suggested change:

```diff
-      float va = *(float *)(A_mem + m * K + k);
+      auto va = A_mem[m * K + k];
```
```cpp
      float va = *(float *)(A_mem + m * K + k);
      float vb = *(float *)(B_mem + k * N + n);
      float acc = *((float *)(C_mem + m * N + n));
      *((float *)(C_mem + m * N + n)) = va * vb;
```
Suggested change:

```diff
-      *((float *)(C_mem + m * N + n)) = va * vb;
+      C_mem[m * N + n] = va * vb;
```
```cpp
    }
  }

big_matrix<float, MATRIX_M, MATRIX_N> MC((float *)&C);
```
Suggested change:

```diff
-big_matrix<float, MATRIX_M, MATRIX_N> MC((float *)&C);
+big_matrix<float, MATRIX_M, MATRIX_N> MC{C};
```

and add the right constructor in big_matrix.
@yubingex007-a11y, can you please review?
```cpp
// just uses the type system to communicate the desired accuracy of arithmetic
// computations. Users can't construct a tf32
namespace precision {
class tf32 {};
```
Since users shouldn't be able to construct a tf32:

Suggested change:

```diff
-class tf32 {};
+class tf32 {
+  tf32() = delete;
+};
```
Replaced with https://github.com/intel/llvm/pull/8151/files, which uses the unified interface.
This is a draft PR for initial support of the tf32 precision type in joint matrix.