[In Progress] Enable mx f8 fp4 support on ROCm #2046


Open
wants to merge 3 commits into base: rocm6.5_internal_testing

Conversation

jagadish-amd
@jagadish-amd jagadish-amd commented Apr 23, 2025

This PR enables MX data type support on ROCm.

FP8 MX data type sample test case:
PYTORCH_TEST_WITH_ROCM=1 python test/test_matmul_cuda.py TestFP8MatmulCudaCUDA.test_blockwise_mxfp8_nvfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_mxfp8_cuda -v

hipBLASLt log:
hipblaslt-bench --api_method c -m 128 -n 128 -k 128 --lda 128 --ldb 128 --ldc 128 --ldd 128 --stride_a 0 --stride_b 0 --stride_c 0 --stride_d 0 --alpha 1 --beta 0 --transA T --transB N --batch_count 1 --scaleA 3 --scaleB 3 --a_type f8_r --b_type f8_r --c_type bf16_r --d_type bf16_r --compute_type f32_r --algo_method index --solution_index -2147220478 --rotating 0 --cold_iters 0 --iters 0

FP4 MX data type sample test case: TBD
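For context on what the test above exercises: MX formats attach one shared power-of-two (e8m0) scale to each 32-element block of FP8 values. A minimal CPU sketch of picking such a scale (a simplified, hypothetical helper for illustration, not code from this PR):

```python
import math

def e8m0_block_scale(block, fp8_max=448.0):
    """Pick a power-of-two scale for one 32-element MX block.

    e8m0 stores only an 8-bit exponent, so the scale must be 2**e.
    Simplified sketch: we only scale down blocks that overflow the
    FP8 e4m3 range (max 448); real MX encoders also rescale small
    blocks to use the full dynamic range.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0
    # Smallest e such that amax / 2**e <= fp8_max.
    e = math.ceil(math.log2(amax / fp8_max))
    return 2.0 ** max(e, 0)

block = [1000.0] + [0.0] * 31
scale = e8m0_block_scale(block)
print(scale)                      # 4.0: 1000 / 4 = 250 fits in e4m3
print(max(block) / scale <= 448)  # True
```

The GEMM then multiplies the FP8 payloads and reapplies the per-block scales, which is what the `--scaleA 3 --scaleB 3` arguments to hipblaslt-bench select.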

Commits:

  1. ROCm MX-FP8 GEMM (PR from @petrex)
    Ported the patch from ROCm MX-FP8 GEMM pytorch/pytorch#147553.
    Commented out a few lines to avoid compilation errors (see the TODO comments).

  2. Refine the _platform_supports_mx_gemm check.

  3. For MX FP8, A and B need not be of kFloat8_e8m0fnu type.

@jagadish-amd jagadish-amd changed the title Enable mx f8 fp4 support on ROCm [In Progress] Enable mx f8 fp4 support on ROCm Apr 23, 2025
@rocm-repo-management-api

rocm-repo-management-api bot commented Apr 23, 2025

Jenkins build for 1b8bf596817e0403f7038b90b5c656aa30b6df82 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during base docker image building:

#31 11.26 The following packages have unmet dependencies:
#31 11.32  rocm-dev : Depends: rocm-cmake (= 0.14.0.60304-76~22.04) but 5.0.0-1 is to be installed
#31 11.32             Depends: rocm-device-libs (= 1.0.0.60304-76~22.04) but 5.0.0-1 is to be installed
#31 11.32  rocm-utils : Depends: rocminfo (= 1.0.0.60304-76~22.04) but 5.0.0-1 is to be installed
#31 11.32               Depends: rocm-cmake (= 0.14.0.60304-76~22.04) but 5.0.0-1 is to be installed
#31 11.33 E: Unable to correct problems, you have held broken packages.
#31 ERROR: process "/bin/sh -c bash ./install_rocm.sh" did not complete successfully: exit code: 100
------
 > [stage-0 23/61] RUN bash ./install_rocm.sh:
11.26 distribution that some required packages have not yet been created
11.26 or been moved out of Incoming.


@petrex petrex left a comment


Thanks @jagadish-amd!
Left a few comments; let's discuss offline as well.

@@ -1566,6 +1566,25 @@ void scaled_gemm(
matmulDescA = HIPBLASLT_MATMUL_DESC_A_SCALE_POINTER_VEC_EXT;
matmulDescB = HIPBLASLT_MATMUL_DESC_B_SCALE_POINTER_VEC_EXT;
}
else if(mat1_scale_dtype == kFloat8_e8m0fnu && mat2_scale_dtype == kFloat8_e8m0fnu) {
@petrex petrex Apr 24, 2025

Do we support a usage like this in hipblaslt: MX format * non-MX format?

If yes, this should be || instead of &&.

@@ -1566,6 +1566,25 @@ void scaled_gemm(
matmulDescA = HIPBLASLT_MATMUL_DESC_A_SCALE_POINTER_VEC_EXT;
matmulDescB = HIPBLASLT_MATMUL_DESC_B_SCALE_POINTER_VEC_EXT;
}
else if(mat1_scale_dtype == kFloat8_e8m0fnu && mat2_scale_dtype == kFloat8_e8m0fnu) {
#if ROCM_VERSION >= 60500
if (at::cuda::tunable::IsGfx950Device()) {

How about this:

```cpp
bool _is_gfx950_supported() {
#if ROCM_VERSION >= 60500
    return at::detail::getCUDAHooks().isGPUArch({"gfx950"});
#else
    return false;
#endif
}
```

Maybe slightly better perf than a device query...

"Matrix dimensions must be multiples of 32 for MX format. ",
"Got m=", m, ", n=", n, ", k=", k);

//todo
hipblaslt provides these APIs, but I guess we don't need to set this explicitly, at least for gfx950.

@@ -513,7 +515,24 @@ class HipblasltGemmOp : public Callable<ParamsT> {
if (mat1_scale_ptr && mat2_scale_ptr) {
#ifdef HIPBLASLT_VEC_EXT
if (GetUseRowwiseFromParams<CT>(params)) {
// swapped
// For MX-FP8 on gfx950
#if ROCM_VERSION >= 60500

seems to be duplicate logic

namespace at::cuda::tunable {

#ifdef USE_ROCM
static bool IsGfx950Device() {

How about

```cpp
bool _is_gfx950_supported() {
#if ROCM_VERSION >= 60500
    return at::detail::getCUDAHooks().isGPUArch({"gfx950"});
#else
    return false;
#endif
}
```

instead of a device query, and maybe evaluate it only once.

#endif

// Helper function to validate MX format requirements
static bool ValidateMXFormatRequirements(int64_t m, int64_t n, int64_t k) {

This is a basic check, but I think it should be OK for now. Looking into the hipblaslt implementation, there seem to be a few other shapes.

@@ -7339,6 +7339,9 @@
("CUBLASLT_MATMUL_DESC_D_SCALE_POINTER", ("HIPBLASLT_MATMUL_DESC_D_SCALE_POINTER", CONV_MATH_FUNC, API_BLAS)),
("CUBLASLT_MATMUL_DESC_AMAX_D_POINTER", ("HIPBLASLT_MATMUL_DESC_AMAX_D_POINTER", CONV_MATH_FUNC, API_BLAS)),
("CUBLASLT_MATMUL_DESC_BIAS_DATA_TYPE", ("HIPBLASLT_MATMUL_DESC_BIAS_DATA_TYPE", CONV_MATH_FUNC, API_BLAS)),
("CUBLASLT_MATMUL_DESC_A_SCALE_MODE", ("HIPBLASLT_MATMUL_DESC_A_SCALE_MODE", CONV_MATH_FUNC, API_BLAS)),
("CUBLASLT_MATMUL_DESC_B_SCALE_MODE", ("HIPBLASLT_MATMUL_DESC_B_SCALE_MODE", CONV_MATH_FUNC, API_BLAS)),
("CUBLASLT_MATMUL_MATRIX_SCALE_VEC32_UE8M0", ("HIPBLASLT_MATMUL_MATRIX_SCALE_VEC32_UE8M0", CONV_MATH_FUNC, API_BLAS)),

Let's focus on MX-FP8 in this PR; for MX-FP4 we have other mappings.

@rocm-repo-management-api

Jenkins build for 1b8bf596817e0403f7038b90b5c656aa30b6df82 commit is in progress
Links: Blue Ocean view / Build artifacts

