Merge CK validation to release/2.6 #2016

Open
wants to merge 3 commits into
base: release/2.6

@akashveramd akashveramd commented Apr 3, 2025

Refiled version of #2007

This PR makes the following changes:

  • Stops generating and building the CK kernels as part of the pytorch build.
  • Downloads the CK library from compute-artifactory and links it with pytorch.
  • Enables USE_CK_FLASH_ATTENTION based on the USE_FLASH_ATTENTION option.
  • Adds USE_CK_FLASH_ATTENTION as a cmake variable.
  • Fixes lint/NIT errors using lintrunner.

Steps for validating the CK library:

  • Build pytorch using PYTORCH_ROCM_ARCH=gfx90a python setup.py develop
    Note: I used an MI-200, hence gfx90a.
  • In benchmarks/transformer/sdpa.py, add torch.backends.cuda.preferred_rocm_fa_library("ck") as the very first line of main(). This sets CK as the backend.
  • In the terminal, run python benchmarks/transformer/sdpa.py
    Validation passes if sdpa.py runs to completion without errors.
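
The validation steps above could also be exercised as a standalone smoke test, roughly as follows. This is a minimal sketch, not the sdpa.py benchmark: the helper names (build_env, ck_sdpa_smoke_test) and the tensor shapes are illustrative assumptions, and it assumes the build under test exposes torch.backends.cuda.preferred_rocm_fa_library as the steps imply. On a machine without a ROCm build of pytorch it reports "skipped" instead of failing.

```python
import importlib.util
import os


def build_env(arch="gfx90a"):
    """Environment for `python setup.py develop`; gfx90a matches the MI-200 used above."""
    env = dict(os.environ)
    env["PYTORCH_ROCM_ARCH"] = arch
    return env


def ck_sdpa_smoke_test():
    """Run one SDPA forward/backward pass with CK selected; return "ok" or "skipped"."""
    if importlib.util.find_spec("torch") is None:
        return "skipped"  # pytorch is not installed here
    try:
        import torch
        import torch.nn.functional as F
        from torch.nn.attention import SDPBackend, sdpa_kernel
    except ImportError:
        return "skipped"  # pytorch too old for torch.nn.attention

    has_rocm = torch.version.hip is not None and torch.cuda.is_available()
    has_selector = hasattr(torch.backends.cuda, "preferred_rocm_fa_library")
    if not (has_rocm and has_selector):
        return "skipped"  # not a ROCm build, no GPU, or no CK selector

    # Same call the steps add as the first line of main() in sdpa.py.
    torch.backends.cuda.preferred_rocm_fa_library("ck")

    # Illustrative shapes (assumption): batch, heads, sequence length, head dim.
    q, k, v = (
        torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16,
                    requires_grad=True)
        for _ in range(3)
    )
    # Restrict SDPA to the flash-attention backend, which is the path
    # the CK selection is meant to affect.
    with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
        out = F.scaled_dot_product_attention(q, k, v)
        out.sum().backward()  # run the backward pass as well
    torch.cuda.synchronize()
    return "ok"


print(ck_sdpa_smoke_test())
```

Running both the forward and the backward pass matters here because the CK integration provides separate forward and backward entry points; a forward-only check would leave half of it untested.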

…eate link target. Enable USE_CK_FLASH_ATTENTION based on USE_FLASH_ATTENTION option.
rocm-repo-management-api bot commented Apr 3, 2025

Jenkins build for 6943678cdc3d79c37906161e88df783828f8d403 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

	/var/lib/jenkins/pytorch/build/lib/libtorch_cpu.so
	/var/lib/jenkins/pytorch/build/lib/libtorch_hip.so
[7787/7893] Building CXX object functorch/CMakeFiles/functorch.dir/csrc/init_dim_only.cpp.o
cc1plus: warning: command-line option ‘-Wno-duplicate-decl-specifier’ is valid for C/ObjC but not for C++
[7788/7893] Linking CXX executable bin/cpu_profiling_allocator_test
FAILED: bin/cpu_profiling_allocator_test 
: && /opt/cache/bin/c++ -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic     -Wl,--no-as-needed caffe2/CMakeFiles/cpu_profiling_allocator_test.dir/__/aten/src/ATen/test/cpu_profiling_allocator_test.cpp.o -o bin/cpu_profiling_allocator_test -L/lib/intel64   -L/lib/intel64_win   -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/opt/conda/envs/py_3.10/lib:/var/lib/jenkins/pytorch/build/lib:/opt/rocm-6.3.4/lib:/opt/rocm/lib:  lib/libgtest_main.a  -lstdc++  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch.so" -Wl,--as-needed  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed  lib/libprotobuf.a  /opt/conda/envs/py_3.10/lib/libmkl_intel_lp64.so  /opt/conda/envs/py_3.10/lib/libmkl_gnu_thread.so  /opt/conda/envs/py_3.10/lib/libmkl_core.so  -fopenmp  /usr/lib/x86_64-linux-gnu/libpthread.a  -lm  /usr/lib/x86_64-linux-gnu/libdl.a  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch_hip.so" -Wl,--as-needed  lib/libc10_hip.so  lib/libc10.so  /opt/rocm-6.3.4/lib/libMIOpen.so.1.0.60304  /opt/rocm/lib/libhiprtc.so.6.3.60304  -ldl  /opt/rocm-6.3.4/lib/libhipblas.so.2.3.60304  
/opt/rocm-6.3.4/lib/libhipfft.so.0.1.60304  /opt/rocm-6.3.4/lib/libhiprand.so.1.1.60304  /opt/rocm-6.3.4/lib/librocrand.so.1.1.60304  /opt/rocm-6.3.4/lib/libhipsparse.so.1.1.0.60304  /opt/rocm-6.3.4/lib/libhipsolver.so.0.3.60304  /opt/rocm-6.3.4/lib/libhipblaslt.so.0.10.60304  /opt/rocm/lib/libamdhip64.so.6.3.60304  lib/libgtest.a  -Wl,-rpath-link,/opt/rocm-6.3.4/lib && /opt/conda/envs/py_3.10/bin/cmake -E __run_co_compile --lwyu="ldd;-u;-r" --source=bin/cpu_profiling_allocator_test && :
/usr/bin/ld: /var/lib/jenkins/pytorch/build/lib/libtorch_hip.so: undefined reference to `fmha_bwd(fmha_bwd_traits, fmha_bwd_args, ck_tile::stream_config const&)'
/usr/bin/ld: /var/lib/jenkins/pytorch/build/lib/libtorch_hip.so: undefined reference to `fmha_fwd(fmha_fwd_traits, fmha_fwd_args, ck_tile::stream_config const&)'
collect2: error: ld returned 1 exit status
[7789/7893] Linking CXX executable bin/reduce_ops_test
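
The useful part of a log like the one above is the pair of "undefined reference" lines. Both name CK entry points (fmha_fwd and fmha_bwd) that libtorch_hip.so needs but that nothing on the link line provides. That is consistent with the prebuilt CK library not being linked into libtorch_hip.so, although the log alone does not confirm the cause. A minimal sketch for pulling those symbols out of a long build log, assuming GNU ld's message format as it appears here (the function name is hypothetical):

```python
import re

# Matches GNU ld lines of the form:
#   <linker>: <object>: undefined reference to `<symbol>'
UNDEFINED_REF = re.compile(
    r"^(?P<linker>\S+): (?P<obj>\S+): undefined reference to `(?P<symbol>.+)'$"
)


def undefined_references(log_text):
    """Map each object or library to the sorted symbols the linker could not resolve."""
    missing = {}
    for line in log_text.splitlines():
        match = UNDEFINED_REF.match(line.strip())
        if match:
            missing.setdefault(match.group("obj"), set()).add(match.group("symbol"))
    return {obj: sorted(symbols) for obj, symbols in missing.items()}


# The three closing lines of the failed link step, copied from the log above.
sample_log = """\
/usr/bin/ld: /var/lib/jenkins/pytorch/build/lib/libtorch_hip.so: undefined reference to `fmha_bwd(fmha_bwd_traits, fmha_bwd_args, ck_tile::stream_config const&)'
/usr/bin/ld: /var/lib/jenkins/pytorch/build/lib/libtorch_hip.so: undefined reference to `fmha_fwd(fmha_fwd_traits, fmha_fwd_args, ck_tile::stream_config const&)'
collect2: error: ld returned 1 exit status
"""

for obj, symbols in undefined_references(sample_log).items():
    print(obj)
    for symbol in symbols:
        print("   ", symbol)
```

Grouping by object makes it easy to see that every failing test binary trips over the same two symbols in the same library, rather than each test having its own problem.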

rocm-repo-management-api bot commented Apr 3, 2025

Jenkins build for 6943678cdc3d79c37906161e88df783828f8d403 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[7796/7893] Linking CXX shared library lib/libtorch.so
Warning: Unused direct dependencies:
	/var/lib/jenkins/pytorch/build/lib/libtorch_cpu.so
	/var/lib/jenkins/pytorch/build/lib/libtorch_hip.so
[7797/7893] Linking CXX executable bin/native_test
FAILED: bin/native_test 
: && /opt/cache/bin/c++ -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic     -Wl,--no-as-needed caffe2/CMakeFiles/native_test.dir/__/aten/src/ATen/test/native_test.cpp.o -o bin/native_test -L/lib/intel64   -L/lib/intel64_win   -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/opt/conda/envs/py_3.10/lib:/var/lib/jenkins/pytorch/build/lib:/opt/rocm-6.3.4/lib:/opt/rocm/lib:  lib/libgtest_main.a  -lstdc++  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch.so" -Wl,--as-needed  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed  lib/libprotobuf.a  /opt/conda/envs/py_3.10/lib/libmkl_intel_lp64.so  /opt/conda/envs/py_3.10/lib/libmkl_gnu_thread.so  /opt/conda/envs/py_3.10/lib/libmkl_core.so  -fopenmp  /usr/lib/x86_64-linux-gnu/libpthread.a  -lm  /usr/lib/x86_64-linux-gnu/libdl.a  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch_hip.so" -Wl,--as-needed  lib/libc10_hip.so  lib/libc10.so  /opt/rocm-6.3.4/lib/libMIOpen.so.1.0.60304  /opt/rocm/lib/libhiprtc.so.6.3.60304  -ldl  /opt/rocm-6.3.4/lib/libhipblas.so.2.3.60304  /opt/rocm-6.3.4/lib/libhipfft.so.0.1.60304  
/opt/rocm-6.3.4/lib/libhiprand.so.1.1.60304  /opt/rocm-6.3.4/lib/librocrand.so.1.1.60304  /opt/rocm-6.3.4/lib/libhipsparse.so.1.1.0.60304  /opt/rocm-6.3.4/lib/libhipsolver.so.0.3.60304  /opt/rocm-6.3.4/lib/libhipblaslt.so.0.10.60304  /opt/rocm/lib/libamdhip64.so.6.3.60304  lib/libgtest.a  -Wl,-rpath-link,/opt/rocm-6.3.4/lib && /opt/conda/envs/py_3.10/bin/cmake -E __run_co_compile --lwyu="ldd;-u;-r" --source=bin/native_test && :
/usr/bin/ld: /var/lib/jenkins/pytorch/build/lib/libtorch_hip.so: undefined reference to `fmha_bwd(fmha_bwd_traits, fmha_bwd_args, ck_tile::stream_config const&)'
/usr/bin/ld: /var/lib/jenkins/pytorch/build/lib/libtorch_hip.so: undefined reference to `fmha_fwd(fmha_fwd_traits, fmha_fwd_args, ck_tile::stream_config const&)'
collect2: error: ld returned 1 exit status
[7798/7893] Linking CXX executable bin/Dimname_test

include(${CMAKE_SOURCE_DIR}/cmake/public/LoadHIP.cmake)

# full path for CK library on compute-artifactory.amd.com
set(url "https://compute-artifactory.amd.com/artifactory/rocm-generic-local")

Consider reusing the release page of https://github.com/ROCm/composable_kernel
(There is no need to create new releases; just reuse any existing release to store the assets.)

Collaborator

@jithunnair-amd jithunnair-amd Apr 3, 2025

Hmm, I don't think https://github.com/ROCm/composable_kernel is an option. We don't own that repo, and it wouldn't be appropriate to store these artifacts there, given that they are generated using scripts that are not even in that repo.
Alternatively, we could create releases on https://github.com/ROCm/CK_kernels, since we own that repo.
@pruthvistony, can we file an OSRB request for the CK_kernels repo (if we want to use the same solution for upstream)?

CMakeLists.txt Outdated
@@ -888,6 +895,13 @@ if(USE_ROCM)
endif()
endif()

# CK shared lib linkage
if(USE_ROCM)
if(UNIX AND (USE_CK_FLASH_ATTENTION))
Collaborator

Do we need the UNIX part, given that USE_ROCM is defined as cmake_dependent_option(USE_ROCM "Use ROCm" ON "LINUX" OFF)?
Also, please correct the indentation.

Author

@akashveramd akashveramd Apr 4, 2025

aotriton.cmake is linked in a similar way, using the condition if(UNIX AND (USE_FLASH_ATTENTION...
I borrowed the logic from there and linked CK the same way.
I will correct the indentation and the spacing, though.
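
The question in this thread comes down to whether the UNIX check can ever change the outcome inside if(USE_ROCM). A minimal sketch of that reasoning as a truth table, under two stated assumptions: cmake_dependent_option forces USE_ROCM to OFF whenever LINUX is false, and LINUX being true implies UNIX is true. The platform table and function names are illustrative, not taken from the build files.

```python
from itertools import product

# (LINUX, UNIX) per host platform; assumption: Linux is a UNIX platform.
PLATFORMS = {
    "linux": (True, True),
    "macos": (False, True),
    "windows": (False, False),
}


def effective_use_rocm(requested, linux):
    """Model cmake_dependent_option(USE_ROCM "Use ROCm" ON "LINUX" OFF):
    the requested value only counts when LINUX is true, else it is forced OFF."""
    return requested if linux else False


def guards_agree():
    """Compare the guard with and without the UNIX term over every combination."""
    for (linux, unix), requested, use_ck in product(
        PLATFORMS.values(), [True, False], [True, False]
    ):
        use_rocm = effective_use_rocm(requested, linux)
        with_unix = use_rocm and (unix and use_ck)  # if(UNIX AND (USE_CK_FLASH_ATTENTION))
        without_unix = use_rocm and use_ck          # if(USE_CK_FLASH_ATTENTION)
        if with_unix != without_unix:
            return False
    return True


print(guards_agree())  # prints True
```

Under these assumptions the two guards behave identically, so the UNIX term is redundant but harmless. Keeping it for symmetry with aotriton.cmake or dropping it is then a style decision rather than a correctness one.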

include(${CMAKE_SOURCE_DIR}/cmake/public/LoadHIP.cmake)

# full path for CK library on compute-artifactory.amd.com
set(url "https://compute-artifactory.amd.com/artifactory/rocm-generic-local")
Collaborator

@pruthvistony @jeffdaily Is it a concern to put artifactory links in a public repo?

rocm-repo-management-api bot commented Apr 4, 2025

Jenkins build for 507e0aa5b054253f13bc8cdfe6826baea425f464 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[7796/7893] Linking CXX shared library lib/libtorch.so
Warning: Unused direct dependencies:
	/var/lib/jenkins/pytorch/build/lib/libtorch_cpu.so
	/var/lib/jenkins/pytorch/build/lib/libtorch_hip.so
[7797/7893] Linking CXX executable bin/static_runtime_bench
FAILED: bin/static_runtime_bench 
: && /opt/cache/bin/c++ -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic     -Wl,--no-as-needed caffe2/CMakeFiles/static_runtime_bench.dir/__/benchmarks/static_runtime/deep_wide_pt.cc.o caffe2/CMakeFiles/static_runtime_bench.dir/__/benchmarks/static_runtime/deep_wide_pt_bench.cc.o -o bin/static_runtime_bench -L/lib/intel64   -L/lib/intel64_win   -L/lib/win-x64 -Wl,-rpath,/lib/intel64:/lib/intel64_win:/lib/win-x64:/opt/conda/envs/py_3.10/lib:/var/lib/jenkins/pytorch/build/lib:/opt/rocm-6.3.4/lib:/opt/rocm/lib  lib/libbenchmark.a  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch.so" -Wl,--as-needed  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed  lib/libprotobuf.a  /opt/conda/envs/py_3.10/lib/libmkl_intel_lp64.so  /opt/conda/envs/py_3.10/lib/libmkl_gnu_thread.so  /opt/conda/envs/py_3.10/lib/libmkl_core.so  -fopenmp  /usr/lib/x86_64-linux-gnu/libpthread.a  -lm  /usr/lib/x86_64-linux-gnu/libdl.a  -Wl,--no-as-needed,"/var/lib/jenkins/pytorch/build/lib/libtorch_hip.so" -Wl,--as-needed  lib/libc10_hip.so  lib/libc10.so  /opt/rocm-6.3.4/lib/libMIOpen.so.1.0.60304  /opt/rocm/lib/libhiprtc.so.6.3.60304  -ldl  
/opt/rocm-6.3.4/lib/libhipblas.so.2.3.60304  /opt/rocm-6.3.4/lib/libhipfft.so.0.1.60304  /opt/rocm-6.3.4/lib/libhiprand.so.1.1.60304  /opt/rocm-6.3.4/lib/librocrand.so.1.1.60304  /opt/rocm-6.3.4/lib/libhipsparse.so.1.1.0.60304  /opt/rocm-6.3.4/lib/libhipsolver.so.0.3.60304  /opt/rocm-6.3.4/lib/libhipblaslt.so.0.10.60304  /opt/rocm/lib/libamdhip64.so.6.3.60304  -lrt  -Wl,-rpath-link,/opt/rocm-6.3.4/lib && /opt/conda/envs/py_3.10/bin/cmake -E __run_co_compile --lwyu="ldd;-u;-r" --source=bin/static_runtime_bench && :
/usr/bin/ld: /var/lib/jenkins/pytorch/build/lib/libtorch_hip.so: undefined reference to `fmha_bwd(fmha_bwd_traits, fmha_bwd_args, ck_tile::stream_config const&)'
/usr/bin/ld: /var/lib/jenkins/pytorch/build/lib/libtorch_hip.so: undefined reference to `fmha_fwd(fmha_fwd_traits, fmha_fwd_args, ck_tile::stream_config const&)'
collect2: error: ld returned 1 exit status
[7798/7893] Linking CXX shared library lib/libbackend_with_compiler.so

rocm-repo-management-api bot commented Apr 10, 2025

Jenkins build for 507e0aa5b054253f13bc8cdfe6826baea425f464 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api bot commented Apr 10, 2025

Jenkins build for 507e0aa5b054253f13bc8cdfe6826baea425f464 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api bot commented Apr 11, 2025

Jenkins build for 507e0aa5b054253f13bc8cdfe6826baea425f464 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api bot commented Apr 16, 2025

Jenkins build for 507e0aa5b054253f13bc8cdfe6826baea425f464 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

rocm-repo-management-api bot commented Apr 22, 2025

Jenkins build for 507e0aa5b054253f13bc8cdfe6826baea425f464 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Jenkins build for 507e0aa5b054253f13bc8cdfe6826baea425f464 commit is in progress
Links: Blue Ocean view / Build artifacts

Jenkins build for 507e0aa5b054253f13bc8cdfe6826baea425f464 commit is in progress
Links: Blue Ocean view / Build artifacts

3 participants