[release/2.5][ROCm][TunableOp] Improve identification of fastest solution (#144942) #2018
…#144942)
This PR addresses some stability issues with identifying the fastest solution on AMD GPUs, particularly the MI300.
Changes include:
- An improved timer, StreamTimerNoSync
- More aggressive skipping of slow solutions
- Additional statistics that can be used for diagnostics PYTORCH_TUNABLEOP_VERBOSE=3
Pull Request resolved: pytorch#144942
Approved by: https://github.com/jeffdaily
(cherry picked from commit fd0cd6a)
This is a performance improvement from upstream. So far, there have been no negative reports w.r.t. performance, so I think it's worth backporting. I will also add it to ROCm release/2.6. It cannot be trivially backported to release/2.4.
Jenkins build for acd66a22a6f79aa784015121cc22fa653ac1e9bb commit finished as FAILURE
Jenkins build for acd66a22a6f79aa784015121cc22fa653ac1e9bb commit is in progress
!cherry-pick --onto release/2.6
…tion (pytorch#144942) (#2018)
This PR addresses some stability issues with identifying the fastest solution on AMD GPUs, particularly the MI300.
Changes include:
- An improved timer, StreamTimerNoSync
- More aggressive skipping of slow solutions
- Additional statistics that can be used for diagnostics PYTORCH_TUNABLEOP_VERBOSE=3
Pull Request resolved: pytorch#144942
Approved by: https://github.com/jeffdaily
(cherry picked from commit fd0cd6a)
Created branch autogenerated/release/2.6_cherry-pick_pr-2018 and #2041
This PR addresses some stability issues with identifying the fastest solution on AMD GPUs, particularly the MI300.
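Changes include:
- An improved timer, StreamTimerNoSync
- More aggressive skipping of slow solutions
- Additional statistics that can be used for diagnostics PYTORCH_TUNABLEOP_VERBOSE=3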
Pull Request resolved: pytorch#144942
Approved by: https://github.com/jeffdaily
(cherry picked from commit fd0cd6a)
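For readers unfamiliar with this kind of tuning loop, the following is a minimal Python sketch of the general idea behind skipping slow solutions and collecting per-candidate statistics. It is not the TunableOp implementation: the function `pick_fastest`, the `skip_factor` threshold, and the single-run probe are all hypothetical choices made for illustration, and a host wall-clock timer stands in for the GPU stream timer named in the PR.

```python
import statistics
import time


def pick_fastest(candidates, timer=time.perf_counter,
                 warmup=1, iters=5, skip_factor=1.5):
    """Pick the fastest callable from a {name: callable} mapping.

    Hypothetical sketch: a candidate whose single probe run is already
    more than `skip_factor` times slower than the best mean seen so far
    is skipped without further timing.
    """
    best_name = None
    best_time = float("inf")
    stats = {}

    for name, fn in candidates.items():
        # Warm-up runs are not timed.
        for _ in range(warmup):
            fn()

        # One quick probe run decides whether full timing is worthwhile.
        start = timer()
        fn()
        probe = timer() - start
        if probe > skip_factor * best_time:
            stats[name] = {"skipped": True, "probe": probe}
            continue

        # Full timing: collect one sample per iteration.
        samples = []
        for _ in range(iters):
            start = timer()
            fn()
            samples.append(timer() - start)

        mean = statistics.fmean(samples)
        stats[name] = {
            "skipped": False,
            "probe": probe,
            "min": min(samples),
            "max": max(samples),
            "mean": mean,
            "stdev": statistics.pstdev(samples),
        }
        if mean < best_time:
            best_time, best_name = mean, name

    return best_name, stats
```

The `timer` argument is injectable so the selection logic can be checked with a deterministic clock; the min, max, mean and standard deviation recorded per candidate are the sort of figures one might print at a high verbosity level when diagnosing an unstable choice.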