cuda : add half2 __shfl_xor() for ROCm 5.5 #7263

Merged: 1 commit, merged on May 18, 2024

Engininja2 (Contributor)

The half2 overload of __shfl_xor() was only added in ROCm 5.6. This PR implements it for HIP versions older than 5.6, such as the ROCm 5.5 HIP SDK.
Fixes #7242
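The idea behind such a fallback can be sketched on the host in plain C++. A half2 is exactly 32 bits wide, so a half2 shuffle can be built from the existing 32-bit int shuffle: reinterpret each lane's bits as an integer, shuffle the integers, and reinterpret them back. This is a minimal sketch, not the PR's actual code. `half2_bits`, `shfl_xor_int`, `shfl_xor_half2`, `iota_lanes`, `hip_version`, `needs_half2_shfl_fallback` and the 8-lane width are hypothetical stand-ins, and the warp shuffle is simulated with an array of lanes. The version check assumes the `HIP_VERSION` encoding from HIP's `hip_version.h` (major * 10000000 + minor * 100000 + patch).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for HIP's half2: two 16-bit halves packed into 32 bits.
// Raw uint16_t bit patterns are used so the sketch builds without HIP headers.
struct half2_bits {
    uint16_t x;
    uint16_t y;
};
static_assert(sizeof(half2_bits) == sizeof(uint32_t), "half2 must be exactly 32 bits wide");

// Assumed HIP_VERSION encoding: major * 10000000 + minor * 100000 + patch.
constexpr long hip_version(long major, long minor, long patch) {
    return major * 10000000L + minor * 100000L + patch;
}

// The hand-written half2 overload is only needed before HIP/ROCm 5.6.
constexpr bool needs_half2_shfl_fallback(long version) {
    return version < hip_version(5, 6, 0);
}
static_assert(needs_half2_shfl_fallback(hip_version(5, 5, 0)), "5.5 needs the fallback");
static_assert(!needs_half2_shfl_fallback(hip_version(5, 6, 0)), "5.6 has the built-in overload");

// Simulated wavefront width (real AMD wavefronts have 32 or 64 lanes).
constexpr int kWidth = 8;
using IntLanes = std::array<uint32_t, kWidth>;
using Half2Lanes = std::array<half2_bits, kWidth>;

// Simulates the 32-bit int __shfl_xor: lane i reads the value held by lane (i ^ laneMask).
// laneMask must be in [0, kWidth) so every source lane stays in range.
IntLanes shfl_xor_int(const IntLanes& vals, int laneMask) {
    IntLanes out{};
    for (int lane = 0; lane < kWidth; ++lane) {
        out[lane] = vals[lane ^ laneMask];
    }
    return out;
}

// half2 shuffle built on top of the int shuffle: reinterpret each lane's
// 32 bits as an integer, shuffle the integers, then reinterpret them back.
Half2Lanes shfl_xor_half2(const Half2Lanes& vals, int laneMask) {
    IntLanes packed{};
    for (int lane = 0; lane < kWidth; ++lane) {
        std::memcpy(&packed[lane], &vals[lane], sizeof(uint32_t));
    }
    const IntLanes shuffled = shfl_xor_int(packed, laneMask);
    Half2Lanes out{};
    for (int lane = 0; lane < kWidth; ++lane) {
        std::memcpy(&out[lane], &shuffled[lane], sizeof(uint32_t));
    }
    return out;
}

// Example input: lane i holds the bit patterns {i, 100 + i}.
Half2Lanes iota_lanes() {
    Half2Lanes lanes{};
    for (int lane = 0; lane < kWidth; ++lane) {
        lanes[lane] = half2_bits{static_cast<uint16_t>(lane), static_cast<uint16_t>(100 + lane)};
    }
    return lanes;
}
```

For example, after `shfl_xor_half2(iota_lanes(), 1)`, lane 0 holds `{1, 101}` because it reads from lane 0 ^ 1 = 1, and both halves travel together in one 32-bit shuffle. On a real device, a fallback like this would presumably be a `__device__` overload of `__shfl_xor()` for half2 placed behind a compile-time check on `HIP_VERSION`, so that ROCm 5.6 and later keep using the built-in overload.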

@mofosyne added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "Review Complexity : Medium" (Generally require more time to grok but manageable by beginner to medium expertise level) on May 14, 2024
@JohannesGaessler (Collaborator)

Based on just static code analysis I approve. Unfortunately I do not have a system set up to test this PR with. Ideally you would get the person that initially reported the issue to confirm that the fix works. @Engininja2 I would still be able to merge this PR without an actual confirmation if I can get a pledge from you that you will take care of any potential follow-up issues that could arise from this PR (just in case, I don't think there will be any).

@Engininja2 (Contributor, PR author)

I tested it with the 5.5 Windows HIP SDK and main compiles & runs okay.

@JohannesGaessler merged commit d233b50 into ggml-org:master on May 18, 2024 (21 checks passed).

Linked issue closed by this pull request: Compilation error using HIP SDK on Windows
3 participants