Bug: GPU hang after bde7cd3cd949c1a85d3a199498ac98e78039d46f #7730
Comments
I found that VRAM doesn't increase significantly while inferring, so maybe something else caused the issue.
Bisected and found the commit that caused the issue, so keeping it open.
@0cc4m this is probably my bad. I made some changes to the way views are initialized in ggml-backend that may have created this issue: views are now initialized in the buffer of their parent tensor, instead of on the compute buffer. I made this change because I concluded that allocating views on the compute buffer cannot work reliably, since the compute buffer is not always of the same type as the buffer used to allocate the tensor originally, and backends should be able to use the same extra as their parent anyway. I thought this change was safe because the CUDA backend no longer needs extras for normal buffers, but I didn't realize that the Vulkan backend still does. Looking at the …
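For illustration, here is a minimal C sketch of the mechanism described in the comment above. The struct layout and the `view_init` function are hypothetical stand-ins, not the actual ggml-backend API; the sketch only shows the idea that a view tensor now takes its buffer, data pointer, and extra from its parent rather than being allocated on the compute buffer.

```c
#include <stddef.h>

// Minimal stand-in types; the real ggml structs carry much more state.
struct buffer;                      // opaque backend buffer (hypothetical)
struct tensor {
    struct buffer * buffer;         // buffer the tensor's data lives in
    void          * data;           // pointer into that buffer
    void          * extra;          // backend-specific data (e.g. Vulkan's per-tensor state)
    struct tensor * view_src;       // parent tensor, if this is a view
    size_t          view_offs;      // byte offset into the parent's data
};

// After the change described above, a view is initialized in its parent's
// buffer and shares the parent's extra, instead of being allocated on the
// compute buffer (which may be of a different buffer type than the parent's).
static void view_init(struct tensor * view) {
    struct tensor * parent = view->view_src;
    view->buffer = parent->buffer;
    view->data   = (char *) parent->data + view->view_offs;
    view->extra  = parent->extra;   // CUDA no longer needs extras for normal
                                    // buffers; Vulkan still does, hence the hang
}
```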
What happened?
After bde7cd3, running inference with any Llama 3 Q6 model causes a GPU hang. The previous version (a5735e4) is not affected.
Name and Version
bde7cd3
using the Vulkan backend
What operating system are you seeing the problem on?
Linux
Relevant log output
dmesg