[Bug] llama_set_state_data() does not work correctly with offloaded GPU Layers (kv cache) #2422
Comments
AFAIK, if the kv cache is on the GPU, it's not even synced back at all. Might be worth doing it just during those calls.
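Roughly, the idea would be to copy any VRAM-resident k/v buffers back to host memory at the start of llama_copy_state_data() and upload them again at the end of llama_set_state_data(). A minimal sketch of that idea (the struct and helper names here are made up for illustration and are not llama.cpp's actual internals):

```cpp
// Sketch only: illustrates "sync the offloaded kv cache during the state calls".
// Names are hypothetical, not llama.cpp internals.
#include <cuda_runtime.h>
#include <cstddef>

struct kv_buffer {
    void * host;    // CPU-side buffer that state serialization reads/writes
    void * device;  // VRAM copy actually used during inference (may be null)
    size_t size;    // size in bytes
};

// Called when saving state: make the host buffer reflect the current VRAM
// contents before it is written out.
static void kv_sync_to_host(const kv_buffer & buf) {
    if (buf.device != nullptr) {
        cudaMemcpy(buf.host, buf.device, buf.size, cudaMemcpyDeviceToHost);
    }
}

// Called when restoring state: upload the restored host buffer so the next
// eval sees the loaded kv cache instead of stale VRAM data.
static void kv_sync_to_device(const kv_buffer & buf) {
    if (buf.device != nullptr) {
        cudaMemcpy(buf.device, buf.host, buf.size, cudaMemcpyHostToDevice);
    }
}
```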
Seems like #2239 has been closed by now. I should check whether the bug still exists at this point.
I retried it, and this bug still exists. Thankfully, someone has since made the "-ngl" option available by default for the save-load-state example. I also observed that offloading only the v state to the GPU makes llama.cpp generally spew out nonsense (which makes sense to me, but for a user this might be unexpected).
This issue was closed because it has been inactive for 14 days since being marked as stale.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Modified save-load-state example
I applied a small patch to save-load-state so that layers can be offloaded to the GPU via -ngl.
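For context, the patched example boils down to roughly the following flow. This is a simplified sketch using the llama.cpp C API names from around the time of this report; the model path and -ngl value are placeholders, not the example's actual code:

```cpp
// Simplified sketch of the save/restore flow exercised by save-load-state,
// with GPU offloading enabled (not the example's actual code).
#include "llama.h"

#include <cstdint>
#include <vector>

int main() {
    llama_backend_init(false /* numa */);

    llama_context_params params = llama_context_default_params();
    params.n_gpu_layers = 33; // value passed via -ngl; the bug shows up once the kv cache is offloaded

    llama_model   * model = llama_load_model_from_file("models/7B/ggml-model-q4_0.bin", params);
    llama_context * ctx   = llama_new_context_with_model(model, params);

    // ... evaluate the prompt here ...

    // Snapshot the full state (rng, logits, embeddings and kv cache).
    std::vector<uint8_t> state(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, state.data());

    // Generate a few tokens from the snapshot point, then restore the state
    // into a second context and generate again. Without offloading both runs
    // produce identical text; with the kv cache offloaded they diverge,
    // which is the bug reported here.
    llama_context * ctx2 = llama_new_context_with_model(model, params);
    llama_set_state_data(ctx2, state.data());

    // ... generate from ctx and ctx2 and compare the outputs ...

    llama_free(ctx);
    llama_free(ctx2);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```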
Execution with 32 of 35 layers offloaded to the GPU
I then compiled and executed it without offloading any layers:
I get the expected output:
Execution with 33 of 35 offloaded GPU layers
The problem: when I pass `-ngl 33`, I get different results between saving and loading the state. The results get wilder the more layers I offload to the GPU.
Maybe the offloaded v cache has something to do with this?
This is what I get when offloading all 35:
Environment and Context
$ lscpu
$ uname -a
Linux mockwork 5.19.0-46-generic #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 21 15:35:31 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Steps to Reproduce
-ngl 20
-ngl 35
Please note: the bug already existed before that commit.