Use F16 for memory_k and memory_v (as suggested in #146) #154


Closed
ty-everett wants to merge 1 commit

Conversation

ty-everett
Contributor

As suggested in #146, we can save a significant amount of memory by using float16 instead of float32 for memory_k and memory_v. I implemented the suggested changes and tested with the 7B and 13B models; there were no issues on my Intel-based MacBook Pro.

Merging this change should allow more models to run, with better performance, on a wider range of hardware.
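
For reference, the change amounts to allocating the KV-cache ("memory") tensors as F16 instead of F32. A minimal sketch of the relevant lines in main.cpp at the time (names follow the then-current ggml API; treat this as an illustration of the diff, not the exact patch):

```cpp
// Model loading: account for and allocate the KV cache.
// Switching GGML_TYPE_F32 -> GGML_TYPE_F16 halves the footprint
// of memory_k and memory_v.
ctx_size += n_ctx*n_layer*n_embd*ggml_type_sizef(GGML_TYPE_F16); // memory_k (was F32)
ctx_size += n_ctx*n_layer*n_embd*ggml_type_sizef(GGML_TYPE_F16); // memory_v (was F32)

const int n_mem      = n_layer*n_ctx;
const int n_elements = n_embd*n_mem;

model.memory_k = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, n_elements); // was GGML_TYPE_F32
model.memory_v = ggml_new_tensor_1d(ctx, GGML_TYPE_F16, n_elements); // was GGML_TYPE_F32
```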

@Green-Sky
Collaborator

Can confirm: ggml ctx size 4529.34 MB -> 4273.34 MB.
Speed stayed the same.

It is hard to tell whether the quality changes, but the predictions themselves do change (obviously).
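
(For scale: at the default context size of 512, 7B's memory_k and memory_v together hold 2 × 32 layers × 512 × 4096 = 134,217,728 values, so halving the element size from 4 to 2 bytes saves exactly 256 MB, which matches the delta above.)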

@ggerganov
Member

I was worried that it might degrade quality, but I have no evals, as you can guess.
I think it is best to gate this behind a command-line argument: have it F32 by default and, if requested by the user, set it to F16.
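
A minimal sketch of that gating, assuming a `--memory_f16` flag and a `params.memory_f16` field (both names are illustrative, not necessarily what was merged in #294):

```cpp
// gpt_params: default to F32, i.e. the current behavior.
bool memory_f16 = false; // use F16 for memory_k/memory_v only when requested

// gpt_params_parse(): recognize the (hypothetical) flag.
if (arg == "--memory_f16") {
    params.memory_f16 = true;
}

// Model loading: pick the KV-cache type from the flag.
const ggml_type memory_type = params.memory_f16 ? GGML_TYPE_F16 : GGML_TYPE_F32;

model.memory_k = ggml_new_tensor_1d(ctx, memory_type, n_elements);
model.memory_v = ggml_new_tensor_1d(ctx, memory_type, n_elements);
```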

@ggerganov (Member) left a comment


.

@Green-Sky
Collaborator

I ran some more non-scientific tests:

7B:
[screenshot of generated output]

30B:
[screenshot of generated output]

Both were run with `-t 4 -n 2048 --repeat_penalty 1.176 --repeat_last_n 256 --temp 0.8 --top_p 0.1 -c 2048 --color -i -r "User:" -f prompts/i_example1.txt`.

@Green-Sky
Collaborator

@ty-everett Are you going to write the CLI-param conditional version? If not, I will do it.

ggerganov pushed a commit that referenced this pull request Mar 19, 2023
…154) (#294)

* Use F16 for memory_k and memory_v

* add command line switch to use f16 instead of f32 for memory k+v

---------

Co-authored-by: Ty Everett <[email protected]>
@ggerganov closed this Mar 19, 2023
Deadsg pushed a commit to Deadsg/llama.cpp that referenced this pull request Dec 19, 2023