DeepSeek V2/V3 with -mla option #12725
Closed
jukofyork wants to merge 24 commits into ggml-org:master from jukofyork:mainline-llama-cpp-master--mla
Commits
b4c169f Initial commit with all but the MLA graph code done (jukofyork)
10207b4 Fixes (jukofyork)
ea3c05b Just make `uint32_t n_embd_k` and `uint32_t n_embd_v` (jukofyork)
1f604a7 First working version (jukofyork)
1de077b Fixed return bug in `DeepseekV2Model` (jukofyork)
7f92e7b Minor fixes (jukofyork)
319e3ef More fixes (jukofyork)
ee4b389 Renamed `wv_b` to `wv_decompress` to avoid confusion with `_b` biases (jukofyork)
c00cd9e Better `_compressed` variable names (jukofyork)
55ad3a7 Minor comment and variable name fixes (jukofyork)
0c86f56 Moved `build_attn_mla` to better location (jukofyork)
b0c8a43 Removed `gguf.MODEL_TENSOR.ATTN_K_B` from `prepare_tensors()` for now (jukofyork)
8c329bc Bumped `wkv_b` and `wk_b` to use F32. (jukofyork)
68302ee Use `ggml_mul_mat_set_prec` `GGML_PREC_F32` by default for now (jukofyork)
937a48d Better/shorter variable names and more tidying up of code (jukofyork)
1fd0aab Fixed `kv_cmpr_pe` name (jukofyork)
4fb439f Added `n_embd_head_k` as constant (jukofyork)
f9a0ef4 Fixed to use `build_attn_mha()` now (jukofyork)
5fe402a `mla_attn` on then not `flash_attn` so we can run `-fa` for draft models (jukofyork)
9b862f9 "flash_attn is not compatible with mla_attn" --> flash_attn off (jukofyork)
8e23e0d Fixed subtle bug caused by `-mla` for speculative models (jukofyork)
b384086 Removed need for `v_b_proj` storing. Tidied all ggml_row_size for quants (jukofyork)
5dbf99c Removed both calls to `ggml_mul_mat_set_prec` for MLA and non-MLA cases (jukofyork)
f0d514a Merge branch 'ggml-org:master' into mainline-llama-cpp-master--mla (jukofyork)
I think these were deleted inadvertently? For example, `ffn_*_shexp` are still used by Qwen MoE.
I think these were all accidentally duplicated in the main branch so I removed the duplicates when inserting the new ones.
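One way to check a claim like this is to compare the entry list before and after the change: if only duplicates were removed, the set of distinct names stays the same and no name repeats afterwards. A minimal Python sketch, assuming the entries can be treated as plain strings. The helper names and the sample lists are hypothetical illustrations, not code or data from this PR:

```python
from collections import Counter


def find_duplicates(names):
    """Return names that occur more than once, in first-seen order."""
    counts = Counter(names)
    # dict.fromkeys keeps one copy of each name while preserving order
    return [name for name in dict.fromkeys(names) if counts[name] > 1]


def only_duplicates_removed(before, after):
    """True if `after` has no repeats and covers the same distinct names as `before`."""
    return not find_duplicates(after) and set(before) == set(after)


# Hypothetical sample data: the same three entries listed twice, then deduplicated.
before = [
    "ffn_gate_shexp", "ffn_down_shexp", "ffn_up_shexp",
    "ffn_gate_shexp", "ffn_down_shexp", "ffn_up_shexp",
]
after = ["ffn_gate_shexp", "ffn_down_shexp", "ffn_up_shexp"]

print(find_duplicates(before))
print(only_duplicates_removed(before, after))
```

If an entry had been dropped outright rather than deduplicated, the set comparison would fail, which is the case the reviewer was worried about.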