Feature Request: Ability to pack multiple GGUFs into single one #13028
Comments
An alternative approach to the one proposed is like this: internally, the implementation will have respective namespace prefixes for each model. There could be an API for querying the available namespaces in a GGUF file, but it seems optional for now and we can add it later on.
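For illustration, a minimal sketch of what such a query could look like at the GGUF level, using the existing `gguf` C API from ggml and assuming the `general.namespaces` key proposed in the issue body below; the helper function itself is hypothetical:

```cpp
// Sketch only: assumes the proposed "general.namespaces" string-array key.
// Uses the existing gguf C API from ggml; nothing here is implemented in llama.cpp yet.
#include <cstdio>

#include "gguf.h"

static void print_namespaces(const char * fname) {
    struct gguf_init_params params = { /*no_alloc =*/ true, /*ctx =*/ nullptr };
    struct gguf_context * ctx = gguf_init_from_file(fname, params);
    if (!ctx) {
        fprintf(stderr, "failed to open %s\n", fname);
        return;
    }

    const int64_t key_id = gguf_find_key(ctx, "general.namespaces");
    if (key_id < 0) {
        printf("%s: no namespaces (single-model GGUF)\n", fname);
    } else {
        const size_t n = gguf_get_arr_n(ctx, key_id);
        for (size_t i = 0; i < n; i++) {
            printf("namespace %zu: %s\n", i, gguf_get_arr_str(ctx, key_id, i));
        }
    }

    gguf_free(ctx);
}
```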
So that's why MS's bitnet-25 doesn't work yet? https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/tree/main
No, that one doesn't work yet because they used a custom quant type. The architecture of that model can be added relatively easily (using the changes from microsoft/BitNet@4f2e41a as suggested in https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/discussions/2, and also adapting the conversion script to handle their packed format). It's not related to having multiple models in a single GGUF.
Feature Description
From an idea brought up by @ggerganov in this discussion: #11139 (reply in thread)
While it is NOT a good idea to pack both mmproj + text models (because vision support is still messy atm), we still have some interesting use cases, for example multi-component models such as Sesame CSM (backbone + decoder).
Motivation
I am creating this issue to discuss possible implementations.
Possible Implementation
One possible implementation is to have a "namespace" prefix for KV metadata keys and tensor names, plus a "super" key holding the list of namespaces.
For example, in the case of Sesame CSM, given 2 GGUFs (backbone and decoder), the routine to pack them would be as follows (a rough code sketch follows the list):
- Add a new "super" key: `general.namespaces = ["backbone", "decoder"]`
- For the backbone model, add the `backbone.` prefix to the key name of each KV and tensor
- For the decoder model, add the `decoder.` prefix to the key name of each KV and tensor
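A rough sketch of what this packing routine could look like on top of the existing gguf/ggml C API. The `general.namespaces` key and the name prefixing are only proposed here, the file names are placeholders, and the copying of per-model KV metadata is elided:

```cpp
// Sketch of a possible packing routine using the existing gguf/ggml C API.
// The "general.namespaces" key and the prefixing scheme follow the proposal above;
// nothing here exists in llama.cpp yet.
#include <cstdio>
#include <string>

#include "ggml.h"
#include "gguf.h"

// copy all tensors of src_path into dst, prefixing their names with "<ns>."
static void add_gguf_under_namespace(struct gguf_context * dst, const char * src_path, const char * ns) {
    struct ggml_context * ctx_data = nullptr;
    struct gguf_init_params params = { /*no_alloc =*/ false, /*ctx =*/ &ctx_data }; // also load tensor data
    struct gguf_context * src = gguf_init_from_file(src_path, params);
    if (!src) {
        fprintf(stderr, "failed to load %s\n", src_path);
        return;
    }

    // the per-model KV pairs would also be re-emitted here under the "<ns>." prefix;
    // omitted for brevity (it needs a switch over all gguf value types)

    const int64_t n_tensors = gguf_get_n_tensors(src);
    for (int64_t i = 0; i < n_tensors; i++) {
        const char * name = gguf_get_tensor_name(src, i);
        struct ggml_tensor * t = ggml_get_tensor(ctx_data, name);

        // e.g. "blk.0.attn_q.weight" -> "backbone.blk.0.attn_q.weight"
        // (a real tool should check that the new name still fits in GGML_MAX_NAME)
        const std::string prefixed = std::string(ns) + "." + name;
        ggml_set_name(t, prefixed.c_str());
        gguf_add_tensor(dst, t);
    }

    gguf_free(src);
    // ctx_data is intentionally leaked here: the tensor data must stay alive until dst is written
}

int main() {
    struct gguf_context * dst = gguf_init_empty();

    // the proposed "super" key listing the namespaces
    const char * namespaces[2] = { "backbone", "decoder" };
    gguf_set_arr_str(dst, "general.namespaces", namespaces, 2);

    // placeholder file names
    add_gguf_under_namespace(dst, "csm-backbone.gguf", "backbone");
    add_gguf_under_namespace(dst, "csm-decoder.gguf",  "decoder");

    gguf_write_to_file(dst, "csm-packed.gguf", /*only_meta =*/ false);

    gguf_free(dst);
    return 0;
}
```

Keeping the per-model data untouched apart from the name prefix would also let an unpacking tool reverse the operation losslessly.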
These APIs will need to be added into `libllama`:

- `int32_t llama_model_n_namespaces(llama_model * model)`: returns the number of namespaces, 0 meaning no namespace
- `const char ** llama_model_list_namespaces(llama_model * model)`: returns the list of namespaces as strings
- `llama_model * llama_model_get_namespace(int idx)`: returns the sub `llama_model *` object corresponding to a namespace index
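To make the proposal concrete, here is a sketch of how a caller might consume these functions once they exist. None of the three functions are in `libllama` today, and `csm-packed.gguf` is a placeholder file name:

```cpp
// Sketch of how a caller could use the proposed namespace API; the three
// llama_model_*namespace* functions are the ones proposed above, not existing API.
#include <cstdio>

#include "llama.h"

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_model_load_from_file("csm-packed.gguf", mparams);
    if (!model) {
        return 1;
    }

    const int32_t n_ns = llama_model_n_namespaces(model);   // proposed
    if (n_ns == 0) {
        printf("regular single-model GGUF\n");
    } else {
        const char ** names = llama_model_list_namespaces(model); // proposed
        for (int32_t i = 0; i < n_ns; i++) {
            // proposed: get the sub-model for namespace i, e.g. "backbone" or "decoder"
            llama_model * sub = llama_model_get_namespace(i);
            printf("namespace %d: %s (%p)\n", i, names[i], (void *) sub);
        }
    }

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

Note that, as written in the proposal, `llama_model_get_namespace()` takes only an index; in practice it would presumably also need the `llama_model *` it belongs to.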
Problems