Feature request: Graphical GGUF viewer #6715
Comments
Have you seen: gguf-dump for printing metadata? Or do you want something dynamic during the forward pass?
Yes, I tried gguf-py but it does not have access to quantized types.
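For reference, a minimal sketch of the kind of metadata dump being discussed, assuming the GGUFReader API from the gguf Python package (gguf-py); it lists KV keys and tensor types/shapes, but leaves quantized tensor data as raw buffers, which is the limitation mentioned above.

```python
# Minimal sketch, assuming the GGUFReader API from the gguf Python package
# (gguf-py). It prints KV keys with their value types and lists tensors by
# quantization type and shape; tensor data is left as the raw buffer, i.e.
# quantized blocks are not dequantized here.
import sys

from gguf import GGUFReader


def dump(path: str) -> None:
    reader = GGUFReader(path)

    print("== KV metadata ==")
    for key, field in reader.fields.items():
        print(f"{key}: {[t.name for t in field.types]}")

    print("== Tensors ==")
    for tensor in reader.tensors:
        print(f"{tensor.name}: {tensor.tensor_type.name} {list(tensor.shape)}")


if __name__ == "__main__":
    dump(sys.argv[1])
```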
This could be quite fun. The web page can also generate a set of useful
This issue was closed because it has been inactive for 14 days since being marked as stale.
@ngxson reopen? Also, I'd like to suggest similar functionality for imatrices. Or should I open a parallel FR?
This issue was closed because it has been inactive for 14 days since being marked as stale.
This is something I have been planning to work on, so I took the liberty of assigning this task to myself. I am putting together some designs and will post a link to them here soon. I am going to request a bit of stakeholder input after my initial designs to make sure the use cases are covered.
Yes, feel free to take this task. Things have changed quite a lot since I created this issue; I feel like it no longer serves my initial goal (to ease the process of adding new models), but it would be nice to have something like @ggerganov suggested above!
I came up with an initial set of high-level features for the gguf-viewer program (see below). However, I need help generating ideas for the issue-reporting process: I am trying to figure out what should go into a GGUF viewer, and whether a separate tool with broader scope should be created to capture diagnostic information for issue reporting. While my intent is not necessarily to discuss implementation details at the moment, I think a good solution for the tool is Python and Tkinter with a custom C extension to expose access to the GGML library. The same goes for the potential diagnostic tool, as Python would be great to: (1) spawn the server process; (2) use the OpenAI APIs; (3) tee the logs (if necessary); (4) load C extensions directly (to access the GGUF/GGML libraries).
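As a concrete illustration (not a committed design), here is a rough sketch of what such a Tkinter shell could look like, using gguf-py's GGUFReader as a stand-in for the custom C extension; the widget layout and function names are made up for this example.

```python
# Rough sketch of the proposed Tkinter shell. GGUFReader from gguf-py stands
# in for the custom C extension mentioned above; widget layout and names are
# illustrative only.
import tkinter as tk
from tkinter import filedialog, ttk

from gguf import GGUFReader


def load_file(tree: ttk.Treeview) -> None:
    """Ask for a GGUF file and populate the tree with its KV data and tensors."""
    path = filedialog.askopenfilename(filetypes=[("GGUF files", "*.gguf")])
    if not path:
        return
    tree.delete(*tree.get_children())
    reader = GGUFReader(path)
    kv = tree.insert("", "end", text="Metadata (KV)", open=True)
    for key, field in reader.fields.items():
        tree.insert(kv, "end", text=key,
                    values=(", ".join(t.name for t in field.types),))
    tensors = tree.insert("", "end", text="Tensors")
    for t in reader.tensors:
        tree.insert(tensors, "end", text=t.name,
                    values=(f"{t.tensor_type.name} {list(t.shape)}",))


def main() -> None:
    root = tk.Tk()
    root.title("gguf-viewer (sketch)")
    tree = ttk.Treeview(root, columns=("info",))
    tree.heading("#0", text="Name")
    tree.heading("info", text="Type / shape")
    tree.pack(fill="both", expand=True)
    tk.Button(root, text="Open GGUF...",
              command=lambda: load_file(tree)).pack()
    root.mainloop()


if __name__ == "__main__":
    main()
```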
Yeah, the idea seems good.
In fact, when I initially created this issue, the reason I proposed to have this in C++ was that there was no implementation of quantization outside of C++ at that time. But things have changed a lot since then. Going a bit further, I think it's also possible to do this entirely in a web environment (a bit like https://netron.app/, but with a lot more GGUF-specific functions). We could:
@ngxson Interesting projects, I will keep an eye on them! I notice that in the default install we are not bundling the gguf-py libraries. Is this something we should bundle with the install? The main reason I ask is that I think it's important to make these diagnostic tools work out of the box on a llama.cpp install for those who are not necessarily interested in pulling in several ML Python dependencies. If it is not something we want to include by default, then a Python extension would naturally make more sense, as it can wrap the ggml library directly. The "user journey" I imagine is: (1) I have an issue with my model, or I'm curious about a new model; (2) double-click gguf-viewer.py to open a GUI; (3) open a model file (or select an HF URI to download); (4) explore the model; (5) generate a report and attach it to a GitHub issue. I think a lot of people can benefit from this local-first approach, as it reduces the barrier to entry and makes the diagnostic tools more portable in that sense. Even popping open a browser and having a separate server process introduces cognitive load, with firewall warnings and so forth. That being said, I would also like to understand user journeys from the online-first perspective, as in on the cloud. I think it is fair to pull metadata from the models (as we have now), but "zooming in" to view the tensor blocks might pose some other issues. Perhaps we can satisfy both needs somehow.
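To make step (5) of that journey concrete, here is an illustrative sketch of writing a small markdown report that could be attached to a GitHub issue; the report layout, the file name, and the assumption that the viewer has already collected KV metadata into a plain dict are all made up for this example.

```python
# Illustrative sketch of the "generate a report" step. The report layout,
# file name, and the assumption that the viewer has already collected KV
# metadata into a plain dict are all made up for this example.
import platform
from datetime import datetime, timezone


def write_report(model_path: str, kv: dict, out_path: str = "gguf-report.md") -> None:
    """Write a small markdown summary suitable for attaching to a GitHub issue."""
    lines = [
        "# GGUF diagnostic report",
        f"- generated: {datetime.now(timezone.utc).isoformat()}",
        f"- platform: {platform.platform()} ({platform.machine()})",
        f"- model file: {model_path}",
        "",
        "## KV metadata",
    ]
    lines += [f"- `{key}`: {value}" for key, value in sorted(kv.items())]
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


# Example usage with placeholder metadata:
write_report("my-model.gguf", {"general.architecture": "llama"})
```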
Closing because the work was already done in #12930.
I am reopening this issue as I closed it prematurely: several of the proposed features were not added in #12930, so this should remain open so we can iterate with that work as a baseline.
Motivation
With the recent introduction of the eval-callback example, we now have more tools for debugging when working with llama.cpp. However, one tool that I feel is missing is the ability to dump everything inside a GGUF file into a human-readable (and interactive) interface.
Inspired by huggingface.js, where users can visualize the KV metadata and the list of tensors on huggingface.com, I would like to implement the same thing in llama.cpp. I find this helpful in these situations: debugging the convert.py script when adding a new architecture, and debugging tensor data (for example, in combination with eval-callback).
The reason why I can't use huggingface.js is that it is browser-based, which makes it tricky to read a huge local file. It also doesn't have access to quantized types (the same goes for gguf-py).
Possible Implementation
Ideally, I want the implementation to be a binary named gguf-viewer that, when run, opens a web page on localhost:8080. The user can then go to the web page to explore the GGUF file. It will have these sections:
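As a rough illustration of this localhost idea (leaving the exact sections aside), a minimal sketch could parse the GGUF file with gguf-py and serve a JSON summary that a web front end would render; the /api/summary endpoint name and the response shape below are placeholders, not the proposed design.

```python
# Minimal sketch of the localhost idea: parse the GGUF file with gguf-py and
# serve a JSON summary on localhost:8080 for a web front end to render.
# The /api/summary endpoint and response shape are placeholders.
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

from gguf import GGUFReader


def summarize(path: str) -> dict:
    """Collect KV value types and tensor type/shape info from a GGUF file."""
    reader = GGUFReader(path)
    return {
        "kv": {k: [t.name for t in f.types] for k, f in reader.fields.items()},
        "tensors": [
            {"name": t.name, "type": t.tensor_type.name,
             "shape": [int(d) for d in t.shape]}
            for t in reader.tensors
        ],
    }


def main() -> None:
    summary = summarize(sys.argv[1])

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/api/summary":
                self.send_error(404)
                return
            body = json.dumps(summary).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()


if __name__ == "__main__":
    main()
```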