Feature request: Graphical GGUF viewer #6715


Open
ngxson opened this issue Apr 17, 2024 · 15 comments
Assignees
Labels
enhancement New feature or request

Comments

@ngxson
Collaborator

ngxson commented Apr 17, 2024

Motivation

With the recent introduction of the eval-callback example, we now have more tools for debugging when working with llama.cpp. However, one tool that I feel is missing is the ability to dump everything inside a gguf file into a human-readable (and interactive) interface.

Inspired by huggingface.js, where users can visualize the KV pairs and the list of tensors on huggingface.com, I would like to implement the same thing in llama.cpp. I find this helpful in these situations:

  • Debugging convert.py script when adding a new architecture
  • Debugging tokenizers
  • Debugging changes related to gguf (model splits for example)
  • Debugging tensors (i.e. display N first elements of a tensor, just like eval-callback)
  • Debugging control vectors
  • ... (maybe other usages in the future)

The reason I can't use huggingface.js is that it is browser-based, which makes it tricky to read a huge local file. It also doesn't have access to quantized types (the same goes for gguf-py).

Possible Implementation

Ideally, I want the implementation to be a binary named gguf-viewer that, when run, opens a web page at localhost:8080. Users can then go to the web page to explore the gguf file. It will have these sections:

  • Complete list of KV
  • Tokenizer-related info (for example: list all tokens, lookup one token)
  • List of all tensors
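
As a rough illustration of what those sections would surface (not the proposed C++ binary itself), the existing gguf-py reader can already enumerate the KV metadata and the tensor list. The sketch below assumes gguf-py is installed and that `GGUFReader` exposes `fields`/`tensors` with the attribute names used today; treat those names as assumptions.

```python
# Sketch only: enumerate what a gguf-viewer would display, using gguf-py.
# Assumes `pip install gguf`; attribute names (`fields`, `tensors`,
# `tensor_type`, `n_bytes`) reflect current gguf-py and may change.
import sys

from gguf import GGUFReader

reader = GGUFReader(sys.argv[1])  # path to a .gguf file

# Section: complete list of KV (names and value types)
for name, field in reader.fields.items():
    print(f"KV      {name}  types={[t.name for t in field.types]}")

# Section: list of all tensors (name, shape, quantization type, size)
for t in reader.tensors:
    shape = "x".join(str(d) for d in t.shape)
    print(f"TENSOR  {t.name}  shape={shape}  type={t.tensor_type.name}  bytes={t.n_bytes}")
```
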
@ngxson ngxson added the enhancement New feature or request label Apr 17, 2024
@phymbert
Collaborator

phymbert commented Apr 17, 2024

Have you seen:

gguf-dump for printing metadata?

Or do you want something dynamic during the forward pass?

@ngxson
Collaborator Author

ngxson commented Apr 17, 2024

Yes, I tried gguf-py but it does not have access to quantized types.

@ggerganov
Member

This could be quite fun. The web page can also generate a set of useful llama.cpp commands for that specific model (e.g. run main, server, etc) that can be copy-pasted for convenience.
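
A hedged sketch of that idea: read a couple of KV fields and print ready-to-paste commands. The binary names (`main`, `server`) follow the examples in the comment; the KV keys and the `field.parts[-1]` access pattern are assumptions based on gguf-py's reader layout.

```python
# Sketch: generate copy-pastable llama.cpp commands for a specific model.
# KV keys and the parts[-1] value access are assumptions about gguf-py.
import sys

from gguf import GGUFReader


def suggest_commands(model_path: str) -> list[str]:
    reader = GGUFReader(model_path)
    arch_field = reader.get_field("general.architecture")
    arch = bytes(arch_field.parts[-1]).decode("utf-8") if arch_field else "llama"
    ctx_field = reader.get_field(f"{arch}.context_length")
    ctx = int(ctx_field.parts[-1][0]) if ctx_field else 4096  # fall back to a common default
    return [
        f'./main -m {model_path} -c {ctx} -p "Hello"',
        f"./server -m {model_path} -c {ctx} --port 8080",
    ]


if __name__ == "__main__":
    print("\n".join(suggest_commands(sys.argv[1])))
```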

Contributor

github-actions bot commented Jun 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Jun 3, 2024
@ngxson ngxson reopened this Jun 3, 2024
@github-actions github-actions bot removed the stale label Jun 4, 2024
@github-actions github-actions bot added the stale label Jul 4, 2024
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.

@oldgithubman

@ngxson reopen? Also, I'd like to suggest similar functionality for imatrices. Or should I open a parallel FR?

@ngxson ngxson reopened this Jul 21, 2024
@ngxson ngxson removed the stale label Jul 21, 2024
@github-actions github-actions bot added the stale label Aug 21, 2024
Contributor

github-actions bot commented Sep 4, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.

@bandoti bandoti self-assigned this Mar 10, 2025
@bandoti
Collaborator

bandoti commented Mar 10, 2025

This is something I have been planning on working on, so I took the liberty of assigning this task to myself.

I am putting together some designs and will post a link to them here soon. After the initial designs, I will be requesting a bit of stakeholder input to make sure the use cases are covered.

@bandoti bandoti reopened this Mar 10, 2025
@ngxson
Collaborator Author

ngxson commented Mar 10, 2025

Yes, feel free to take this task. Things have changed quite a lot since I created this issue; I feel like it no longer serves my initial goal (to ease the process of adding new models), but it would be nice to have something like @ggerganov suggested above!

@github-actions github-actions bot removed the stale label Mar 11, 2025
@bandoti
Collaborator

bandoti commented Mar 31, 2025

I came up with an initial set of high-level features for the gguf-viewer program (see below). However, I need help generating ideas for the issue-reporting process. I am trying to figure out what belongs in a GGUF viewer, and whether a separate tool with broader scope should be created to capture diagnostic information and issue reports.

While my intent is not necessarily to discuss implementation details at the moment, I think a good solution for the tool is Python and TKinter with a custom C extension to expose access to the GGML library (a rough sketch follows the feature list below). This also goes for the potential diagnostic tool, as Python would be great for: (1) spawning the server process; (2) using the OpenAI APIs; (3) teeing the logs (if necessary); (4) loading C extensions directly (to access the GGUF/GGML libraries).

  1. I would like to interactively explore GGUF files.
  2. Metadata, tensor info, and the actual tensors should be explorable.
  3. It should be possible to open more than one GGUF file at once.
  4. Tensors should be visible as both a summary and as visual blocks representing the binary data.
  5. Tensors should be shown as a high-level overview and may be “zoomed into” for more details.
  6. The GGUF viewer should be minimal on dependencies and simple to deploy with the llama.cpp suite of programs. It should have access to the GGML/GGUF C APIs.
  7. The complete list of tokens should be explorable, visible as both strings and numeric values.
  8. Use some sort of heatmap to relate tensor types to visual blocks—colour-coded by category.
  9. Generate a formatted report of the loaded model (HTML/Markdown/XML/JSON).
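
To make the explorer part of the list (features 1–3) concrete, here is a minimal Python/Tkinter sketch. It substitutes gguf-py's `GGUFReader` for the suggested custom C extension purely for illustration, and the class name `GgufViewer` is hypothetical.

```python
# Sketch of the proposed Tkinter viewer: open a GGUF file and browse its KV
# metadata and tensors in a tree. gguf-py stands in for the custom C extension.
import tkinter as tk
from tkinter import filedialog, ttk

from gguf import GGUFReader


class GgufViewer(tk.Tk):  # hypothetical name
    def __init__(self) -> None:
        super().__init__()
        self.title("gguf-viewer (sketch)")
        ttk.Button(self, text="Open GGUF...", command=self.open_file).pack(fill="x")
        self.tree = ttk.Treeview(self, columns=("type", "shape"), show="tree headings")
        self.tree.heading("type", text="Quant type")
        self.tree.heading("shape", text="Shape")
        self.tree.pack(fill="both", expand=True)

    def open_file(self) -> None:
        # Each opened file gets its own top-level node, so several GGUFs can
        # be explored at once (feature 3).
        path = filedialog.askopenfilename(filetypes=[("GGUF files", "*.gguf")])
        if not path:
            return
        reader = GGUFReader(path)
        root = self.tree.insert("", "end", text=path, open=True)
        kv_node = self.tree.insert(root, "end", text="KV metadata")
        for name in reader.fields:
            self.tree.insert(kv_node, "end", text=name)
        tensor_node = self.tree.insert(root, "end", text="Tensors")
        for t in reader.tensors:
            self.tree.insert(tensor_node, "end", text=t.name,
                             values=(t.tensor_type.name, "x".join(map(str, t.shape))))


if __name__ == "__main__":
    GgufViewer().mainloop()
```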

@ngxson
Collaborator Author

ngxson commented Mar 31, 2025

Yeah the idea seems good.

Python and TKinter with a custom C extension to expose access to the GGML library

In fact, when I initially created this issue, the reason I proposed having this in cpp was that there was no implementation of quantization outside of cpp at the time.

But things have changed a lot since then: gguf-py now has quants.py, which allows quantizing and dequantizing using numpy.
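
For reference, a quick round trip with those numpy-based quants. The module-level `quantize`/`dequantize` names in `gguf.quants` are an assumption about the current gguf-py API; Q8_0 is used only as an example.

```python
# Quantize/dequantize round trip in pure numpy via gguf-py's quants.py.
import numpy as np

from gguf.constants import GGMLQuantizationType
from gguf.quants import dequantize, quantize

rng = np.random.default_rng(0)
# Row length must be a multiple of the block size (32 for Q8_0).
weights = rng.standard_normal((4, 256), dtype=np.float32)

packed = quantize(weights, GGMLQuantizationType.Q8_0)     # raw quantized bytes
restored = dequantize(packed, GGMLQuantizationType.Q8_0)  # back to float32

print("max abs error:", np.abs(weights - restored).max())
```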

Going a bit further, I think it's also possible to do this entirely in a web environment (a bit like https://netron.app/ but with a lot more gguf-specific functions). We could:

  • Build on top of the huggingface/gguf package, which allows access to KV metadata
  • Use custom dequantization functions (either reimplemented from the Python code, or I can expose these methods via my wasm binding)
  • Rely on the FileReader API to read the file chunk by chunk, allowing even big GGUF files to be loaded

@bandoti
Collaborator

bandoti commented Apr 1, 2025

@ngxson Interesting projects—I will keep an eye on them!

I notice that in the default install we are not bundling the gguf-py libraries. Is this something we should bundle with the install? The main reason I ask is that I think it's important to make these diagnostic tools work out-of-the-box on a llama.cpp install for those who are not necessarily interested in pulling in several ML Python dependencies. If it is not something we want to include by default, then a Python extension would naturally make more sense, as it can just wrap the ggml library directly.

The "user journey" I imagine is: (1) I have an issue with my model—or I'm curious about a new model; (2) double-click gguf-viewer.py to open a GUI; (3) open a model file—(or select an HF URI to download); (4) explore the model; (5) generate a report and attach to github issue.

I think a lot of people can benefit from this local-first approach, as it reduces the barrier to entry and makes the diagnostic tools more portable in that sense. Even popping open a browser and having a separate server process introduces cognitive load with firewall warnings, and so forth.

That being said, I would like to understand user journeys from the online-first perspective as well—as in, on the cloud. I think it is fair to pull metadata from the models (as we have now), but "zooming in" to view the tensor blocks might pose some other issues. Perhaps we can satisfy both needs somehow.

@bandoti bandoti closed this as completed Apr 19, 2025
@bandoti
Collaborator

bandoti commented Apr 19, 2025

Closing because the work was already done in #12930

@bandoti bandoti closed this as not planned Apr 19, 2025
@bandoti
Collaborator

bandoti commented Apr 21, 2025

I am reopening this issue as I closed it prematurely—several of the proposed features were not added in #12930, so this should remain open to iterate on that as a baseline.

@bandoti bandoti reopened this Apr 21, 2025