Feature request: Graphical GGUF viewer #6715


Open
ngxson opened this issue Apr 17, 2024 · 15 comments
Assignees
Labels
enhancement New feature or request

Comments

@ngxson
Collaborator

ngxson commented Apr 17, 2024

Motivation

With the recent introduction of the eval-callback example, we now have more tools for debugging when working with llama.cpp. However, one tool that I feel is missing is the ability to dump everything inside a gguf file into a human-readable (and interactive) interface.

Inspired by huggingface.js, where users can visualize the KV pairs and the list of tensors on huggingface.com, I would like to implement the same thing in llama.cpp. I find this helpful in these situations:

  • Debugging convert.py script when adding a new architecture
  • Debugging tokenizers
  • Debugging changes related to gguf (model splits for example)
  • Debugging tensors (i.e. display N first elements of a tensor, just like eval-callback)
  • Debugging control vectors
  • ... (maybe other usages in the future)

The reason I can't use huggingface.js is that it is browser-based, which makes it tricky to read a huge local file. It also doesn't have access to quantized types (the same goes for gguf-py).

Possible Implementation

Ideally, I want the implementation to be a binary named gguf-viewer that, when run, opens a web page at localhost:8080. Users can then go to the web page to explore the gguf file. It will have these sections:

  • Complete list of KV
  • Tokenizer-related info (for example: list all tokens, lookup one token)
  • List of all tensors
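
As a rough illustration of what those sections would surface (not the proposed C++ binary itself), the existing gguf-py reader can already enumerate the KV metadata and the tensor list. The sketch below assumes gguf-py is installed and that `GGUFReader` exposes `fields`/`tensors` with the attribute names used today; treat those names as assumptions.

```python
# Sketch only: enumerate what a gguf-viewer would display, using gguf-py.
# Assumes `pip install gguf`; attribute names (`fields`, `tensors`,
# `tensor_type`, `n_bytes`) reflect current gguf-py and may change.
import sys

from gguf import GGUFReader

reader = GGUFReader(sys.argv[1])  # path to a .gguf file

# Section: complete list of KV (names and value types)
for name, field in reader.fields.items():
    print(f"KV      {name}  types={[t.name for t in field.types]}")

# Section: list of all tensors (name, shape, quantization type, size)
for t in reader.tensors:
    shape = "x".join(str(d) for d in t.shape)
    print(f"TENSOR  {t.name}  shape={shape}  type={t.tensor_type.name}  bytes={t.n_bytes}")
```
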
@ngxson ngxson added the enhancement New feature or request label Apr 17, 2024
@phymbert
Collaborator

phymbert commented Apr 17, 2024

Have you seen:

gguf-dump for printing metadata?

Or do you want something dynamic during the forward pass?

@ngxson
Collaborator Author

ngxson commented Apr 17, 2024

Yes, I tried gguf-py but it does not have access to quantized types.

@ggerganov
Member

This could be quite fun. The web page can also generate a set of useful llama.cpp commands for that specific model (e.g. run main, server, etc) that can be copy-pasted for convenience.
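
A hedged sketch of that idea: read a couple of KV fields and print ready-to-paste commands. The binary names (`main`, `server`) follow the examples in the comment; the KV keys and the `field.parts[-1]` access pattern are assumptions based on gguf-py's reader layout.

```python
# Sketch: generate copy-pastable llama.cpp commands for a specific model.
# KV keys and the parts[-1] value access are assumptions about gguf-py.
import sys

from gguf import GGUFReader


def suggest_commands(model_path: str) -> list[str]:
    reader = GGUFReader(model_path)
    arch_field = reader.get_field("general.architecture")
    arch = bytes(arch_field.parts[-1]).decode("utf-8") if arch_field else "llama"
    ctx_field = reader.get_field(f"{arch}.context_length")
    ctx = int(ctx_field.parts[-1][0]) if ctx_field else 4096  # fall back to a common default
    return [
        f'./main -m {model_path} -c {ctx} -p "Hello"',
        f"./server -m {model_path} -c {ctx} --port 8080",
    ]


if __name__ == "__main__":
    print("\n".join(suggest_commands(sys.argv[1])))
```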

Contributor

github-actions bot commented Jun 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Jun 3, 2024
@ngxson ngxson reopened this Jun 3, 2024
@github-actions github-actions bot removed the stale label Jun 4, 2024
@github-actions github-actions bot added the stale label Jul 4, 2024
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.

@oldgithubman

@ngxson reopen? Also, I'd like to suggest similar functionality for imatrices. Or should I open a parallel FR?

@ngxson ngxson reopened this Jul 21, 2024
@ngxson ngxson removed the stale label Jul 21, 2024
@github-actions github-actions bot added the stale label Aug 21, 2024
Contributor

github-actions bot commented Sep 4, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.

@bandoti bandoti self-assigned this Mar 10, 2025
@bandoti
Collaborator

bandoti commented Mar 10, 2025

This is something I have been planning on working on, so I took the liberty of assigning this task to myself.

I am putting together some designs and will post a link to them here soon. After the initial designs, I will be requesting a bit of stakeholder input to make sure the use cases are covered.

@bandoti bandoti reopened this Mar 10, 2025
@ngxson
Collaborator Author

ngxson commented Mar 10, 2025

Yes, feel free to take this task. Things have changed quite a lot since I created this issue; I feel like it no longer serves my initial goal (to ease the process of adding new models), but it would be nice to have something like @ggerganov suggested above!

@github-actions github-actions bot removed the stale label Mar 11, 2025
@bandoti
Collaborator

bandoti commented Mar 31, 2025

I came up with an initial set of high-level features for the gguf-viewer program (see below). However, I need help generating ideas for the issue-reporting process. I am trying to figure out what belongs in a GGUF viewer, and whether a separate tool with broader scope should be created to capture diagnostic information and issue reports.

While my intent is not necessarily to discuss implementation details at the moment, I think a good solution for the tool is Python and TKinter with a custom C extension to expose access to the GGML library (a rough sketch follows the feature list below). This also goes for the potential diagnostic tool, as Python would be great for: (1) spawning the server process; (2) using the OpenAI APIs; (3) teeing the logs (if necessary); (4) loading C extensions directly (to access the GGUF/GGML libraries).

  1. I would like to interactively explore GGUF files.
  2. Metadata, tensor info, and the actual tensors should be explorable.
  3. It should be possible to open more than one GGUF file at once.
  4. Tensors should be visible as both a summary and as visual blocks representing the binary data.
  5. Tensors should be shown as a high-level overview and may be “zoomed into” for more details.
  6. The GGUF viewer should be minimal on dependencies and simple to deploy with the llama.cpp suite of programs. It should have access to the GGML/GGUF C APIs.
  7. The complete list of tokens should be explorable, visible as both strings and numeric values.
  8. Use some sort of heatmap to relate tensor types to visual blocks—colour-coded by category.
  9. Generate a formatted report of the loaded model (HTML/Markdown/XML/JSON).
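
To make the explorer part of the list (features 1–3) concrete, here is a minimal Python/Tkinter sketch. It substitutes gguf-py's `GGUFReader` for the suggested custom C extension purely for illustration, and the class name `GgufViewer` is hypothetical.

```python
# Sketch of the proposed Tkinter viewer: open a GGUF file and browse its KV
# metadata and tensors in a tree. gguf-py stands in for the custom C extension.
import tkinter as tk
from tkinter import filedialog, ttk

from gguf import GGUFReader


class GgufViewer(tk.Tk):  # hypothetical name
    def __init__(self) -> None:
        super().__init__()
        self.title("gguf-viewer (sketch)")
        ttk.Button(self, text="Open GGUF...", command=self.open_file).pack(fill="x")
        self.tree = ttk.Treeview(self, columns=("type", "shape"), show="tree headings")
        self.tree.heading("type", text="Quant type")
        self.tree.heading("shape", text="Shape")
        self.tree.pack(fill="both", expand=True)

    def open_file(self) -> None:
        # Each opened file gets its own top-level node, so several GGUFs can
        # be explored at once (feature 3).
        path = filedialog.askopenfilename(filetypes=[("GGUF files", "*.gguf")])
        if not path:
            return
        reader = GGUFReader(path)
        root = self.tree.insert("", "end", text=path, open=True)
        kv_node = self.tree.insert(root, "end", text="KV metadata")
        for name in reader.fields:
            self.tree.insert(kv_node, "end", text=name)
        tensor_node = self.tree.insert(root, "end", text="Tensors")
        for t in reader.tensors:
            self.tree.insert(tensor_node, "end", text=t.name,
                             values=(t.tensor_type.name, "x".join(map(str, t.shape))))


if __name__ == "__main__":
    GgufViewer().mainloop()
```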

@ngxson
Collaborator Author

ngxson commented Mar 31, 2025

Yeah the idea seems good.

Python and TKinter with a custom C extension to expose access to the GGML library

In fact, when I initially created this issue, the reason I proposed having this in cpp was that there was no implementation of quantization outside of cpp at the time.

But things have changed a lot since then: gguf-py now has quants.py, which allows quantizing and dequantizing using numpy.
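
For reference, a quick round trip with those numpy-based quants. The module-level `quantize`/`dequantize` names in `gguf.quants` are an assumption about the current gguf-py API; Q8_0 is used only as an example.

```python
# Quantize/dequantize round trip in pure numpy via gguf-py's quants.py.
import numpy as np

from gguf.constants import GGMLQuantizationType
from gguf.quants import dequantize, quantize

rng = np.random.default_rng(0)
# Row length must be a multiple of the block size (32 for Q8_0).
weights = rng.standard_normal((4, 256), dtype=np.float32)

packed = quantize(weights, GGMLQuantizationType.Q8_0)     # raw quantized bytes
restored = dequantize(packed, GGMLQuantizationType.Q8_0)  # back to float32

print("max abs error:", np.abs(weights - restored).max())
```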

Going a bit further, I think it's also possible to do this entirely in a web environment (a bit like https://netron.app/ but with a lot more gguf-specific functions). We could:

  • Build on top of the huggingface/gguf package, which allows access to KV metadata
  • Use custom dequantization functions (either reimplemented from the Python code, or I can expose these methods via my wasm binding)
  • Rely on the FileReader API to read the file chunk by chunk, allowing even big GGUF files to be loaded

@bandoti
Collaborator

bandoti commented Apr 1, 2025

@ngxson Interesting projects—I will keep an eye on them!

I notice that in the default install we are not bundling the gguf-py libraries. Is this something we should bundle with the install? The main reason I ask is that I think it's important to make these diagnostic tools work out-of-the-box on a llama.cpp install for those who are not necessarily interested in pulling in several ML Python dependencies. If it is not something we want to include by default, then a Python extension would naturally make more sense, as it can just wrap the ggml library directly.

The "user journey" I imagine is: (1) I have an issue with my model—or I'm curious about a new model; (2) double-click gguf-viewer.py to open a GUI; (3) open a model file—(or select an HF URI to download); (4) explore the model; (5) generate a report and attach to github issue.

I think a lot of people can benefit from this local-first approach, as it reduces the barrier to entry and makes the diagnostic tools more portable in that sense. Even popping open a browser and having a separate server process introduces cognitive load with firewall warnings, and so forth.

That being said, I would like to understand user journeys from the online-first perspective as well—as in, on the cloud. I think it is fair to pull metadata from the models (as we have now), but "zooming in" to view the tensor blocks might pose some other issues. Perhaps we can satisfy both needs somehow.

@bandoti bandoti closed this as completed Apr 19, 2025
@bandoti
Collaborator

bandoti commented Apr 19, 2025

Closing because the work was already done in #12930

@bandoti bandoti closed this as not planned Apr 19, 2025
@bandoti
Collaborator

bandoti commented Apr 21, 2025

I am reopening this issue as I closed it prematurely—several of the proposed features were not added in #12930, so this should remain open to iterate on that as a baseline.

@bandoti bandoti reopened this Apr 21, 2025