main : add new feature: special commands #10145

Draft · ngxson wants to merge 2 commits into master

Conversation

ngxson
Collaborator

@ngxson ngxson commented Nov 3, 2024

Motivation

This is kind of a fun hack. I don't even know if it's useful for anyone.

The idea is to add Slack-like (or Discord-like) command support, to do various things without having to re-launch the app:

special commands in conversation mode:
  /readfile FILE   read prompt from file
  /savesess FILE   save session to file
  /loadsess FILE   load session from file
  /regen           regenerate the last response
  /dump FILE       dump chat content to a file
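
A minimal sketch of how such a dispatcher could hook into main.cpp's interactive loop (the handle_command helper and its wiring are hypothetical, not the PR's actual code):

    #include <cstdio>
    #include <sstream>
    #include <string>

    // Hypothetical sketch: intercept slash-commands before the input is
    // treated as a normal prompt. Returns true if the line was a command.
    static bool handle_command(const std::string & line) {
        if (line.rfind("/", 0) != 0) {
            return false; // not a command, handle as a normal prompt
        }
        std::istringstream iss(line);
        std::string cmd, arg;
        iss >> cmd >> arg;
        if (cmd == "/readfile") {
            // read the next prompt from the file `arg`
        } else if (cmd == "/savesess") {
            // save the current session to the file `arg`
        } else if (cmd == "/loadsess") {
            // restore a previously saved session from `arg`
        } else if (cmd == "/regen") {
            // rewind to the last user message and sample again
        } else if (cmd == "/dump") {
            // write the chat transcript to the file `arg`
        } else {
            printf("unknown command: %s\n", cmd.c_str());
        }
        return true;
    }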

Real-life use case

For example, I'm studying Hồ Xuân Hương, a Vietnamese poet. The info can be found in this Wikipedia article.

I downloaded the article above to a file named wiki.txt:

> hi
Hello! How can I assist you today?

> /savesess hi.bin
save session file: 'hi.bin'

> /readfile wiki.txt
read 3841 characters from file

> summarize in one sentence
Hồ Xuân Hương (1772-1822) was a renowned Vietnamese poet, known as the "Queen of Nôm poetry," who wrote classical poems in chữ Nôm, a Vietnamese script, and became a celebrated figure in Vietnamese literature for her witty, irreverent, and independent-minded works.

Then I can ask many questions, until I want to go back to the hi.bin checkpoint that I saved earlier:

> /loadsess hi.bin
load session file: 'hi.bin'
loaded 30 tokens from session file 'hi.bin'

> repeat what I said
You said "hi". How can I help you?

Then I can continue to ask more questions.
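
For context, checkpoints like hi.bin map naturally onto llama.cpp's session API. A minimal sketch, assuming session_tokens holds the tokens evaluated so far (the helper names are mine; whether the PR uses exactly these calls is not shown here):

    #include "llama.h"
    #include <vector>

    // /savesess FILE: persist the KV cache together with the evaluated tokens.
    static bool save_checkpoint(llama_context * ctx, const char * path,
                                const std::vector<llama_token> & session_tokens) {
        return llama_state_save_file(ctx, path,
                                     session_tokens.data(), session_tokens.size());
    }

    // /loadsess FILE: restore the KV cache and recover the token list so that
    // generation can continue from the checkpoint.
    static bool load_checkpoint(llama_context * ctx, const char * path,
                                std::vector<llama_token> & session_tokens,
                                size_t n_ctx) {
        session_tokens.resize(n_ctx); // capacity for the loader
        size_t n_loaded = 0;
        if (!llama_state_load_file(ctx, path, session_tokens.data(),
                                   session_tokens.size(), &n_loaded)) {
            return false;
        }
        session_tokens.resize(n_loaded); // e.g. the 30 tokens reported above
        return true;
    }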


@MaggotHATE
Contributor

Hi! Thank you for this PR; it should be very useful (I did something similar in my own main-based program).

The obvious ideas for this are restart (or reload) and regenerate (for the last output), although the latter might be complicated for main currently. And I remember that being able to save messages to a text file was requested at some point.

@programmbauer

programmbauer commented Nov 3, 2024

llamafile added similar commands in a recent commit (the /undo, /push and /pop commands are pretty similar to the /regenerate command proposed by MaggotHATE).

llamafile 0.8.15

@ngxson
Collaborator Author

ngxson commented Nov 3, 2024

I played around with this idea today (without knowing that llamafile already had this kind of feature).

Added 2 more commands:

  • /regen for regenerating the last response
  • /dump FILE to save a transcription into a text file

Still, I'm not sure if my implementation is OK. It feels a bit hacky to add a std::vector<int> pos_history to keep track of n_past every time a new message is added.
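
For illustration, the bookkeeping described above might look roughly like this (a sketch only; n_past and n_remain are main.cpp's loop state, and the function name is hypothetical):

    #include "llama.h"
    #include <vector>

    // Sketch: one n_past snapshot is pushed onto pos_history per message;
    // /regen rewinds the KV cache to the most recent snapshot.
    static void regen_rewind(llama_context * ctx, std::vector<int> & pos_history,
                             int & n_past, int & n_remain) {
        if (pos_history.empty()) {
            return;
        }
        const int last_n_past = pos_history.back();
        llama_kv_cache_seq_rm(ctx, 0, last_n_past, -1); // drop tokens after the snapshot
        n_remain += n_past - last_n_past;               // refund the generation budget
        n_past    = last_n_past;
    }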

@slaren
Member

slaren commented Nov 3, 2024

I think it would be good to have a separate example just for chat, focused on usability, and without all the complexity of main. Maybe repurpose main for completions only. It would make the code simpler overall and easier to work with, and then it would make sense to extend it with all kinds of features to make it a better chat program.

On a side note, I also think that it would be very desirable to move away from the "ozempic" (that's the way it was described originally) default llama-server page, and instead bundle a full featured, easy to use, chat interface with the server. I don't see any reason to force users to install 3rd party applications to have a reasonable UX with llama.cpp.

n_remain += n_tokens_removed;
is_interacting = false;
// we intentionally do not reset the sampling, so new message will be more diverse
continue;
Contributor

@MaggotHATE MaggotHATE Nov 4, 2024


I remember having problems with regeneration without adjusting prev. I ended up capturing the entire ring buffer and just restoring it. I did the same for all other values, capturing state instead of recalculating. So far I have stable results over 20 regenerations (my usual bulk testing) with the following:

        restore_smpl(); // rewinds gsmpl->prev
        llama_kv_cache_seq_rm(ctx, 0, rewind_state.kv_cache_pos, -1);
        embd_inp.erase(embd_inp.begin() + rewind_state.embd_inp_size, embd_inp.end()); // not sure
        n_past = rewind_state.n_past_size;
        n_consumed = rewind_state.n_consumed_size;

However, I have to say that there is something missing or extra here: when testing the K-Shift sampler I noticed that the initial logits of the first message are different from all later regenerations (those are identical to each other, though). Still not sure what's wrong, but maybe this helps.
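
A sketch of the capture-instead-of-recalculate idea described above (the rewind_state fields are inferred from the snippet; this is not the exact code):

    #include <cstddef>

    // Snapshot of everything needed to rewind one turn, captured up front
    // instead of being recomputed at regeneration time.
    struct rewind_state_t {
        int    n_past_size;     // n_past when the response started
        int    n_consumed_size; // n_consumed when the response started
        size_t embd_inp_size;   // embd_inp.size() when the response started
        int    kv_cache_pos;    // KV cache position to truncate back to
    };

    // taken just before the model begins its response:
    static rewind_state_t capture_state(int n_past, int n_consumed,
                                        size_t embd_inp_size) {
        return { n_past, n_consumed, embd_inp_size, n_past };
    }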

int last_n_past = pos_history.back();
int n_tokens_removed = n_past - last_n_past;
llama_kv_cache_seq_rm(ctx, 0, last_n_past, -1);
n_remain += n_tokens_removed;

This might become a problem with n_predict == -1 (infinite) or -2 (stop at context size); see the comment from here.
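
One possible guard for that case (an assumption on my part, not code from the PR): only refund the budget when n_predict is finite.

    // n_predict < 0 means infinite (-1) or stop-at-context-size (-2);
    // in those modes n_remain is not a finite budget, so skip the refund.
    if (params.n_predict >= 0) {
        n_remain += n_tokens_removed;
    }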

@ngxson
Collaborator Author

ngxson commented Nov 4, 2024

@slaren Completely agree with that. A chat CLI with a better user experience is very desirable. Personally, I'd prefer to repurpose llama-cli for this usage, since most users are already familiar with it. More "advanced" (and lesser-known) things like --interactive, --in-prefix, --in-suffix could be moved to a new binary called llama-completion, for example.

The server UI is (to be honest) low-hanging fruit. However, I've always been hesitant to get my hands on this part of the project. Even though I've been doing web development for more than half of my life, web development to me has always been kind of "101 ways to shoot yourself in the foot": there are just too many ways to do a simple thing on the web. IMO the best way to kick-start this is to throw away all the frontend code and establish a "standard" for that part. Still, I'd love to hear your thoughts on that, @ggerganov.

@ngxson added the demo label ("Demonstrate some concept or idea, not intended to be merged") on Nov 4, 2024
@ggerganov
Member

IMO the best way to kick-start this is to throw away all the frontend code and establish a "standard" for that part. Still, I'd love to hear your thoughts on that, @ggerganov.

How would throwing it away help in this case? We can have multiple UI implementations since they are well decoupled from the server code.

I agree, it can be a fun and useful project, so if there is will from people to implement a full-fledged chat UI, it would be great.

p.s. I personally would love to write a Dear ImGui app at some point, as I really enjoy making UI's with this library :)

@ngxson
Collaborator Author

ngxson commented Nov 4, 2024

How would throwing it away help in this case? We can have multiple UI implementations since they are well decoupled from the server code.

What I mean is that at the moment we have multiple implementations of the web UI:

  • index.html, the original UI made using preact. It uses the /completions endpoint
  • index-new.html, the "themed" version of index.html. But instead of being a "layer on top", it's actually a duplicated version of index.html
  • simplechat, a different version that uses /chat/completions and does not use preact (instead, the dev opted to write a handmade framework, ui.mjs, to manage the DOM elements)

So to me, keeping all of these is kind of a "soup". As I said earlier, web dev nowadays is kind of "101 ways to shoot yourself in the foot", and I'm sorry to say that the current state of the server.cpp web UI reflects exactly that. (I'm limiting my scope to the web UI part only; I completely agree that this has nothing to do with server.cpp itself.)

In addition, I really think we could achieve the same web UI we have today with much less code. JavaScript is a high-level language anyway, so over time there have been (and will be) frameworks that let us "write less, do more".

I agree, it can be a fun and useful project, so if there is will from people to implement a full-fledged chat UI, it would be great.

Hah, yeah, I'm not opposing this idea, just offering some insights. For desktop apps, the complexity is mostly in the packaging stage, not the development stage. My experience from a past job is that packaging for Windows/macOS is easier than for Linux (most people use GNOME or KDE).

Indeed, I ended up choosing Flutter as a desktop UI framework thanks to its easy packaging. Some may say that Electron-based solutions are much easier, but that's only true if the app never touches low-level C++ bindings. A built Flutter app is also much lighter than an Electron one.

That's just some of my experience; I'm also curious to see what you can do with ImGui!

(Sorry for hijacking this PR for discussing server.cpp 😂 we may need to create a dedicated issue)

@MaggotHATE
Contributor

p.s. I personally would love to write a Dear ImGui app at some point, as I really enjoy making UI's with this library :)

AFAIK it still doesn't have text wrap in edit fields, so some internal work would be needed. I still can't make myself do that.

I remember there were C++ libraries for HTML-based UI; this might allow using the same UI for server and desktop apps (since HTML is already in llama.cpp).

@ngxson
Collaborator Author

ngxson commented Nov 4, 2024

I remember there were C++ libraries for HTML-based UI, this might allow using the same UI for server and desktop apps (since html is already in llama.cpp).

I'm aware of this approach. The problem with these C++ web UI libraries is that their only purpose is to generate HTML from C++ calls. So, in the end, they don't provide real benefits, just a layer on top of HTML.

What I'm trying to say is that there are much more lightweight approaches. Take Vue.js, for example: it's easy to learn and has many resources and a large community (to be fair, it's a bit heavier than Preact, but easier to work with).

@arch-btw
Contributor

arch-btw commented Nov 7, 2024

/dump FILE dump chat content to a file

I've been hoping for this feature for so long!
