main : add new feature: special commands #10145

Draft · ngxson wants to merge 2 commits into master

Conversation

ngxson
Collaborator

@ngxson ngxson commented Nov 3, 2024

Motivation

This is kind of a fun hack. I don't even know if it's useful for anyone.

The idea is to add Slack-like (or Discord-like) command support, to do various things without having to re-launch the app:

special commands in conversation mode:
  /readfile FILE   read prompt from file
  /savesess FILE   save session to file
  /loadsess FILE   load session from file
  /regen           regenerate the last response
  /dump FILE       dump chat content to a file
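
A minimal sketch of how such a dispatcher could hook into main.cpp's interactive loop (the handle_command helper and its wiring are hypothetical, not the PR's actual code):

    #include <cstdio>
    #include <sstream>
    #include <string>

    // Hypothetical sketch: intercept slash-commands before the input is
    // treated as a normal prompt. Returns true if the line was a command.
    static bool handle_command(const std::string & line) {
        if (line.rfind("/", 0) != 0) {
            return false; // not a command, handle as a normal prompt
        }
        std::istringstream iss(line);
        std::string cmd, arg;
        iss >> cmd >> arg;
        if (cmd == "/readfile") {
            // read the next prompt from the file `arg`
        } else if (cmd == "/savesess") {
            // save the current session to the file `arg`
        } else if (cmd == "/loadsess") {
            // restore a previously saved session from `arg`
        } else if (cmd == "/regen") {
            // rewind to the last user message and sample again
        } else if (cmd == "/dump") {
            // write the chat transcript to the file `arg`
        } else {
            printf("unknown command: %s\n", cmd.c_str());
        }
        return true;
    }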

Real-life use case

For example, I'm studying Hồ Xuân Hương, a Vietnamese poet. The info can be found in this Wikipedia article.

I downloaded the article above to a file named wiki.txt:

> hi
Hello! How can I assist you today?

> /savesess hi.bin
save session file: 'hi.bin'

> /readfile wiki.txt
read 3841 characters from file

> summarize in one sentence
Hồ Xuân Hương (1772-1822) was a renowned Vietnamese poet, known as the "Queen of Nôm poetry," who wrote classical poems in chữ Nôm, a Vietnamese script, and became a celebrated figure in Vietnamese literature for her witty, irreverent, and independent-minded works.

Then I can ask many questions, until I want to go back to the hi.bin checkpoint that I saved earlier:

> /loadsess hi.bin
load session file: 'hi.bin'
loaded 30 tokens from session file 'hi.bin'

> repeat what I said
You said "hi". How can I help you?

Then I can continue to ask more questions.
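
For context, checkpoints like hi.bin map naturally onto llama.cpp's session API. A minimal sketch, assuming session_tokens holds the tokens evaluated so far (the helper names are mine; whether the PR uses exactly these calls is not shown here):

    #include "llama.h"
    #include <vector>

    // /savesess FILE: persist the KV cache together with the evaluated tokens.
    static bool save_checkpoint(llama_context * ctx, const char * path,
                                const std::vector<llama_token> & session_tokens) {
        return llama_state_save_file(ctx, path,
                                     session_tokens.data(), session_tokens.size());
    }

    // /loadsess FILE: restore the KV cache and recover the token list so that
    // generation can continue from the checkpoint.
    static bool load_checkpoint(llama_context * ctx, const char * path,
                                std::vector<llama_token> & session_tokens,
                                size_t n_ctx) {
        session_tokens.resize(n_ctx); // capacity for the loader
        size_t n_loaded = 0;
        if (!llama_state_load_file(ctx, path, session_tokens.data(),
                                   session_tokens.size(), &n_loaded)) {
            return false;
        }
        session_tokens.resize(n_loaded); // e.g. the 30 tokens reported above
        return true;
    }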


@MaggotHATE
Contributor

Hi! Thank you for this PR; it should be very useful (I did something similar in my own main-based program).

The obvious ideas for this are restart (or reload) and regenerate (for the last output), although the latter might be complicated for main currently. And I remember that being able to save messages to a text file was requested at some point.

@programmbauer

programmbauer commented Nov 3, 2024

llamafile added similar commands in a recent commit (the /undo, /push and /pop commands are pretty similar to the /regenerate command proposed by MaggotHATE).

llamafile 0.8.15

@ngxson
Collaborator Author

ngxson commented Nov 3, 2024

I played around with this idea today (without knowing that llamafile already had this kind of feature).

Added 2 more commands:

  • /regen for regenerating the last response
  • /dump FILE to save a transcription into a text file

Still, I'm not sure if my implementation is OK. It feels a bit hacky to add a std::vector<int> pos_history to keep track of n_past every time a new message is added.
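
For illustration, the bookkeeping described above might look roughly like this (a sketch only; n_past and n_remain are main.cpp's loop state, and the function name is hypothetical):

    #include "llama.h"
    #include <vector>

    // Sketch: one n_past snapshot is pushed onto pos_history per message;
    // /regen rewinds the KV cache to the most recent snapshot.
    static void regen_rewind(llama_context * ctx, std::vector<int> & pos_history,
                             int & n_past, int & n_remain) {
        if (pos_history.empty()) {
            return;
        }
        const int last_n_past = pos_history.back();
        llama_kv_cache_seq_rm(ctx, 0, last_n_past, -1); // drop tokens after the snapshot
        n_remain += n_past - last_n_past;               // refund the generation budget
        n_past    = last_n_past;
    }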

@slaren
Member

slaren commented Nov 3, 2024

I think it would be good to have a separate example just for chat, focused on usability, and without all the complexity of main. Maybe repurpose main for completions only. It would make the code simpler overall and easier to work with, and then it would make sense to extend it with all kinds of features to make it a better chat program.

On a side note, I also think that it would be very desirable to move away from the "ozempic" (that's the way it was described originally) default llama-server page, and instead bundle a full featured, easy to use, chat interface with the server. I don't see any reason to force users to install 3rd party applications to have a reasonable UX with llama.cpp.

n_remain += n_tokens_removed;
is_interacting = false;
// we intentionally do not reset the sampling, so new message will be more diverse
continue;
Contributor

@MaggotHATE MaggotHATE Nov 4, 2024


I remember having problems with regeneration without adjusting prev. I ended up capturing the entire ring buffer and just restoring it. I did the same for all other values, capturing state instead of recalculating. So far I have stable results over 20 regenerations (my usual bulk testing) with the following:

        restore_smpl(); // rewinds gsmpl->prev
        llama_kv_cache_seq_rm(ctx, 0, rewind_state.kv_cache_pos, -1);
        embd_inp.erase(embd_inp.begin() + rewind_state.embd_inp_size, embd_inp.end()); // not sure
        n_past = rewind_state.n_past_size;
        n_consumed = rewind_state.n_consumed_size;

However, I have to say that there is something missing or extra here: when testing the K-Shift sampler I noticed that the initial logits of the first message are different from all later regenerations (those are identical to each other, though). Still not sure what's wrong, but maybe this helps.
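
A sketch of the capture-instead-of-recalculate idea described above (the rewind_state fields are inferred from the snippet; this is not the exact code):

    #include <cstddef>

    // Snapshot of everything needed to rewind one turn, captured up front
    // instead of being recomputed at regeneration time.
    struct rewind_state_t {
        int    n_past_size;     // n_past when the response started
        int    n_consumed_size; // n_consumed when the response started
        size_t embd_inp_size;   // embd_inp.size() when the response started
        int    kv_cache_pos;    // KV cache position to truncate back to
    };

    // taken just before the model begins its response:
    static rewind_state_t capture_state(int n_past, int n_consumed,
                                        size_t embd_inp_size) {
        return { n_past, n_consumed, embd_inp_size, n_past };
    }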

int last_n_past = pos_history.back();
int n_tokens_removed = n_past - last_n_past;
llama_kv_cache_seq_rm(ctx, 0, last_n_past, -1);
n_remain += n_tokens_removed;

This might become a problem with n_predict == -1 (infinite) or -2 (stop at context size); see the comment from here.
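
One possible guard for that case (an assumption on my part, not code from the PR): only refund the budget when n_predict is finite.

    // n_predict < 0 means infinite (-1) or stop-at-context-size (-2);
    // in those modes n_remain is not a finite budget, so skip the refund.
    if (params.n_predict >= 0) {
        n_remain += n_tokens_removed;
    }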

@ngxson
Collaborator Author

ngxson commented Nov 4, 2024

@slaren Completely agree with that. A chat CLI with a better user experience is very desirable. Personally, I'd prefer to repurpose llama-cli for this usage, since most users are already familiar with it. More "advanced" (and lesser-known) things like --interactive, --in-prefix, --in-suffix could be moved to a new binary called llama-completion, for example.

The server UI is (to be honest) low-hanging fruit. However, I've always been hesitant to get my hands on this part of the project. Even though I've been doing web development for more than half of my life, web development to me has always been kind of "101 ways to shoot yourself in the foot": there are just too many ways to do a simple thing on the web. IMO the best way to kick-start this is to throw away all the frontend code and establish a "standard" for that part. Still, I'd love to hear your thoughts on that, @ggerganov.

@ngxson added the demo label ("Demonstrate some concept or idea, not intended to be merged") on Nov 4, 2024
@ggerganov
Member

IMO the best way to kick-start this is to throw away all the frontend code and establish a "standard" for that part. Still, I'd love to hear your thoughts on that, @ggerganov.

How would throwing it away help in this case? We can have multiple UI implementations since they are well decoupled from the server code.

I agree, it can be a fun and useful project, so if there is will from people to implement a full-fledged chat UI, it would be great.

p.s. I personally would love to write a Dear ImGui app at some point, as I really enjoy making UI's with this library :)

@ngxson
Collaborator Author

ngxson commented Nov 4, 2024

How would throwing it away help in this case? We can have multiple UI implementations since they are well decoupled from the server code.

What I mean is that at the moment we have multiple implementations of the web UI:

  • index.html, the original UI made using preact. It uses the /completions endpoint
  • index-new.html, the "themed" version of index.html. But instead of being a "layer on top", it's actually a duplicated version of index.html
  • simplechat, a different version that uses /chat/completions and does not use preact (instead, the dev opted to write a handmade framework, ui.mjs, to manage the DOM elements)

So to me, keeping all of these is kind of a "soup". As I said earlier, web dev nowadays is kind of "101 ways to shoot yourself in the foot", and I'm sorry to say that the current state of the server.cpp web UI reflects exactly that. (I'm limiting my scope to the web UI part only; I completely agree that this has nothing to do with server.cpp itself.)

In addition, I really think we could achieve the same web UI we have today with much less code. JavaScript is a high-level language anyway, so over time there have been (and will be) frameworks that let us "write less, do more".

I agree, it can be a fun and useful project, so if there is will from people to implement a full-fledged chat UI, it would be great.

Hah, yeah, I'm not opposing this idea, just offering some insights. For desktop apps, the complexity is mostly in the packaging stage, not the development stage. My experience from a past job is that packaging for Windows/macOS is easier than for Linux (most people use GNOME or KDE).

Indeed, I ended up choosing Flutter as a desktop UI framework thanks to its easy packaging. Some may say that Electron-based solutions are much easier, but that's only true if the app never touches low-level C++ bindings. A built Flutter app is also much lighter than an Electron one.

That's just some of my experience; I'm also curious to see what you can do with ImGui!

(Sorry for hijacking this PR for discussing server.cpp 😂 we may need to create a dedicated issue)

@MaggotHATE
Contributor

p.s. I personally would love to write a Dear ImGui app at some point, as I really enjoy making UI's with this library :)

AFAIK it still doesn't have text wrap in edit fields, so some internal work would be needed. I still can't make myself do that.

I remember there were C++ libraries for HTML-based UI; this might allow using the same UI for server and desktop apps (since HTML is already in llama.cpp).

@ngxson
Collaborator Author

ngxson commented Nov 4, 2024

I remember there were C++ libraries for HTML-based UI, this might allow using the same UI for server and desktop apps (since html is already in llama.cpp).

I'm aware of this approach. The problem with these C++ web UI libraries is that their only purpose is to generate HTML from C++ calls. So, in the end, they don't provide real benefits, just a layer on top of HTML.

What I'm trying to say is that there are much more lightweight approaches. Take Vue.js, for example: it's easy to learn and has many resources and a large community (to be fair, it's a bit heavier than Preact, but easier to work with).

@arch-btw
Contributor

arch-btw commented Nov 7, 2024

/dump FILE dump chat content to a file

I've been hoping for this feature for so long!
