main : add new feature: special commands #10145
base: master
Conversation
Hi! Thank you for this PR, it should be very useful (I did something similar in my own …). The obvious ideas for this are …
llamafile added similar commands in a recent commit (the /undo, /push and /pop commands are pretty similar to the /regenerate command proposed by MaggotHATE).
I played around with this idea today (without knowing that llamafile already had this kind of feature) and added 2 more commands:
Still, I'm not sure if my implementation is OK. I still feel like it's a bit hacky to add …
I think it would be good to have a separate example just for chat, focused on usability, and without all the complexity of main. Maybe repurpose main for completions only. It would make the code simpler overall and easier to work with, and then it would make sense to extend it with all kinds of features to make it a better chat program.

On a side note, I also think that it would be very desirable to move away from the "ozempic" (that's the way it was described originally) default llama-server page, and instead bundle a full-featured, easy-to-use chat interface with the server. I don't see any reason to force users to install 3rd party applications to have a reasonable UX with llama.cpp.
```cpp
n_remain += n_tokens_removed;
is_interacting = false;
// we intentionally do not reset the sampling, so new message will be more diverse
continue;
```
I remember having problems with regeneration without adjusting `prev`. I ended up capturing the entire ring buffer and just restoring it. I did the same for all other values, capturing state instead of recalculating. So far I have stable results over 20 regenerations (my usual bulk testing) with the following:
```cpp
restore_smpl(); // rewinds gsmpl->prev
llama_kv_cache_seq_rm(ctx, 0, rewind_state.kv_cache_pos, -1);
embd_inp.erase(embd_inp.begin() + rewind_state.embd_inp_size, embd_inp.end()); // not sure
n_past     = rewind_state.n_past_size;
n_consumed = rewind_state.n_consumed_size;
```
However, I have to say that there is something missing/extra here: when testing the K-Shift sampler I noticed that the initial logits of the first message are different from all later regenerations (those are the same among themselves, though). Still not sure what's wrong, but maybe this helps.
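The "capture state instead of recalculating" idea above can be sketched without any llama.cpp dependency. The struct and field names below are hypothetical stand-ins for the `rewind_state` mentioned in the comment (whose exact contents are not shown in this thread): before generating a reply we snapshot every mutable value, and on /regenerate we restore the snapshot wholesale instead of computing deltas.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical snapshot of the mutable chat state (names are illustrative).
struct RewindState {
    int n_past     = 0;          // tokens already in the KV cache
    int n_consumed = 0;          // input tokens already consumed
    std::size_t embd_inp_size = 0; // length of the input token buffer
    std::vector<int> prev;       // copy of the sampler ring buffer
};

struct ChatState {
    int n_past = 0, n_consumed = 0;
    std::vector<int> embd_inp;   // input tokens
    std::vector<int> prev;       // sampler token history
};

RewindState capture(const ChatState & s) {
    return { s.n_past, s.n_consumed, s.embd_inp.size(), s.prev };
}

void restore(ChatState & s, const RewindState & r) {
    s.n_past     = r.n_past;
    s.n_consumed = r.n_consumed;
    s.embd_inp.resize(r.embd_inp_size); // drop tokens appended after the snapshot
    s.prev       = r.prev;              // rewind the sampler ring buffer
    // in the real code this is also the point where
    // llama_kv_cache_seq_rm(ctx, 0, r.n_past, -1) would evict the stale KV entries
}
```

Restoring the whole ring buffer (rather than trimming it arithmetically) is what avoids the `prev` drift described above, at the cost of one extra copy per checkpoint.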
```cpp
int last_n_past      = pos_history.back();
int n_tokens_removed = n_past - last_n_past;
llama_kv_cache_seq_rm(ctx, 0, last_n_past, -1);
n_remain += n_tokens_removed;
```
This might become a problem with `n_predict == -1` (infinite) or `-2` (stop at context size) — comment from here.
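One way to address this, sketched below as a proposal (the guard is not part of the PR): when `n_predict` is negative, `n_remain` does not track a real token budget, so refunding the removed tokens would corrupt it and should be skipped.

```cpp
#include <cassert>

// Hypothetical guard for the n_remain refund. n_predict == -1 means
// "generate forever" and -2 means "stop at context size"; in both cases
// n_remain is not a countdown budget, so we leave it untouched.
int adjust_n_remain(int n_remain, int n_predict, int n_tokens_removed) {
    if (n_predict < 0) {
        return n_remain; // infinite / context-bound: no budget to refund
    }
    return n_remain + n_tokens_removed; // refund the budget for removed tokens
}
```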
@slaren Completely agree with that. A chat CLI with a better user experience is very desirable. Personally, I'd prefer to re-propose the …

The server UI is (to be honest) low-hanging fruit. However, I've always been hesitant to get my hands on this part of the project. Even though I've been working on web development for more than half of my life, web development to me is always kind of "101 ways to shoot yourself in the foot": there are just too many choices for doing a simple thing on the web. IMO the best way to kick-start this is to throw away all the frontend code and establish a "standard" for that part. Still, I'd love to hear your thoughts on that @ggerganov
How would throwing it away help in this case? We can have multiple UI implementations since they are well decoupled from the server code. I agree, it can be a fun and useful project, so if there is will from people to implement a full-fledged chat UI, it would be great.

p.s. I personally would love to write a Dear ImGui app at some point, as I really enjoy making UIs with this library :)
What I mean is that atm we have multiple implementations of the web UI:

So to me, it's kind of a "soup" to keep all of these. As I said earlier, web dev nowadays is kind of "101 ways to shoot yourself in the foot", and sorry to say, the current state of the server.cpp web UI reflects exactly that. (I'm limiting my scope to the web UI part only; I completely agree that this has nothing to do with server.cpp itself.) In addition, I really think we could achieve the same web UI we have today with much less code. JavaScript is a high-level language anyway, so over time there have been (and will be) frameworks that let us "write less, do more".
Hah yeah, I'm not opposing this idea, just giving some insights: for desktop apps, the complexity is mostly in the packaging stage, not the development stage. My experience from a past job is that packaging for Windows/Mac is easier than for Linux (most people use Gnome or KDE). Indeed, I did end up choosing Flutter as a desktop UI framework thanks to its easy packaging. Some may say that Electron-based solutions are much easier, but that's only true if the app never touches low-level C++ bindings. A built Flutter app is also much lighter than an Electron one. That's just some of my experience, but I'm also curious to see what you can do with ImGui!

(Sorry for hijacking this PR to discuss server.cpp 😂 we may need to create a dedicated issue)
AFAIK it still doesn't have text wrap in edit fields, so some internal work would be needed. I still can't make myself do that. I remember there were C++ libraries for HTML-based UIs; that might allow using the same UI for the server and for desktop apps (since HTML is already in llama.cpp).
I'm aware of this approach. The problem with these C++ web UI libraries is that their only purpose is to generate HTML from C++ calls. So, in the end, they don't provide real benefits, just a layer on top of HTML. What I'm trying to say is that there are much more lightweight approaches. Take VueJS for example: it is easy to learn and has many resources & a large community (to be fair, it's a bit heavier than Preact, but easier to work with).
I've been hoping for this feature for so long!
Motivation
This is kinda a fun hack. I don't even know if it's useful for anyone.
The idea is to add slack-like (or discord-like) command support to do various things without having to re-launch the app:
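A minimal dispatcher for such commands could look like the sketch below. It is not the PR's implementation; the `Command` struct and `parse_command` helper are hypothetical, and only lines starting with `/` are treated as commands rather than chat input.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parsed form of a slash command, e.g. "/save hi.bin".
struct Command {
    std::string name;              // "save"
    std::vector<std::string> args; // {"hi.bin"}
};

// Returns true if the line is a command (starts with '/') and should not be
// sent to the model as a normal chat message.
bool parse_command(const std::string & line, Command & out) {
    if (line.empty() || line[0] != '/') {
        return false; // plain chat text
    }
    std::istringstream iss(line.substr(1)); // drop the leading '/'
    iss >> out.name;
    out.args.clear();
    std::string arg;
    while (iss >> arg) {
        out.args.push_back(arg);
    }
    return !out.name.empty();
}
```

Intercepting input this way keeps the main generation loop untouched: the command handler runs, and the loop simply continues waiting for the next user line.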
Real-life use case
For example, I'm studying Hồ Xuân Hương, a Vietnamese poet. The info can be found in this Wikipedia article.
I downloaded the article above to a file named `wiki.txt`.
Then I can ask many questions, until I feel like I want to go back to the checkpoint `hi.bin` that I saved earlier. Then I can continue to ask more questions.