Bug: server is broken by Mistral Nemo commit #8631

Closed · auriocus opened this issue Jul 22, 2024 · 5 comments · Fixed by #8646
Labels
bug-unconfirmed, high severity

Comments

@auriocus

What happened?

I run the server with:
llama-server -m ~/Downloads/Meta-Llama-3-8B-Instruct-v2.Q8_0.gguf -c 4096 --no-mmap

After commit 50e0535, nothing happens when a question is posted in the web interface; the CPU stays idle. I used git bisect to verify that the regression comes from that commit.

Name and Version

 ./llama-cli --version
version: 3438 (6f11a83e)
built with gcc-10 (SUSE Linux) 10.4.0 for x86_64-suse-linux

What operating system are you seeing the problem on?

No response

Relevant log output

No response

@auriocus added the bug-unconfirmed and high severity labels on Jul 22, 2024
@auriocus (Author)

I'm not sure anymore that the issue is in the commit mentioned above; a second bisect points to a different commit:

gollwi01@vier:~/Programmieren/llama.cpp> git bisect bad
940362224d20e35f13aa5fd34a0d937ae57bdf7d is the first bad commit
commit 940362224d20e35f13aa5fd34a0d937ae57bdf7d
Author: Michael Coppola <[email protected]>
Date:   Sat Jul 20 09:43:51 2024 -0400

    llama : add support for Tekken pre-tokenizer (#8579)
    
    * llama : Added support for Tekken pre-tokenizer (#8577)
    
    Removed uneeded `vocab.tokenizer_clean_spaces` assignment
    
    * llama : fix order of pre-tokenizers
    
    * * Tekken pre-tokenizer no longer uses clean_up_tokenization_spaces
    * Updated chkhsh for Tekken tokenizer
    
    ---------
    
    Co-authored-by: Georgi Gerganov <[email protected]>

 convert_hf_to_gguf.py        |  3 +++
 convert_hf_to_gguf_update.py |  1 +
 include/llama.h              |  1 +
 src/llama.cpp                | 13 +++++++++++++
 4 files changed, 18 insertions(+)

Maybe it has something to do with stale build state between the compilation runs. In any case, going back to f17f39f fixes the issue and going forward to HEAD brings it back.
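
To make the bisect reproducible, each step should rebuild from a clean tree. A minimal sketch, assuming the Makefile build and a hypothetical check-server.sh that exits 0 when the web UI works:

git bisect start HEAD f17f39f        # HEAD is bad, f17f39f is known good
git bisect run sh -c '
  make clean                         # discard stale objects left over from the previous commit
  make -j llama-server || exit 125   # exit code 125 tells bisect to skip commits that fail to build
  ./check-server.sh                  # hypothetical test script: exit 0 = good, non-zero = bad
'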

@ScarletEmerald commented Jul 23, 2024

I am also having this problem. Bisecting was a huge pain since there seems to be some state preserved between runs, which made the results inconsistent. However, I am now convinced that the problem was introduced in 0d2c732.

That commit breaks both the old and new web UIs for llama-server in Firefox on Linux. Clicking Start or Send in the UI does not start inference. The web UI continues to work in Chromium on Linux.

@ggerganov (Member)

Yes, I have also observed this issue, and I think that clicking the "Reset all to default" button in the UI resolves it.

@auriocus (Author)

> I am also having this problem. Bisecting was a huge pain since there seems to be some state preserved between runs, which made the results inconsistent. However, I am now convinced that the problem was introduced in 0d2c732.
> That commit breaks both the old and new web UIs for llama-server in Firefox on Linux. Clicking Start or Send in the UI does not start inference. The web UI continues to work in Chromium on Linux.

That makes way more sense to me, thank you! I can also confirm that there is no problem with Chromium. Hitting the "reset all" button did not work, but updating Firefox did. We were using an extended support release (115.13.0esr, which ships with openSUSE); updating to 128 fixed the problem, so it is more likely a Firefox bug. Also, when watching the traffic in the Firefox network analyzer, hitting the button in the buggy version shows no network request at all, whereas the working version POSTs to http://127.0.0.1:8080/completion.
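
For reference, that endpoint can also be exercised directly with curl, bypassing the web UI entirely (a minimal sketch; the prompt and n_predict values are arbitrary):

curl -s http://127.0.0.1:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello", "n_predict": 16}'
# should return a JSON completion if the server itself is healthy,
# which isolates the problem to the browser side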

@auriocus (Author)

I can confirm that #8646 fixes the issue for me :)
