CUDA + Docker is generating only infinite text. #12328
Unanswered · Wandering-Magi asked this question in Q&A
Replies: 0 comments
- OS: Arch Linux
- Build: Docker + llama.cpp:full-cuda
- Built from this guide.
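For context, the setup I'm describing is roughly the following — this is a sketch, not my exact command; the image name is my reading of the Docker guide, and the model path is a placeholder:

```shell
# Sketch of the CUDA Docker setup described above.
# Assumptions: image name from the llama.cpp Docker guide,
# NVIDIA Container Toolkit installed, /path/to/models is a placeholder.
docker run --gpus all -v /path/to/models:/models \
    ghcr.io/ggml-org/llama.cpp:full-cuda \
    --run -m /models/model.gguf \
    -p "Building a website can be done in 10 simple steps:" -n 512
```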
I just started messing around with AI this week, so forgive me for not already knowing all of the words.
I got the basic build working the other night, and got it running with llama-cli. The generation speed was pretty slow, so I wanted to use GPU acceleration to get some real speed. But it was working, so I know it's possible.
As usual, Nvidia on Arch continues to be the bane of my existence, but that's neither here nor there. I set up CUDA, installed the toolkit, installed docker, got everything configured and working. A fun learning experience.
If I run it using the example text, it generates text about making a website. Cool, works. The catch is getting the damn thing to pause for user input, which the example won't do, but a few extra flags should change that.
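If it helps anyone answering: what I'm after is something like the interactive/conversation flags from the llama-cli docs. A sketch only (not my actual command; model path is a placeholder):

```shell
# Conversation-mode sketch: -cnv (or -i/--interactive) should make
# llama-cli pause for user input instead of generating indefinitely.
# Note -it on docker run: without an attached TTY, keyboard input
# (including Ctrl+C) may never reach the process inside the container.
docker run --gpus all -it -v /path/to/models:/models \
    ghcr.io/ggml-org/llama.cpp:full-cuda \
    --run -m /models/model.gguf -cnv
```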
This is the current command I'm using. All I'm trying to do right now is to get it into conversation mode.

For the record, I have tried both `-p` and `-sys`, just to see if it would help at all. Same problem. You can see at the end there, I'm using Ctrl+C to interject, but it just won't stop. I have tried this at various points during the generation. After 3 calls it interrupts, and I go and use `kill N` to shut it down.

If I try to run it in `--server --port 8080` mode, I can't access the server from my browser at http://127.0.0.1:8080. I have also tried `--host 0.0.0.0`, as I have found on other threads, but that doesn't work either.

In short, I'm at the end of my ability to figure my way through this. I feel like I'm one step away from figuring this out, but I'm facing a niche case in a new and rapidly developing field.