Description
Title: Issue with Passing Custom Arguments to `llama_cpp.server` in Docker
Issue Description:
Hello abetlen,

I've been trying to use your Docker image `ghcr.io/abetlen/llama-cpp-python:v0.2.24` for `llama_cpp.server`, and I encountered some difficulties when attempting to pass custom arguments (`--n_gpu_layers 81`, `--chat_format chatml`, `--use_mlock False`) to the server through Docker.
Steps to Reproduce:
- Pull the Docker image: `docker pull ghcr.io/abetlen/llama-cpp-python:v0.2.24`
- Run the container with custom arguments:

  docker run --rm -it -p 8000:8000 \
    -v /home/jaredquek/text-generation-webui/models:/models \
    -e MODEL=/models/tulu-2-dpo-70b.Q5_K_M.gguf \
    --entrypoint uvicorn \
    ghcr.io/abetlen/llama-cpp-python:v0.2.24 \
    --factory llama_cpp.server.app:create_app --host 0.0.0.0 --port 8000 --n_gpu_layers 81 --chat_format chatml --use_mlock False
This results in an error: `Error: No such option: --n_gpu_layers`.
Expected Behavior:
I expected to be able to pass these arguments to the `llama_cpp.server` application inside the Docker container.
Actual Behavior:
The `uvicorn` command does not recognize these arguments: uvicorn only parses its own ASGI server options, so settings intended for the `llama_cpp.server` application are rejected.
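For reference, when `llama-cpp-python` is installed locally, these options are parsed by the `llama_cpp.server` module itself, which then starts uvicorn internally. A minimal sketch of the invocation I am effectively trying to reproduce inside the container (assuming a local install; the model path is the same one mounted above):

```bash
# llama_cpp.server has its own CLI and launches uvicorn itself,
# so --n_gpu_layers / --chat_format / --use_mlock are understood here.
python3 -m llama_cpp.server \
  --model /models/tulu-2-dpo-70b.Q5_K_M.gguf \
  --host 0.0.0.0 --port 8000 \
  --n_gpu_layers 81 --chat_format chatml --use_mlock False
```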
Potential Solutions:
- Modify the Dockerfile or application configuration to accept these arguments (a couple of possible workarounds are sketched below).
- Provide guidance in the README on how to correctly pass additional arguments or configure the server with these settings.
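In the meantime, two workarounds seem plausible, though I have not confirmed either against the published image. The sketch below assumes that `python3` and the `llama_cpp.server` module are available inside the image, and that the server settings can be supplied as environment variables in the same way `MODEL` already is; the variable names other than `MODEL` are my guesses:

```bash
# Workaround 1 (assumption): override the entrypoint so llama_cpp.server
# parses its own options instead of uvicorn.
docker run --rm -it -p 8000:8000 \
  -v /home/jaredquek/text-generation-webui/models:/models \
  --entrypoint python3 \
  ghcr.io/abetlen/llama-cpp-python:v0.2.24 \
  -m llama_cpp.server --model /models/tulu-2-dpo-70b.Q5_K_M.gguf \
  --host 0.0.0.0 --port 8000 \
  --n_gpu_layers 81 --chat_format chatml --use_mlock False

# Workaround 2 (assumption): keep the default entrypoint and pass the settings
# as environment variables, mirroring how MODEL is already passed.
docker run --rm -it -p 8000:8000 \
  -v /home/jaredquek/text-generation-webui/models:/models \
  -e MODEL=/models/tulu-2-dpo-70b.Q5_K_M.gguf \
  -e N_GPU_LAYERS=81 -e CHAT_FORMAT=chatml -e USE_MLOCK=false \
  ghcr.io/abetlen/llama-cpp-python:v0.2.24
```

If either of these is the intended way to configure the container, documenting it in the README would resolve this issue.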
I would appreciate any assistance or guidance you could provide on this issue.
Thank you for your time and for maintaining this project.
Best regards.