Commit 05ad311

Update llama.cpp build instruction (#1678)
1 parent 308baa8 commit 05ad311

1 file changed: +7 -5 lines

docs/hub/gguf-llamacpp.md (+7 -5)
@@ -21,19 +21,23 @@ Step 1: Clone llama.cpp from GitHub.
 git clone https://github.com/ggerganov/llama.cpp
 ```
 
-Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
+Step 2: Move into the llama.cpp folder and build it. You can also add hardware-specific flags (for ex: `-DGGML_CUDA=1` for Nvidia GPUs).
 
 ```
-cd llama.cpp && LLAMA_CURL=1 make
+cd llama.cpp
+cmake -B build # optionally, add -DGGML_CUDA=ON to activate CUDA
+cmake --build build --config Release
 ```
 
+Note: for other hardware support (for ex: AMD ROCm, Intel SYCL), please refer to [llama.cpp's build guide](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md)
+
 Once installed, you can use the `llama-cli` or `llama-server` as follows:
 
 ```bash
 llama-cli -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0
 ```
 
-Note: You can remove `-cnv` to run the CLI in chat completion mode.
+Note: You can explicitly add `-no-cnv` to run the CLI in raw completion mode (non-chat mode).
 
 Additionally, you can invoke an OpenAI spec chat completions endpoint directly using the llama.cpp server:
 
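The new note above defers other backends to llama.cpp's build guide. As a rough sketch of what such a build looks like, here is an Intel SYCL variant; the flag and compiler settings are taken from that guide rather than from this diff, so treat them as assumptions and check the guide for your version:

```bash
# Sketch only: Intel SYCL build; flags assumed from llama.cpp's build guide, not this diff.
# Requires the Intel oneAPI toolkit; load its environment first.
source /opt/intel/oneapi/setvars.sh
cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx
cmake --build build --config Release
```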
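To illustrate the `-no-cnv` note above, a minimal sketch of raw completion mode; the prompt is made up for illustration:

```bash
# Sketch: raw (non-chat) completion; the model continues the prompt instead of chatting.
llama-cli -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0 -no-cnv -p "The capital of France is"
```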
@@ -62,5 +66,3 @@ curl http://localhost:8080/v1/chat/completions \
 ```
 
 Replace `-hf` with any valid Hugging Face hub repo name - off you go! 🦙
-
-Note: Remember to `build` llama.cpp with `LLAMA_CURL=1` :)
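The hunk above touches the server example whose curl body sits in unchanged context lines and so is not shown in this diff. A minimal sketch of that flow, assuming the default port 8080 from the hunk header and an illustrative model repo and prompt:

```bash
# Sketch: serve the model, then hit the OpenAI-compatible chat completions endpoint.
# Model repo and prompt are illustrative; port 8080 matches the doc's curl example.
llama-server -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0

# In another shell:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Write a haiku about llamas."}
        ]
      }'
```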
