Update examples from ggml to gguf and add hw-accel note for Web Server (ggml-org#688)
* Examples from ggml to gguf
* Use gguf file extension
Update examples to use filenames with gguf extension (e.g. llama-model.gguf).
---------
Co-authored-by: Andrei <[email protected]>
"text": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
@@ -136,15 +136,15 @@ The context window of the Llama models determines the maximum number of tokens t
For instance, if you want to work with larger contexts, you can expand the context window by setting the `n_ctx` parameter when initializing the `Llama` object:
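A minimal sketch of that call; the model path is illustrative and 2048 is just an example context size:

```python
from llama_cpp import Llama

# n_ctx expands the context window to 2048 tokens.
llm = Llama(model_path="./models/7B/llama-model.gguf", n_ctx=2048)
```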
[Docker on termux (requires root)](https://gist.github.com/FreddieOliveira/efe850df7ff3951cb62d74bd770dce27) is currently the only known way to run this on phones; see the [termux support issue](https://github.com/abetlen/llama-cpp-python/issues/389).
@@ -183,7 +190,7 @@ Below is a short example demonstrating how to use the low-level API to tokenize
>>> llama_cpp.llama_backend_init(numa=False) # Must be called once at the start of each program
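That call opens the README's low-level tokenization example. A hedged sketch of the surrounding session, based on the ctypes-style low-level API of that era (the exact `llama_tokenize` signature has changed across versions, so treat the argument list as an assumption):

```python
>>> import llama_cpp
>>> import ctypes
>>> llama_cpp.llama_backend_init(numa=False)  # Must be called once at the start of each program
>>> params = llama_cpp.llama_context_default_params()
>>> # The low-level API passes bytes for char * parameters (path is illustrative).
>>> model = llama_cpp.llama_load_model_from_file(b"./models/7B/llama-model.gguf", params)
>>> ctx = llama_cpp.llama_new_context_with_model(model, params)
>>> max_tokens = params.n_ctx
>>> # ctypes arrays stand in for C array parameters.
>>> tokens = (llama_cpp.llama_token * int(max_tokens))()
>>> n_tokens = llama_cpp.llama_tokenize(ctx, b"Q: Name the planets in the solar system? A: ", tokens, max_tokens, add_bos=llama_cpp.c_bool(True))
>>> llama_cpp.llama_free(ctx)
```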