Commit 40e07a6

llama.cpp : add documentation about rope_freq_base and scale values (ggml-org#3401)
* llama.cpp : add documentation about rope_freq_base and scale values
* add notice to hot topics
1 parent bc34dd4 commit 40e07a6

2 files changed: +6 −5 lines

README.md (+1)
@@ -11,6 +11,7 @@ Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++
 
 ### Hot topics
 
+- ‼️ Breaking change: `rope_freq_base` and `rope_freq_scale` must be set to zero to use the model default values: [#3401](https://github.com/ggerganov/llama.cpp/pull/3401)
 - Parallel decoding + continuous batching support added: [#3228](https://github.com/ggerganov/llama.cpp/pull/3228) \
   **Devs should become familiar with the new API**
 - Local Falcon 180B inference on Mac Studio
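Not part of the commit, but a minimal C sketch of what the breaking change means for callers. It assumes only `llama_context_default_params()` from llama.h; the commented fallback values are illustrative, not guaranteed by the library.

```c
#include "llama.h"

int main(void) {
    struct llama_context_params params = llama_context_default_params();

    // After this change, 0 is a sentinel meaning "use the value stored in
    // the model file" rather than a hard-coded library default:
    params.rope_freq_base  = 0.0f; // e.g. resolves to 10000.0 for base LLaMA models (illustrative)
    params.rope_freq_scale = 0.0f; // e.g. resolves to 1.0 unless the model was tuned with scaling
    params.n_ctx           = 0;    // likewise: take the training context length from the model

    // ... load a model and create a context with these params as usual ...
    (void) params;
    return 0;
}
```

Callers that previously relied on passing explicit "default-looking" values (such as a hand-copied base frequency) should switch to zero so models with non-standard RoPE metadata keep working.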

llama.h (+5 −5)
@@ -167,18 +167,18 @@ extern "C" {
 
     struct llama_context_params {
         uint32_t seed;            // RNG seed, -1 for random
-        uint32_t n_ctx;           // text context
-        uint32_t n_batch;         // prompt processing batch size
+        uint32_t n_ctx;           // text context, 0 = from model
+        uint32_t n_batch;         // prompt processing maximum batch size
         uint32_t n_threads;       // number of threads to use for generation
         uint32_t n_threads_batch; // number of threads to use for batch processing
 
         // ref: https://github.com/ggerganov/llama.cpp/pull/2054
-        float rope_freq_base;  // RoPE base frequency
-        float rope_freq_scale; // RoPE frequency scaling factor
+        float rope_freq_base;  // RoPE base frequency, 0 = from model
+        float rope_freq_scale; // RoPE frequency scaling factor, 0 = from model
 
         // Keep the booleans together to avoid misalignment during copy-by-value.
         bool mul_mat_q;  // if true, use experimental mul_mat_q kernels
-        bool f16_kv;     // use fp16 for KV cache
+        bool f16_kv;     // use fp16 for KV cache, fp32 otherwise
         bool logits_all; // the llama_eval() call computes all logits, not just the last one
         bool embedding;  // embedding mode only
     };
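For illustration only, a hedged sketch of how a loader can resolve the zero sentinel on its side. The `*_train` field names stand in for whatever metadata is read from the model file; they are assumptions, not necessarily the identifiers llama.cpp uses internally.

```c
#include <stdint.h>
#include <stdio.h>

// Illustrative stand-in for values a loader reads from the model file
// (hypothetical names, not the actual llama.cpp structures).
struct model_hparams {
    float rope_freq_base_train;
    float rope_freq_scale_train;
};

// 0 is the sentinel: fall back to the value stored with the model,
// otherwise honor the caller's explicit override.
static float resolve_rope_param(float requested, float from_model) {
    return requested == 0.0f ? from_model : requested;
}

int main(void) {
    struct model_hparams hp = { 10000.0f, 1.0f };                       // hypothetical metadata
    float base  = resolve_rope_param(0.0f,  hp.rope_freq_base_train);   // -> 10000.0 (from model)
    float scale = resolve_rope_param(0.25f, hp.rope_freq_scale_train);  // -> 0.25 (override wins)
    printf("rope_freq_base = %.1f, rope_freq_scale = %.2f\n", base, scale);
    return 0;
}
```

The design choice here is that a float of exactly 0 is never a meaningful RoPE base or scale, so it can safely double as a "use the model default" flag without adding a separate boolean to the struct.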
