
Commit 9cdfe93

Complete removal of f16_kv, add offload_kqv field
This addresses two issues:
- abetlen#995, which requests adding the KV cache offloading param
- abetlen#1006, a NULL ptr exception when using embeddings (introduced by leaving f16_kv in the fields struct)
1 parent 37da8e8 commit 9cdfe93

File tree: llama_cpp/llama_cpp.py

1 file changed: +3 −3 lines changed

Diff for: llama_cpp/llama_cpp.py

@@ -432,9 +432,9 @@ class llama_context_params(Structure):
         type_k (int): data type for K cache
         type_v (int): data type for V cache
         mul_mat_q (bool): if true, use experimental mul_mat_q kernels (DEPRECATED - always true)
-        f16_kv (bool): use fp16 for KV cache, fp32 otherwise
         logits_all (bool): the llama_eval() call computes all logits, not just the last one (DEPRECATED - set llama_batch.logits instead)
-        embedding (bool): embedding mode only"""
+        embedding (bool): embedding mode only
+        offload_kqv (bool): whether to offload the KQV ops (including the KV cache) to GPU"""
     _fields_ = [
         ("seed", c_uint32),
         ("n_ctx", c_uint32),
@@ -452,9 +452,9 @@ class llama_context_params(Structure):
         ("type_k", c_int),
         ("type_v", c_int),
         ("mul_mat_q", c_bool),
-        ("f16_kv", c_bool),
         ("logits_all", c_bool),
         ("embedding", c_bool),
+        ("offload_kqv", c_bool),
     ]
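
For context, here is a minimal usage sketch (not part of this commit; the model path and surrounding calls are placeholders for illustration): with offload_kqv now exposed on llama_context_params, the low-level bindings can toggle KV-cache offloading before a context is created, and the embedding flag can be set without the stale f16_kv field sitting before it.

```python
# Illustrative sketch only (not from this commit); the model path is a placeholder.
from llama_cpp import llama_cpp

llama_cpp.llama_backend_init(False)

model_params = llama_cpp.llama_model_default_params()
model = llama_cpp.llama_load_model_from_file(
    b"/path/to/model.gguf", model_params  # placeholder path
)

ctx_params = llama_cpp.llama_context_default_params()
ctx_params.offload_kqv = True  # new field: offload KQV ops (including the KV cache) to GPU
ctx_params.embedding = True    # embedding-only mode; abetlen#1006 hit a NULL ptr here before this fix

ctx = llama_cpp.llama_new_context_with_model(model, ctx_params)
```

Because a ctypes Structure must mirror the C struct field-for-field, dropping f16_kv from _fields_ is what keeps the Python layout in sync with libllama; the commit message attributes the NULL ptr exception in embedding mode to that leftover field.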