llama : save and restore kv cache for single seq id (#6341)
* llama : save and restore kv cache for single seq id
* remove trailing whitespace
* respond error in case there's no space in the kv cache
* add kv seq save restore to test case
* add --slot-save-path arg to enable save restore and restrict save location
* Returning 0 for some cases, instead of asserting.
* cleanup error cases
* rename sequence state functions
* rename state get set functions
* add previous function names back in with DEPRECATED notice
* update doc
* adjust endpoints to preferred style
* fix restoring zero cell count
* handle seq rm return value
* unused param
* keep in the size check
* fix return types
* add server test case for slot save restore
* cleanup
* add cake
* cleanup style
* add special
* removing a whole sequence never fails
* move sequence state file functionality from server to llama to match session api and add version tags
* catch exceptions on save as well
* error log messages
* check types for stricter restore
* update server doc
* readme : update API changes date
* strict filename validation
* move include, reject bom as well
* also reject empty filename
* reject whitespace and trailing dot
---------
Co-authored-by: Martin Evans
Co-authored-by: Georgi Gerganov
README.md (+1)
@@ -10,6 +10,7 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)

 ### Recent API changes

+- [2024 Apr 4] State and session file functions reorganized under `llama_state_*` https://github.com/ggerganov/llama.cpp/pull/6341
 - [2024 Mar 26] Logits and embeddings API updated for compactness https://github.com/ggerganov/llama.cpp/pull/6122
 - [2024 Mar 13] Add `llama_synchronize()` + `llama_context_params.n_ubatch` https://github.com/ggerganov/llama.cpp/pull/6017
 - [2024 Mar 8] `llama_kv_cache_seq_rm()` returns a `bool` instead of `void`, and new `llama_n_seq_max()` returns the upper limit of acceptable `seq_id` in batches (relevant when dealing with multiple sequences) https://github.com/ggerganov/llama.cpp/pull/5328
+- `--slot-save-path PATH`: Specifies the path where the state of slots (the prompt cache) can be stored. If not provided, the slot management endpoints will be disabled.
 - `--chat-template JINJA_TEMPLATE`: Set custom jinja chat template. This parameter accepts a string, not a file name. Default: template taken from model's metadata. We only support [some pre-defined templates](https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template)
 - `--log-disable`: Output logs to stdout only, not to `llama.log`. Default: enabled
 - `--log-format FORMAT`: Define the log output to FORMAT: json or text Default: `json`
@@ -517,6 +518,57 @@ Available metrics:
 - `llamacpp:requests_processing`: Number of requests processing.
 - `llamacpp:requests_deferred`: Number of requests deferred.

+- **POST** `/slots/{id_slot}?action=save`: Save the prompt cache of the specified slot to a file.
+
+    *Options:*
+
+    `filename`: Name of the file to save the slot's prompt cache. The file will be saved in the directory specified by the `--slot-save-path` server parameter.
+
+### Result JSON
+
+```json
+{
+    "id_slot": 0,
+    "filename": "slot_save_file.bin",
+    "n_saved": 1745,
+    "n_written": 14309796,
+    "timings": {
+        "save_ms": 49.865
+    }
+}
+```
+
+- **POST** `/slots/{id_slot}?action=restore`: Restore the prompt cache of the specified slot from a file.
+
+    *Options:*
+
+    `filename`: Name of the file to restore the slot's prompt cache from. The file should be located in the directory specified by the `--slot-save-path` server parameter.
+
+### Result JSON
+
+```json
+{
+    "id_slot": 0,
+    "filename": "slot_save_file.bin",
+    "n_restored": 1745,
+    "n_read": 14309796,
+    "timings": {
+        "restore_ms": 42.937
+    }
+}
+```
+
+- **POST** `/slots/{id_slot}?action=erase`: Erase the prompt cache of the specified slot.
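
For reviewers who want to try the slot endpoints documented in this diff, here is a minimal client sketch in Python using only the standard library. It is not part of the commit. The base URL `http://localhost:8080`, the helper names `build_slot_request`, `send`, and `save_then_restore`, and the choice to pass `filename` as a JSON request body are all assumptions made for illustration. The diff itself only states that `filename` is an option and that the server must be started with `--slot-save-path`.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the server is reachable at this address.
BASE_URL = "http://localhost:8080"

ACTIONS = ("save", "restore", "erase")


def build_slot_request(id_slot: int, action: str,
                       filename: Optional[str] = None,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /slots/{id_slot}?action=<action>."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    # Assumption: options travel as a JSON body. Only save and restore
    # document a `filename` option; erase documents none.
    payload = {} if filename is None else {"filename": filename}
    return urllib.request.Request(
        url=f"{base_url}/slots/{id_slot}?action={action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON result."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def save_then_restore(id_slot: int, filename: str) -> None:
    """Save a slot's prompt cache, then restore it from the same file."""
    saved = send(build_slot_request(id_slot, "save", filename))
    print("saved tokens:", saved["n_saved"])
    restored = send(build_slot_request(id_slot, "restore", filename))
    print("restored tokens:", restored["n_restored"])
```

With a server running and `--slot-save-path` set, calling `save_then_restore(0, "slot_save_file.bin")` would exercise both endpoints. The file name is relative to the configured save directory, and per the commit message the server applies strict filename validation, so a plain name without path separators is the safest choice.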