Extend llama_kv_cache_seq_rm to allow matching any sequence #3843

KerfuffleV2 · 2023-10-29T03:16:08Z

Since llama_seq_id allows for negative values, this extends llama_kv_cache_seq_rm to use sequence ids < 0 to match any sequence. Based on #3840 it doesn't seem like llama_kv_cache_token_rm really can be used for anything other than just clearing the kv cache. Some of the existing uses seem incorrect as well.

This pull is mergeable but incomplete. If it's on the right track, I'll just replace the existing calls to llama_kv_cache_token_rm that just clear the cache (llama_kv_cache_tokens_rm(ctx, -1, -1)) with llama_kv_cache_seq_rm(ctx, -1, -1, -1) and ones like llama_kv_cache_tokens_rm(ctx, n_matching_session_tokens, -1) which seem to actually want a position rather than cell index with llama_kv_cache_seq_rm(ctx, -1, n_matching_session_tokens, -1).

llama_kv_cache_tokens_rm could possibly be removed (I can't see how one would actually use it, where would knowing a kv cell index be meaningful? Maybe something like kv cache defragging if it ever gets added?). Could possibly also add a llama_kv_cache_clear function to clear the KV cache a bit more efficiently than using the new llama_kv_cache_seq_rm method.

Closes #3840

ggerganov

Let's remove llama_kv_cache_tokens_rm and add llama_kv_cache_clear and extend the llama_kv_cache_seq_rm as proposed

Use llama_kv_cache_clear for cache clearing Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality

KerfuffleV2 · 2023-10-29T17:03:38Z

How about this? Tested, seems to work. Pretty hard change for even me to screw up.

KerfuffleV2 · 2023-10-29T17:18:51Z

llama.cpp

@@ -1492,8 +1487,14 @@ static void llama_kv_cache_seq_rm(
    if (p1 < 0) p1 = std::numeric_limits<llama_pos>::max();

    for (uint32_t i = 0; i < cache.size; ++i) {
-        if (cache.cells[i].has_seq_id(seq_id) && cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) {
-            cache.cells[i].seq_id.erase(seq_id);
+        if (cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) {


Probably not worth the complexity, but we could optimize to:

static void llama_kv_cache_seq_rm( struct llama_kv_cache & cache, llama_seq_id seq_id, llama_pos p0, llama_pos p1) { uint32_t new_head = cache.size; if (p0 < 0) p0 = 0; if (p1 < 0) p1 = std::numeric_limits<llama_pos>::max(); if (seq_id < 0) { if (p0 == 0 && p1 >= cache.size) { llama_kv_cache_clear(cache); return; } for (uint32_t i = 0; i < cache.size; ++i) { if (cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) { cache.cells[i].seq_id.clear(); cache.cells[i].pos = -1; if (new_head == cache.size) new_head = i; } } } else { for (uint32_t i = 0; i < cache.size; ++i) { if (cache.cells[i].has_seq_id(seq_id) && cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) { cache.cells[i].seq_id.erase(seq_id); if (cache.cells[i].seq_id.empty()) { cache.cells[i].pos = -1; if (new_head == cache.size) new_head = i; } } } } // If we freed up a slot, set head to it so searching can start there. if (new_head != cache.size) cache.head = new_head; }

This avoids checking if seq_id < 0 each iteration, but a single int test probably wouldn't be noticeable even for huge KV caches.

Yup, either way would be fine. If you change it directly merge or you just merge as it is

Since you don't have a preference, I'll just leave it as is. I don't think it's worth the added complexity.

…#3843) * Extend llama_kv_cache_seq_rm to allow matichng any sequence * Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear Use llama_kv_cache_clear for cache clearing Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality

* ggml-org/llama.cpp#3843

…#3843) * Extend llama_kv_cache_seq_rm to allow matichng any sequence * Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear Use llama_kv_cache_clear for cache clearing Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality

* ggml-org/llama.cpp#3843

Extend llama_kv_cache_seq_rm to allow matichng any sequence

ebd4b91

KerfuffleV2 mentioned this pull request Oct 29, 2023

llama_kv_cache_tokens_rm functioning on index and not position #3840

Closed

ggerganov approved these changes Oct 29, 2023

View reviewed changes

Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear

bed3f17

Use llama_kv_cache_clear for cache clearing Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality

KerfuffleV2 commented Oct 29, 2023

View reviewed changes

ggerganov approved these changes Oct 29, 2023

View reviewed changes

KerfuffleV2 merged commit 6e08281 into ggml-org:master Oct 29, 2023

KerfuffleV2 deleted the feat-kv_seq_rm_wildcard branch November 17, 2023 03:12

brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request Nov 17, 2023

Replace fn to remove seq of kv cache w kv cache clear fn

18dadad

* ggml-org/llama.cpp#3843

brittlewis12 added a commit to brittlewis12/llmfarm_core.swift that referenced this pull request Nov 30, 2023

Replace fn to remove seq of kv cache w kv cache clear fn

2c0c767

* ggml-org/llama.cpp#3843

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extend llama_kv_cache_seq_rm to allow matching any sequence #3843

Extend llama_kv_cache_seq_rm to allow matching any sequence #3843

KerfuffleV2 commented Oct 29, 2023 •

edited

Loading

ggerganov left a comment

KerfuffleV2 commented Oct 29, 2023

KerfuffleV2 Oct 29, 2023

ggerganov Oct 29, 2023

KerfuffleV2 Oct 29, 2023

Extend llama_kv_cache_seq_rm to allow matching any sequence #3843

Extend llama_kv_cache_seq_rm to allow matching any sequence #3843

Conversation

KerfuffleV2 commented Oct 29, 2023 • edited Loading

ggerganov left a comment

Choose a reason for hiding this comment

KerfuffleV2 commented Oct 29, 2023

KerfuffleV2 Oct 29, 2023

Choose a reason for hiding this comment

ggerganov Oct 29, 2023

Choose a reason for hiding this comment

KerfuffleV2 Oct 29, 2023

Choose a reason for hiding this comment

KerfuffleV2 commented Oct 29, 2023 •

edited

Loading