-
Notifications
You must be signed in to change notification settings - Fork 11.5k
Extend llama_kv_cache_seq_rm to allow matching any sequence #3843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend llama_kv_cache_seq_rm to allow matching any sequence #3843
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove llama_kv_cache_tokens_rm
and add llama_kv_cache_clear
and extend the llama_kv_cache_seq_rm
as proposed
Use llama_kv_cache_clear for cache clearing Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality
How about this? Tested, seems to work. Pretty hard change for even me to screw up. |
@@ -1492,8 +1487,14 @@ static void llama_kv_cache_seq_rm( | |||
if (p1 < 0) p1 = std::numeric_limits<llama_pos>::max(); | |||
|
|||
for (uint32_t i = 0; i < cache.size; ++i) { | |||
if (cache.cells[i].has_seq_id(seq_id) && cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) { | |||
cache.cells[i].seq_id.erase(seq_id); | |||
if (cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not worth the complexity, but we could optimize to:
static void llama_kv_cache_seq_rm(
struct llama_kv_cache & cache,
llama_seq_id seq_id,
llama_pos p0,
llama_pos p1) {
uint32_t new_head = cache.size;
if (p0 < 0) p0 = 0;
if (p1 < 0) p1 = std::numeric_limits<llama_pos>::max();
if (seq_id < 0) {
if (p0 == 0 && p1 >= cache.size) {
llama_kv_cache_clear(cache);
return;
}
for (uint32_t i = 0; i < cache.size; ++i) {
if (cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) {
cache.cells[i].seq_id.clear();
cache.cells[i].pos = -1;
if (new_head == cache.size) new_head = i;
}
}
} else {
for (uint32_t i = 0; i < cache.size; ++i) {
if (cache.cells[i].has_seq_id(seq_id) && cache.cells[i].pos >= p0 && cache.cells[i].pos < p1) {
cache.cells[i].seq_id.erase(seq_id);
if (cache.cells[i].seq_id.empty()) {
cache.cells[i].pos = -1;
if (new_head == cache.size) new_head = i;
}
}
}
}
// If we freed up a slot, set head to it so searching can start there.
if (new_head != cache.size) cache.head = new_head;
}
This avoids checking if seq_id
< 0 each iteration, but a single int test probably wouldn't be noticeable even for huge KV caches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, either way would be fine. If you change it directly merge or you just merge as it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you don't have a preference, I'll just leave it as is. I don't think it's worth the added complexity.
…#3843) * Extend llama_kv_cache_seq_rm to allow matichng any sequence * Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear Use llama_kv_cache_clear for cache clearing Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality
…#3843) * Extend llama_kv_cache_seq_rm to allow matichng any sequence * Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear Use llama_kv_cache_clear for cache clearing Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality
Since
llama_seq_id
allows for negative values, this extendsllama_kv_cache_seq_rm
to use sequence ids < 0 to match any sequence. Based on #3840 it doesn't seem likellama_kv_cache_token_rm
really can be used for anything other than just clearing the kv cache. Some of the existing uses seem incorrect as well.This pull is mergeable but incomplete. If it's on the right track, I'll just replace the existing calls to
llama_kv_cache_token_rm
that just clear the cache (llama_kv_cache_tokens_rm(ctx, -1, -1)
) withllama_kv_cache_seq_rm(ctx, -1, -1, -1)
and ones likellama_kv_cache_tokens_rm(ctx, n_matching_session_tokens, -1)
which seem to actually want a position rather than cell index withllama_kv_cache_seq_rm(ctx, -1, n_matching_session_tokens, -1)
.llama_kv_cache_tokens_rm
could possibly be removed (I can't see how one would actually use it, where would knowing a kv cell index be meaningful? Maybe something like kv cache defragging if it ever gets added?). Could possibly also add allama_kv_cache_clear
function to clear the KV cache a bit more efficiently than using the newllama_kv_cache_seq_rm
method.Closes #3840