
Commit e94dcac

[Cherry-Pick][Text Generation] Terminate the inference when kv cache is full (#1447)
* [Fix] Remove erroneous LIB.kv_cache input when using external kv cache management (#1337)
* initial commit
* initial commit
* cleanup
* cleanup2
* initial commit
* initial commit
* Needs to be >=
1 parent 39e21d3 commit e94dcac

File tree

1 file changed: +5 −0 lines changed


src/deepsparse/transformers/pipelines/text_generation.py

@@ -829,6 +829,11 @@ def engine_forward(
             generated_tokens.append(token)
             generated_logits.append(logits)
 
+            if session.total_num_processed_tokens >= session.capacity:
+                # if the kv cache is full, stop generation
+                finished_reason.append(FinishReason.CAPACITY)
+                break
+
             if (
                 token == self.tokenizer.eos_token_id
                 and not self.force_max_tokens
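
For context, here is a minimal sketch of the decode loop this check protects. The names session.total_num_processed_tokens, session.capacity, and FinishReason.CAPACITY come from the diff above; the KVCacheSession class, the generate helper, and the loop structure are simplified assumptions for illustration, not DeepSparse's actual implementation.

from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class FinishReason(Enum):
    # CAPACITY is the reason appended by this commit when the kv cache fills up
    EOS = "eos"
    MAX_TOKENS = "max_tokens"
    CAPACITY = "capacity"


@dataclass
class KVCacheSession:
    # hypothetical stand-in for the engine's kv cache session object
    capacity: int                        # maximum number of tokens the kv cache can hold
    total_num_processed_tokens: int = 0

    def process(self, num_tokens: int) -> None:
        self.total_num_processed_tokens += num_tokens


def generate(session: KVCacheSession, max_new_tokens: int) -> Tuple[List[str], List[FinishReason]]:
    generated_tokens: List[str] = []
    finished_reason: List[FinishReason] = []
    for _ in range(max_new_tokens):
        token = "<tok>"      # placeholder for one engine decode step
        session.process(1)   # each generated token consumes one kv cache slot
        generated_tokens.append(token)

        if session.total_num_processed_tokens >= session.capacity:
            # if the kv cache is full, stop generation instead of overflowing it
            finished_reason.append(FinishReason.CAPACITY)
            break
    return generated_tokens, finished_reason

The ">=" comparison (the "Needs to be >=" note in the commit message) presumably makes the loop stop as soon as the cache is exactly full, rather than attempting one more decode step that would have no room for its kv entries.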
