1 parent b17af98 commit 6007a75
src/deepsparse/v2/text_generation/process_inputs.py
@@ -121,5 +121,8 @@ def run(
             frequency_penalty=generation_config.repetition_penalty,
         )
 
+        # TODO: move this step to prep_for_prefill and add attention mask to the output
+        # this will allow us to split/join more easily when processing multiple prompts
+        # in parallel
         tokens = input_ids[attention_mask.nonzero()].tolist()
         return {"tokens": tokens}, inference_state_update
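
The sketch below illustrates why the TODO wants the attention mask kept in the output. It assumes input_ids and attention_mask are 2-D NumPy arrays of shape (batch, sequence_length) with left padding. The helper names flat_tokens and per_prompt_tokens, and the token values, are hypothetical and not part of the commit. With one prompt, indexing by the mask's nonzero positions yields that prompt's tokens. With several prompts, the same expression flattens everything into one list and the prompt boundaries are lost, whereas applying the mask row by row keeps them.

```python
import numpy as np


def flat_tokens(input_ids, attention_mask):
    # Same expression as in the diff: keep every position where the mask is nonzero.
    # For a batch this returns a single flat list across all prompts.
    return input_ids[attention_mask.nonzero()].tolist()


def per_prompt_tokens(input_ids, attention_mask):
    # Hypothetical alternative: apply each row's mask separately so that
    # the tokens of each prompt stay in their own list.
    return [
        row[mask.astype(bool)].tolist()
        for row, mask in zip(input_ids, attention_mask)
    ]


# Two left-padded prompts (illustrative token ids only).
input_ids = np.array([[0, 0, 5, 6], [0, 7, 8, 9]])
attention_mask = np.array([[0, 0, 1, 1], [0, 1, 1, 1]])

flat = flat_tokens(input_ids, attention_mask)
split = per_prompt_tokens(input_ids, attention_mask)
print(flat)
print(split)
```

Here flat_tokens cannot tell where the first prompt ends, while per_prompt_tokens preserves one list per prompt. That is roughly the split/join convenience the TODO describes, though the commit does not say how prep_for_prefill would implement it.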