-
I know this might be pretty basic, but I am not able to find a solution. I am running the Llama 3 model with llama-server using the command below.

I have tried the two solutions below.

Note: I also tried running the server with --prompt-cache-all and --prompt-cache-ro, but I cannot see it reading from the cache in the responses. However, if I call the API and set cache_prompt to true, I can see the cache being used and the response is much faster.
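For reference, this is roughly what the API call that does hit the cache looks like for me; a minimal sketch, assuming llama-server is listening on its default port 8080 and using the native /completion endpoint (the prompt is just a placeholder):

```python
import requests

# Endpoint and port assume a default llama-server setup; the prompt is hypothetical.
resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Summarize the following text: ...",
        "n_predict": 128,
        "cache_prompt": True,  # ask the server to reuse the KV cache from the previous request
    },
)
print(resp.json()["content"])
```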
Replies: 1 comment 1 reply
-
In the chat completion call, pass `extra_body={"cache_prompt": True}` with the request.
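A minimal sketch of how that looks with the OpenAI Python client pointed at llama-server's OpenAI-compatible endpoint; the base_url, api_key, and model name are assumptions for a local setup:

```python
from openai import OpenAI

# base_url, api_key, and model name are placeholders for a local llama-server instance.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

response = client.chat.completions.create(
    model="llama-3",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={"cache_prompt": True},  # forwarded to llama-server to reuse the prompt KV cache
)
print(response.choices[0].message.content)
```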