Running multiple tiny models in parallel on a single GPU #2017
abarai-lanl asked this question in Q&A (unanswered, 0 replies)
I have an Nvidia Tesla GPU with 32GB of VRAM. I can instantiate 10 instances of llama-cpp-python with Qwen3-0.6B on that single GPU, each in a different terminal session. My question is: do these models run in parallel if `create_chat_completion` is invoked concurrently in the 10 separate terminal sessions?
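For reference, a minimal sketch of the per-terminal setup being described, assuming a local GGUF file (the model path, filename, and context size below are placeholders, not values from the question):

```python
# one_instance.py -- the same script launched in each of the 10 terminal
# sessions; every process loads its own copy of the weights into VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/Qwen3-0.6B-Q8_0.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}]
)
print(response["choices"][0]["message"]["content"])
```

Running this script in 10 terminals gives 10 independent OS processes, each holding its own weights on the GPU, with their concurrent `create_chat_completion` calls submitted to the device separately.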