Simulation code for LLM inference gateway #15
Conversation
Welcome @kaushikmitr!
Any idea how to make this reviewable on GitHub? The file is way too big, with image blobs taking up so much space.
Yes, this is more for viewing (https://github.com/kubernetes-sigs/llm-instance-gateway/blob/a6ee825c4a91d4a5c7915d82b6a6991123347c7b/simulations/llm_inference_gateway_simulation.ipynb) but hard to comment on. I am working on turning this into a regular Python script.
Added the llm_ig_simulation Python library folder.
Remove extra lines (here and other places)
done
When you get a chance, can we have a README file describing how to run this?
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahg-g, kaushikmitr.
The notebook implements the following components:
- Prefill and decode latency modeling as a function of batch size
- Continuous batching algorithm simulation
- LLM inference gateway simulation with different routing strategies
- Evaluation of the different algorithms with configurable workloads
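To make the listed components concrete, here is a minimal, self-contained sketch of the same ideas: a linear latency model in the batch size, a toy continuous-batching server that admits waiting requests into the running batch at every decode step, and two illustrative routing strategies (random vs. least-loaded) compared under one workload. Everything here is an assumption for illustration only: the `Server` class, the latency coefficients, and the strategy names are hypothetical and are not taken from the notebook or the `llm_ig_simulation` library.

```python
import random

# Hypothetical linear latency models: latency grows with batch size.
# Coefficients are illustrative, not fitted from the notebook.
def prefill_latency(batch_size, base=0.02, per_req=0.005):
    return base + per_req * batch_size

def decode_latency(batch_size, base=0.002, per_req=0.0005):
    return base + per_req * batch_size

class Server:
    """Toy server with continuous batching: new requests join the
    running batch at each step instead of waiting for it to drain."""
    def __init__(self, max_batch=8):
        self.max_batch = max_batch
        self.queue = []    # waiting requests (remaining decode tokens)
        self.running = []  # in-flight requests

    def submit(self, decode_tokens):
        self.queue.append(decode_tokens)

    def step(self):
        # Continuous batching: admit waiting requests up to max_batch.
        admitted = 0
        while self.queue and len(self.running) < self.max_batch:
            self.running.append(self.queue.pop(0))
            admitted += 1
        cost = prefill_latency(admitted) if admitted else 0.0
        if self.running:
            cost += decode_latency(len(self.running))
            # Each running request emits one token; finished ones leave.
            self.running = [t - 1 for t in self.running if t > 1]
        return cost

    def load(self):
        return len(self.queue) + len(self.running)

def route(servers, strategy):
    # Two illustrative routing strategies.
    if strategy == "random":
        return random.choice(servers)
    if strategy == "least_loaded":
        return min(servers, key=lambda s: s.load())
    raise ValueError(strategy)

def simulate(strategy, n_requests=200, n_servers=4, seed=0):
    random.seed(seed)
    servers = [Server() for _ in range(n_servers)]
    for _ in range(n_requests):
        route(servers, strategy).submit(random.randint(4, 32))
    total = 0.0
    while any(s.load() for s in servers):
        # Servers step in parallel; makespan is the slowest server.
        total += max(s.step() for s in servers)
    return total

for strat in ("random", "least_loaded"):
    print(strat, round(simulate(strat), 3))
```

A real version of this would replace the linear latency functions with the fitted prefill/decode models and add the workload knobs (arrival rates, token-length distributions) that the PR description mentions; the structure of the loop, however, is the essence of a continuous-batching gateway simulation.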