Simulation code for llm inference gateway #15

Merged
12 commits merged on Nov 16, 2024

kaushikmitr
Contributor

The notebook implements the following components:

  1. Prefill and decode latency modeling as a function of batch size:

    • Prefill and decode latencies are modeled per batch, with total latency driven by the number of tokens in the batch.
    • The model uses different latency constants to express how batch size affects the prefill and decode phases.
    • Latency is computed from the sum of token lengths across the batch, which ties batch size directly to latency.
  2. Continuous batching simulation:

    • Simulates continuous batching, in which scheduling decisions are made dynamically before each forward pass, as implemented in vLLM.
  3. LLM inference gateway simulation with different routing strategies:

    • Models an LLM inference gateway that routes requests to model servers according to a chosen strategy.
    • Several routing algorithms are explored, balancing latency, throughput, and resource utilization.
  4. Evaluating algorithms under configurable workloads:

    • The notebook supports evaluating the algorithms under different workloads.
    • Workloads are configured through request sizes, batch sizes, token lengths, and latency requirements.
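
For readers who have not opened the notebook, the first three components can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the linear latency form, the constant values, and the names `Request`, `Server`, `prefill_latency_ms`, `decode_latency_ms`, and `route_least_loaded` are all hypothetical. The scheduling rule (admit waiting requests first, otherwise decode) is a simplification and should not be read as vLLM's actual scheduler.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical latency constants in milliseconds (assumed, not from the PR).
PREFILL_BASE_MS = 10.0
PREFILL_PER_TOKEN_MS = 0.05
DECODE_BASE_MS = 5.0
DECODE_PER_TOKEN_MS = 0.002


@dataclass
class Request:
    prompt_tokens: int       # input length in tokens
    max_output_tokens: int   # number of tokens to generate
    generated: int = 0       # tokens generated so far

    @property
    def done(self) -> bool:
        return self.generated >= self.max_output_tokens


def prefill_latency_ms(batch: List[Request]) -> float:
    """Assumed linear model: fixed cost plus a cost per prompt token in the batch."""
    if not batch:
        return 0.0
    total_tokens = sum(r.prompt_tokens for r in batch)
    return PREFILL_BASE_MS + PREFILL_PER_TOKEN_MS * total_tokens


def decode_latency_ms(batch: List[Request]) -> float:
    """Assumed linear model: fixed cost plus a cost per token held in context."""
    if not batch:
        return 0.0
    total_tokens = sum(r.prompt_tokens + r.generated for r in batch)
    return DECODE_BASE_MS + DECODE_PER_TOKEN_MS * total_tokens


@dataclass
class Server:
    max_batch_tokens: int
    waiting: List[Request] = field(default_factory=list)
    running: List[Request] = field(default_factory=list)
    clock_ms: float = 0.0

    def step(self) -> None:
        """One forward pass: the scheduling decision is made before the pass runs."""
        used = sum(r.prompt_tokens + r.generated for r in self.running)
        admitted: List[Request] = []
        # Admit waiting requests in FIFO order while they fit in the token budget.
        while self.waiting and used + self.waiting[0].prompt_tokens <= self.max_batch_tokens:
            req = self.waiting.pop(0)
            used += req.prompt_tokens
            admitted.append(req)
        if admitted:
            # Prefill pass for the newly admitted requests.
            self.clock_ms += prefill_latency_ms(admitted)
            self.running.extend(admitted)
        elif self.running:
            # Decode pass: every running request produces one token.
            self.clock_ms += decode_latency_ms(self.running)
            for r in self.running:
                r.generated += 1
            self.running = [r for r in self.running if not r.done]


def route_least_loaded(servers: List[Server], req: Request) -> Server:
    """One possible routing strategy: pick the server with the fewest requests."""
    target = min(servers, key=lambda s: len(s.waiting) + len(s.running))
    target.waiting.append(req)
    return target
```

A workload in this sketch is just a list of `Request` objects passed through a routing function, followed by repeated calls to `Server.step()`. Swapping `route_least_loaded` for another function is how different routing strategies could be compared under the same workload.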

linux-foundation-easycla bot commented Oct 3, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot requested review from ahg-g and kfswain October 3, 2024 19:02
@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Oct 3, 2024
@k8s-ci-robot
Contributor

Welcome @kaushikmitr!

It looks like this is your first PR to kubernetes-sigs/llm-instance-gateway 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/llm-instance-gateway has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Oct 3, 2024
@ahg-g
Contributor

ahg-g commented Oct 4, 2024

Any idea how to make this reviewable on GitHub? The file is way too big, with image blobs taking up so much space.

@kaushikmitr
Contributor Author

> any idea to make this reviewable on github? the file is way too big with image blobs taking up so much space

Yes, the notebook is better suited to viewing (https://github.com/kubernetes-sigs/llm-instance-gateway/blob/a6ee825c4a91d4a5c7915d82b6a6991123347c7b/simulations/llm_inference_gateway_simulation.ipynb) than to commenting on. I am working on converting it into a regular Python script.

@kaushikmitr
Contributor Author

Added the Python library in the llm_ig_simulation folder.

Comment on lines 1 to 4




Member

Remove extra lines (here and other places)

Contributor Author

done

@ahg-g
Contributor

ahg-g commented Oct 28, 2024

When you get a chance, can we have a README file describing how to run this?

@ahg-g
Contributor

ahg-g commented Nov 16, 2024

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, kaushikmitr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 16, 2024
@k8s-ci-robot k8s-ci-robot merged commit 300176b into kubernetes-sigs:main Nov 16, 2024
2 checks passed
shaneutt pushed a commit to shaneutt/gateway-api-inference-extension that referenced this pull request Apr 17, 2025

Fix kustomize envs
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
4 participants