Simulation code for llm inference gateway #15

kaushikmitr · 2024-10-03T19:02:15Z

The notebook implements the following components:

Prefill and Decode Latency Modeling as a Function of Batch Size:
- Prefill and decode latencies are modeled as functions of batch size; the latency of each step depends on the total number of tokens in the batch.
- The model uses a set of latency constants to express mathematically how batch size affects the prefill and decode phases.
- Latency is computed from the sum of token lengths across the batch, which makes the relationship between batch size and latency explicit.
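
To make the shape of such a model concrete, here is a minimal sketch that assumes latency is linear in the summed token lengths of the batch. The class name, function name, and all constant values are hypothetical and are not taken from the notebook; the real model may use a different functional form.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencyConstants:
    """Hypothetical coefficients, not the constants used in the notebook."""
    fixed_ms: float      # per-step overhead, independent of batch content
    per_token_ms: float  # incremental cost of each token in the batch


def step_latency_ms(token_lengths: Sequence[int], c: LatencyConstants) -> float:
    """Latency of one step, assumed linear in the summed token lengths."""
    if not token_lengths:
        return 0.0  # an empty batch does no work
    return c.fixed_ms + c.per_token_ms * sum(token_lengths)


# Illustrative values only (assumption).
PREFILL = LatencyConstants(fixed_ms=10.0, per_token_ms=0.5)
DECODE = LatencyConstants(fixed_ms=5.0, per_token_ms=0.25)

prefill_ms = step_latency_ms([128, 256], PREFILL)  # prompt lengths in the batch
decode_ms = step_latency_ms([129, 257], DECODE)    # context lengths in the batch
```

With these made-up constants, adding a request to the batch raises the step latency by `per_token_ms` times that request's token count, which is the batch-size dependence the bullets describe.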

Continuous Batching Algorithm Simulation:
- This component simulates continuous batching, in which scheduling decisions are made dynamically before each forward pass, as implemented in vLLM.
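
The scheduling loop can be sketched roughly as follows. This is a deliberately simplified illustration, not the notebook's code and not vLLM's scheduler: it assumes FIFO admission under a single token budget, and it omits KV-cache accounting, preemption, and recomputation. `Request`, `simulate`, and `max_batch_tokens` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Request:
    rid: int
    prompt_len: int
    output_len: int  # number of tokens to generate
    generated: int = 0


def simulate(requests: Iterable[Request], max_batch_tokens: int) -> List[List[int]]:
    """Return the request ids that share the batch at each forward pass."""
    waiting = deque(requests)
    running: List[Request] = []
    history: List[List[int]] = []
    while waiting or running:
        # Scheduling decision before each forward pass: admit waiting
        # requests in FIFO order while the token budget allows.
        used = sum(r.prompt_len + r.generated for r in running)
        while waiting and used + waiting[0].prompt_len <= max_batch_tokens:
            req = waiting.popleft()
            running.append(req)
            used += req.prompt_len
        if not running:
            raise ValueError("head-of-line request exceeds the token budget")
        history.append([r.rid for r in running])
        # Forward pass: every running request produces one token.
        for r in running:
            r.generated += 1
        # Finished requests leave the batch, freeing room for the next pass.
        running = [r for r in running if r.generated < r.output_len]
    return history


batches = simulate(
    [Request(0, 4, 2), Request(1, 4, 1), Request(2, 4, 1)],
    max_batch_tokens=10,
)
# batches == [[0, 1], [0, 2]]
```

Request 2 joins as soon as request 1 finishes, without waiting for request 0 to complete; that per-step admission is what distinguishes continuous batching from static batching.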

LLM Inference Gateway Simulation with Different Routing Strategies:
- The simulation models an LLM inference gateway that routes each request to one of several model servers according to a chosen routing strategy.
- Multiple routing algorithms are explored to balance latency, throughput, and resource utilization.
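
As a rough illustration of pluggable routing, the sketch below compares a random strategy with a least-loaded strategy. The PR description does not name the strategies the notebook implements, so both strategies, the `ModelServer` class, and the `pending_tokens` load metric are assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ModelServer:
    name: str
    pending_tokens: int = 0  # hypothetical load metric


Strategy = Callable[[List[ModelServer]], ModelServer]


def route_least_loaded(servers: List[ModelServer]) -> ModelServer:
    """Pick the server with the fewest pending tokens (first one wins ties)."""
    return min(servers, key=lambda s: s.pending_tokens)


def make_random_router(seed: int) -> Strategy:
    """Build a seeded random strategy so runs are reproducible."""
    rng = random.Random(seed)
    return lambda servers: rng.choice(servers)


def dispatch(request_tokens: int, servers: List[ModelServer], strategy: Strategy) -> str:
    """Route one request and record its load on the chosen server."""
    server = strategy(servers)
    server.pending_tokens += request_tokens
    return server.name


servers = [ModelServer("a"), ModelServer("b"), ModelServer("c")]
assignments = [dispatch(n, servers, route_least_loaded) for n in (100, 50, 30, 20)]
# assignments == ["a", "b", "c", "c"]
```

Keeping each strategy as a plain function over the server list makes it easy to swap algorithms and compare them on the same request stream.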

Evaluating Different Algorithms with Configurable Workloads:
- The notebook supports evaluating the algorithms under configurable workloads, so different conditions can be tested.
- Users configure a workload by changing parameters such as request sizes, batch sizes, token lengths, and latency requirements.
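
A configurable workload could be represented along these lines. The field names, the exponential length distribution, and the `generate_workload` helper are all assumptions made for illustration; the notebook's actual parameters may differ.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WorkloadConfig:
    """Hypothetical knobs; the names are illustrative, not the notebook's."""
    num_requests: int
    mean_prompt_tokens: float
    mean_output_tokens: float
    latency_target_ms: float
    seed: int = 0


def generate_workload(cfg: WorkloadConfig) -> List[Dict[str, float]]:
    """Draw request sizes from exponential distributions (an assumption)."""
    rng = random.Random(cfg.seed)  # seeded so runs are reproducible
    requests = []
    for i in range(cfg.num_requests):
        requests.append({
            "id": i,
            # Clamp to at least one token so every request does some work.
            "prompt_tokens": max(1, round(rng.expovariate(1.0 / cfg.mean_prompt_tokens))),
            "output_tokens": max(1, round(rng.expovariate(1.0 / cfg.mean_output_tokens))),
            "latency_target_ms": cfg.latency_target_ms,
        })
    return requests


cfg = WorkloadConfig(num_requests=5, mean_prompt_tokens=200.0,
                     mean_output_tokens=50.0, latency_target_ms=500.0)
workload = generate_workload(cfg)
```

Because the configuration is frozen and seeded, the same `WorkloadConfig` always yields the same request stream, which keeps comparisons between algorithms fair.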

linux-foundation-easycla · 2024-10-03T19:02:19Z

The committers listed above are authorized under a signed CLA.

✅ login: kaushikmitr (33f20a9, d793532, 9bc831c, cf7d0b3, a69db6c, 8e94a9c, 58d2c77, a6ee825, b9d12e3, 1fdb7b5, 1e9e3cc, 5d48994)

k8s-ci-robot · 2024-10-03T19:02:26Z

Welcome @kaushikmitr!

It looks like this is your first PR to kubernetes-sigs/llm-instance-gateway 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/llm-instance-gateway has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

ahg-g · 2024-10-04T04:51:55Z

Any ideas for making this reviewable on GitHub? The file is way too big, with image blobs taking up so much space.

kaushikmitr · 2024-10-04T17:50:29Z

> Any ideas for making this reviewable on GitHub? The file is way too big, with image blobs taking up so much space.

Yes, the notebook is meant more for viewing (https://github.com/kubernetes-sigs/llm-instance-gateway/blob/a6ee825c4a91d4a5c7915d82b6a6991123347c7b/simulations/llm_inference_gateway_simulation.ipynb) and is hard to comment on. I am working on converting it into a regular Python script.

kaushikmitr · 2024-10-05T01:59:23Z

Added the Python library in the llm_ig_simulation folder.

terrytangyuan · 2024-10-10T17:26:03Z

simulations/llm_ig_simulation/src/benchmark_one_server.py

+
+
+
+


Remove extra lines (here and other places)

ahg-g · 2024-10-28T17:14:35Z

When you get a chance, can we have a README file describing how to run this?

ahg-g · 2024-11-16T19:32:28Z

/lgtm
/approve

k8s-ci-robot · 2024-11-16T19:32:35Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, kaushikmitr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [ahg-g]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kaushikmitr added 2 commits October 3, 2024 18:48

add simulation ipython notebook for llm inference gateway

a69db6c

rename file

a6ee825

k8s-ci-robot requested review from ahg-g and kfswain October 3, 2024 19:02

k8s-ci-robot added the cncf-cla: no label Oct 3, 2024

k8s-ci-robot added size/XXL cncf-cla: yes and removed cncf-cla: no labels Oct 3, 2024

kaushikmitr added 2 commits October 5, 2024 01:56

add python lib for simulation

cf7d0b3

add python lib for simulation

d793532

kaushikmitr added 4 commits October 5, 2024 05:04

update constants

1fdb7b5

fix recompute bug

1e9e3cc

fix recompute bug

9bc831c

move to src folder, do weighted dequeing

33f20a9

terrytangyuan reviewed Oct 10, 2024

kaushikmitr added 3 commits October 23, 2024 18:11

update dequeuing logic

8e94a9c

remove extra lines

b9d12e3

Merge branch 'kubernetes-sigs:main' into main

58d2c77

Merge branch 'kubernetes-sigs:main' into main

5d48994

k8s-ci-robot assigned ahg-g Nov 16, 2024

k8s-ci-robot added the lgtm label Nov 16, 2024

k8s-ci-robot added the approved label Nov 16, 2024

k8s-ci-robot merged commit 300176b into kubernetes-sigs:main Nov 16, 2024
2 checks passed
