Use `RMSNorm` in `TransformersModel` #12776

hmellor · 2025-02-05T11:06:58Z

Changes:

- Improve performance by using vLLM's `RMSNorm` class in `TransformersModel`
- Improve user experience by warning instead of raising when a `Linear` layer cannot be tensor parallelised
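
The warn-instead-of-raise behaviour can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SUPPORTED_STYLES`, `pick_parallel_style` and the style and layer names are hypothetical stand-ins. The point is only that an unsupported layer produces a warning and is left as-is, rather than aborting model loading.

```python
import warnings

# Hypothetical set of sharding styles the loader knows how to apply.
SUPPORTED_STYLES = {"colwise", "rowwise"}


def pick_parallel_style(layer_name, requested_style):
    """Return the style to apply, or None if the layer must stay unsharded.

    Soft-fail: an unsupported style emits a warning instead of raising,
    so the rest of the model can still be loaded.
    """
    if requested_style in SUPPORTED_STYLES:
        return requested_style
    warnings.warn(
        f"{layer_name}: style {requested_style!r} is not supported, "
        "leaving this layer un-parallelised",
        stacklevel=2,
    )
    return None


# Usage: the second layer triggers a warning but does not stop the loop.
plan = {"mlp.up_proj": "colwise", "mlp.router": "custom"}
applied = {name: pick_parallel_style(name, style) for name, style in plan.items()}
```

With a hard `raise`, the second entry would have prevented the first from ever being used; with the soft fail, the caller gets `None` for that layer and can fall back to the original module.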

Before and after benchmarks using the following command:

python benchmarks/benchmark_throughput.py \
  --backend vllm \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --dataset ShareGPT_V3_unfiltered_cleaned_split.json \
  --model-impl transformers # forces TransformersModel

Results:

| Implementation | Requests/s | Total tokens/s | Output tokens/s |
|---|---|---|---|
| `LlamaForCausalLM` | 13.07 | 5403.27 | 2591.54 |
| `TransformersModel` before | 11.78 | 4872.95 | 2337.18 |
| `TransformersModel` after | 12.11 | 5009.63 | 2402.73 |

This corresponds to a +2.8% throughput improvement for `TransformersModel` on this model.
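
As a quick check of that figure, the relative change can be recomputed from the numbers reported above. The helper name `relative_change` is just for illustration.

```python
def relative_change(before, after):
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


# (requests/s, total tokens/s, output tokens/s) for TransformersModel,
# copied from the results above.
before = (11.78, 4872.95, 2337.18)
after = (12.11, 5009.63, 2402.73)

labels = ("requests/s", "total tokens/s", "output tokens/s")
gains = [relative_change(b, a) for b, a in zip(before, after)]
for label, gain in zip(labels, gains):
    print(f"{label}: {gain:+.1f}%")
```

All three metrics move by the same amount to one decimal place, so the +2.8% holds whichever column is used.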

Signed-off-by: Harry Mellor <[email protected]>

github-actions · 2025-02-05T11:07:13Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

vllm/model_executor/models/transformers.py

youkaichao

If you can make the model compatible with `torch.compile`, then you should be able to get the benefit directly.

hmellor · 2025-02-05T12:45:28Z

Do you mean TransformersModel here? Or the model code in Transformers?

mgoin

IMO this is more complexity than it is worth. It should be in our best interest to keep TransformersModel as minimal as possible while maintaining functionality.

I agree with Kaichao that the better approach for performance is to make TransformersModel work with torch.compile in general, which should have essentially the same impact as using the fused RMSNorm module from vLLM.
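
For context on what is being fused: RMSNorm scales each hidden vector by the reciprocal of its root mean square and then by a learned weight. The sketch below is a plain-Python reference of that formula only, not vLLM's kernel or the Transformers module; the default `eps` value is an assumption for illustration. Written eagerly, the square, mean, reciprocal square root and two multiplies are separate steps, which is the kind of sequence a fused kernel or a compiler can collapse into one pass.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight.

    `eps` defaults to an illustrative value; real models set their own.
    """
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


# Usage: with unit weights and eps=0, the output has a mean square of 1.
hidden = [3.0, 4.0]
scaled = rms_norm(hidden, weight=[1.0, 1.0], eps=0.0)
```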

hmellor · 2025-02-05T17:34:49Z

Ok, TransformersModel is already compatible with V1 (which I am told already uses torch.compile via -O3 by default) and I agree it'd be nicer to not have to do this.

With V1 enabled the benchmarks look like:

| Implementation | Requests/s | Total tokens/s | Output tokens/s |
|---|---|---|---|
| `LlamaForCausalLM` | 18.44 | 7623.60 | 3656.46 |
| `TransformersModel` before | 14.85 | 6141.45 | 2945.58 |
| `TransformersModel` after | 15.23 | 6299.61 | 3021.44 |

So using vLLM's RMSNorm still gives us some benefit 🤔

See Kaichao and Michael's comments

hmellor · 2025-02-05T18:32:20Z

I've made #12785 to handle the UX portion of this PR (which I don't think should be controversial).

mergify · 2025-02-06T09:05:26Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @hmellor.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

hmellor · 2025-02-06T09:18:13Z

With the UX change merged, I'll close this in favour of keeping TransformersModel as clean as possible.

hmellor added 2 commits February 5, 2025 11:38

Warn instead of raise when linear layer isn't parallelisable

14ef1ec

Signed-off-by: Harry Mellor <[email protected]>

Replace Transformers RMS Norm with vLLM RMSNorm

3336897

Signed-off-by: Harry Mellor <[email protected]>

Soft fail when RMS norm cannot be replaced, as for linear

d9e3046

Signed-off-by: Harry Mellor <[email protected]>

hmellor added the `ready` label Feb 5, 2025

Isotr0py reviewed Feb 5, 2025

hmellor added 2 commits February 5, 2025 13:02

Remove unnecessary loop

43792d7

Signed-off-by: Harry Mellor <[email protected]>

Support replacement of nn.RMSNorm

4c1c252

Signed-off-by: Harry Mellor <[email protected]>

ArthurZucker reviewed Feb 5, 2025

youkaichao reviewed Feb 5, 2025

mgoin reviewed Feb 5, 2025

simon-mo previously approved these changes Feb 5, 2025

hmellor mentioned this pull request Feb 5, 2025

Improve TransformersModel UX #12785

Merged

mergify bot added the needs-rebase label Feb 6, 2025

hmellor closed this Feb 6, 2025

hmellor deleted the rms-norm branch February 6, 2025 14:56
