
[Model] Add SupportsMultiModal.get_language_model interface #16007

Merged · 7 commits merged into vllm-project:main from NickLucche:mm-get-language-model on Apr 9, 2025

Conversation

@NickLucche (Contributor) commented on Apr 3, 2025

Most VLMs follow the unwritten HF convention of exposing the text backbone as self.language_model, but the naming is not enforced.
This PR adds a getter to abstract away that naming. See the discussion in #15782 (comment) for more context.

I think Whisper is the only outlier in this taxonomy.
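
For reference, a minimal sketch of what such a getter could look like on the SupportsMultiModal interface, assuming the Protocol-based style of vllm/model_executor/models/interfaces.py; the exact signature and docstring in the merged code may differ:

```python
# Minimal sketch of the proposed interface; details may differ from the
# merged implementation.
from typing import Protocol

import torch.nn as nn


class SupportsMultiModal(Protocol):
    """Interface for multi-modal models (sketch)."""

    def get_language_model(self) -> nn.Module:
        """Return the language-model backbone of this multi-modal model.

        Most implementations can simply `return self.language_model`;
        models that name the submodule differently override this to
        return the correct attribute.
        """
        ...
```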

github-actions (bot) commented on Apr 3, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run the other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@hmellor (Member) commented on Apr 3, 2025

This seems like the kind of thing that we could fix on the HF side, similar to the getter and setter functions we have for input and output embeddings.
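
For context, the Transformers accessors being referred to are presumably the standard PreTrainedModel methods shown below; a small illustration (the model name is arbitrary):

```python
# Illustration of the existing Transformers accessors for input/output
# embeddings; these are standard PreTrainedModel methods.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
embed_in = model.get_input_embeddings()    # token embedding table
embed_out = model.get_output_embeddings()  # LM head (often weight-tied)
```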

@jeejeelee (Collaborator)

How about getting all modality modules using get_mm_mapping?
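
For context, get_mm_mapping is a hook some vLLM models already implement (used e.g. for LoRA) to name their per-modality submodules. A rough sketch of that pattern follows; the import path, class, and field values are assumptions based on the codebase at the time and may have changed:

```python
# Rough sketch of the existing get_mm_mapping pattern in some vLLM models;
# names here are illustrative, not authoritative.
from vllm.model_executor.models.module_mapping import MultiModelKeys


class SomeVLM:  # hypothetical model class, for illustration only
    def get_mm_mapping(self) -> MultiModelKeys:
        """Map each modality role to the name of its submodule."""
        return MultiModelKeys.from_string_field(
            language_model="language_model",    # text backbone
            connector="multi_modal_projector",  # projects features into the LM
            tower_model="vision_tower",         # vision encoder
        )
```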

@NickLucche (Contributor, Author)

> This seems like the kind of thing that we could fix on the HF side

Isn't this entirely dependent on how the model is implemented in vLLM?

> How about getting all modality modules using get_mm_mapping?

Unfortunately it's not implemented for every model, but if it were, that would work. I was mostly following @DarkLight1337's advice.

@hmellor (Member) commented on Apr 3, 2025

> Isn't this entirely dependent on how the model is implemented in vLLM?

You're right: since we reimplement the modelling code in vLLM, having nice utilities in the Transformers modelling code doesn't help us here.

However, I do see a future where the Transformers backend is stable and performant enough that much of the modelling code in vLLM will not be needed anymore 🤞

@NickLucche (Contributor, Author)

Yeah, I'd be very happy with that future; modelling would be easier.
But I think we may still have cases where research teams, for whatever reason, don't go through an hf.transformers implementation first (Qwen, maybe?) and contribute their model here directly.
For those cases, I think it's not bad if the interface grows a tiny bit tighter.

@hmellor (Member) commented on Apr 3, 2025

We do hope to make model contributions to Transformers easier in the future, but yes, there may still be some models that need to be modelled in vLLM, which is fine.

@DarkLight1337 (Member) commented on Apr 3, 2025

To avoid merge conflicts, let's wait until #15712 / #16076 is merged first.

@DarkLight1337 (Member)

The other PR has been merged; can you update this one? Also, a couple of new multi-modal models have landed since, so you should update those as well.

@NickLucche force-pushed the mm-get-language-model branch from c3bc449 to f8eea45 on April 8, 2025 08:03
mergify bot removed the tpu (Related to Google TPUs) label on Apr 8, 2025
@NickLucche (Contributor, Author)

Rebased and added llama4. I count 30 architectures; does that check out?

@DarkLight1337 (Member) commented on Apr 8, 2025

You're also missing mllama (Llama 3.2 multimodal) and phi4mm (Phi-4-multimodal).

@NickLucche (Contributor, Author)

Thanks for looking into it!

mergify bot added the documentation (Improvements or additions to documentation) label on Apr 8, 2025
DarkLight1337 enabled auto-merge (squash) on April 9, 2025 08:31
github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Apr 9, 2025
vllm-bot merged commit d55244d into vllm-project:main on Apr 9, 2025
46 of 50 checks passed
DarkLight1337 added a commit to DarkLight1337/vllm that referenced this pull request Apr 9, 2025
zRzRzRzRzRzRzR pushed a commit to zRzRzRzRzRzRzR/vllm that referenced this pull request Apr 9, 2025
Labels: documentation, ready, v1
5 participants