Skip to content

[V1][Performance] Implement custom serializaton for MultiModalKwargs #16279

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 43 commits into from

Conversation

p88h
Copy link
Contributor

@p88h p88h commented Apr 8, 2025

FIX #16185

WIP, just handles the basic case of simple str->Tensor dict.

njhill and others added 7 commits February 24, 2025 15:28
Update the custom msgpack encoding/decoding to work with lists of buffers so that the backing data of tensors/numpy arrays contained in messages is sent directly by zmq without copying.

Signed-off-by: Nick Hill <[email protected]>
# Conflicts:
#	vllm/v1/engine/core_client.py
…ocopy

Signed-off-by: Nick Hill <[email protected]>

# Conflicts:
#	vllm/v1/engine/core.py
#	vllm/v1/engine/core_client.py
#	vllm/v1/serial_utils.py
WIP, just handles the basic case of simple str->Tensor dict.
Copy link

github-actions bot commented Apr 8, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@njhill
Copy link
Member

njhill commented Apr 8, 2025

Thanks @p88h, this is pretty much what I had planned! It would be great if you could take this on though.

We are planning to disable the use of pickle by default, so it would be good to do this in a way at avoid that. For handling tensors/ndarrays in general, I have a PR #13790 which also eliminates some mem copies, so it would be good to base on top of that (I'll try to get it merged asap, just need to add a unit test).

Unfortunately msgspec doesn't support custom nested types, but I thought to have intermediate types like:

@dataclass
class FlatNestedTensors:
    tensors: list[torch.Tensor]
    structure: list[Union[int, list]]

where structure is a nested list of indexes into the tensors list. And then the tensors should automatically be handled by the zero-copy logic.

@njhill
Copy link
Member

njhill commented Apr 8, 2025

@p88h I've added a test to #13790 so we can hopefully merge it shortly.

Signed-off-by: Nick Hill <[email protected]>
@p88h
Copy link
Contributor Author

p88h commented Apr 8, 2025

I added a comment on your PR, I think it can be simplified to sth like this:
https://gist.github.com/p88h/1daec6374c35293f6bced9333d6f2c4c

... if it worked. That inner msgspec serialization is not working as expected.

@p88h
Copy link
Contributor Author

p88h commented Apr 9, 2025

So when encoding msgspec is actually smart enough to just handle everything recursively.
Unfortunately when decoding not everything is automatic, so some fix-up is necessary - but a lot of simpler now.

Well, except...

_items_by_modality doesn't want to be easily serialized.

For now, I implemented something that works, unfortunately, the MultiModalField part requires a pickle - msgpack doesn't even notice it's a custom type, just encodes the whole MultiModalFieldElem as a dict and just ignores that part.

It also adds additional complexity to serializing MultiModalKwargs, too - since the old and new layout overlap (share tensors), we likely don't want to send that over twice, even with zero copy that seems unnecesary. The approach is now to use either items by modality (and reconstruct UserDict) OR just pass through the dict for V0-style usage (which I think is still in use even in V1)

@njhill
Copy link
Member

njhill commented Apr 10, 2025

Thanks @p88h I will review again once #13790 is merged (imminently) and this is rebased on top of that.

Copy link

mergify bot commented Apr 10, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @p88h.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 10, 2025
@p88h p88h requested a review from DarkLight1337 as a code owner April 10, 2025 20:36
@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Apr 10, 2025
@mergify mergify bot removed the needs-rebase label Apr 10, 2025
@p88h p88h closed this Apr 10, 2025
@p88h p88h deleted the serialize-multimodal-kwargs branch April 10, 2025 20:43
@p88h
Copy link
Contributor Author

p88h commented Apr 10, 2025

@njhill @DarkLight1337 PTAL #16432

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multi-modality Related to multi-modality (#4194) v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Huge memory overhead with V1 (multiprocessing) when handling several multimodal inputs
5 participants