[V1][Performance] Implement custom serialization for MultiModalKwargs #16279
Update the custom msgpack encoding/decoding to work with lists of buffers so that the backing data of tensors/numpy arrays contained in messages is sent directly by zmq without copying. Signed-off-by: Nick Hill <[email protected]>
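The "list of buffers" idea in the commit message above can be illustrated with a minimal sketch. This is not the PR's code: it uses the standard-library `json` module as a stand-in for msgpack, and the names `encode`, `decode`, and the `__buf__` marker are hypothetical. The point is only that large payloads are replaced in the header by an index and travel as separate frames that are views on the original memory.

```python
import json
from typing import Any


def encode(obj: dict[str, Any]) -> list:
    """Return [header, buf0, buf1, ...]; binary payloads stay out-of-band."""
    bufs: list[memoryview] = []

    def default(o: Any) -> dict:
        # Called by json for types it cannot serialize itself.
        if isinstance(o, (bytes, bytearray, memoryview)):
            bufs.append(memoryview(o))  # a view on the data, not a copy
            return {"__buf__": len(bufs) - 1}  # hypothetical marker
        raise TypeError(f"unsupported type: {type(o)!r}")

    header = json.dumps(obj, default=default).encode()
    return [header, *bufs]


def decode(frames: list) -> dict[str, Any]:
    """Rebuild the message, resolving markers to the out-of-band frames."""

    def hook(d: dict) -> Any:
        if set(d) == {"__buf__"}:
            return frames[d["__buf__"] + 1]  # frame 0 is the header
        return d

    return json.loads(bytes(frames[0]), object_hook=hook)


data = bytearray(b"\x01\x02\x03")
frames = encode({"name": "img", "pixels": data})
message = decode(frames)
```

With pyzmq, a frame list like this could presumably be passed to `socket.send_multipart(frames, copy=False)` so the payload frames are not copied on send; how the PR wires this up is not shown in the thread.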
# Conflicts: # vllm/v1/engine/core_client.py
…ocopy Signed-off-by: Nick Hill <[email protected]>
…ocopy Signed-off-by: Nick Hill <[email protected]> # Conflicts: # vllm/v1/engine/core.py # vllm/v1/engine/core_client.py # vllm/v1/serial_utils.py
WIP, just handles the basic case of simple str->Tensor dict.
Thanks @p88h, this is pretty much what I had planned! It would be great if you could take this on though.

We are planning to disable the use of pickle by default, so it would be good to do this in a way that avoids it. For handling tensors/ndarrays in general, I have a PR #13790 which also eliminates some memory copies, so it would be good to base this on top of that (I'll try to get it merged ASAP, I just need to add a unit test).

Unfortunately msgspec doesn't support custom nested types, but I thought we could have intermediate types like:

@dataclass
class FlatNestedTensors:
    tensors: list[torch.Tensor]
    structure: list[Union[int, list]]

where
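One possible reading of the intermediate type suggested above, as a minimal sketch. The explanation of `structure` is cut off in the comment, so the interpretation here (integers index into the flat list, nested lists mirror the original nesting) is an assumption, and the names `FlatNested`, `flatten`, and `unflatten` are hypothetical. Plain Python objects stand in for `torch.Tensor` leaves so the sketch runs without torch.

```python
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class FlatNested:
    # Leaves in depth-first order (tensors in the real use case).
    leaves: list[Any] = field(default_factory=list)
    # Assumed layout: int = index into `leaves`, list = nested level.
    structure: list[Union[int, list]] = field(default_factory=list)


def flatten(nested: list) -> FlatNested:
    """Split a nested list into a flat leaf list plus an index skeleton."""
    flat = FlatNested()

    def walk(node: list) -> list:
        out: list[Union[int, list]] = []
        for item in node:
            if isinstance(item, list):
                out.append(walk(item))
            else:
                flat.leaves.append(item)
                out.append(len(flat.leaves) - 1)
        return out

    flat.structure = walk(nested)
    return flat


def unflatten(flat: FlatNested) -> list:
    """Rebuild the original nesting from the skeleton and the leaves."""

    def build(node: list) -> list:
        return [build(i) if isinstance(i, list) else flat.leaves[i] for i in node]

    return build(flat.structure)


nested = [b"a", [b"b", [b"c"]], b"d"]
flat = flatten(nested)
restored = unflatten(flat)
```

The appeal of such a layout is that the serializer only ever sees a flat list of tensors and a skeleton of plain integers and lists, both of which a non-recursive encoder can handle.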
I added a comment on your PR. I think it could be simplified to something like this: ... if it worked. That inner msgspec serialization is not working as expected.
So when encoding msgspec is actually smart enough to just handle everything recursively. Well, except...
For now, I implemented something that works; unfortunately, the [...]

It also adds complexity to serializing MultiModalKwargs: since the old and new layouts overlap (they share tensors), we likely don't want to send the same data twice. Even with zero copy that seems unnecessary. The approach is now to use either the items by modality (and reconstruct the UserDict) OR just pass through the dict for V0-style usage (which I think is still in use even in V1).
Co-authored-by: Russell Bryant <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Chenyaaang <[email protected]>
…llm-project#15423) Signed-off-by: Chih-Chieh-Yang <[email protected]> Co-authored-by: Yu Chin Fabian Lim <[email protected]>
…-project#16416) Signed-off-by: DarkLight1337 <[email protected]>
@njhill @DarkLight1337 PTAL #16432
FIX #16185