[Bugfix][Model] fix mllama multi-image #14883
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs will not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
        img = pixel_values_unpacked[b][i]
        out_images[b, i, :img.shape[0]] = img
    return out_images

def _parse_and_validate_image_input(self, **kwargs: object):
    # tensor with the same shape will be batched together by
    # MultiModalKwargs.batch, so pixel_values here can be:
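To make the code comment above concrete, here is a minimal standalone sketch of the batching behavior it describes (an editor's illustration, not vLLM's actual MultiModalKwargs.batch): items with identical shapes get stacked into a single batched tensor, while mismatched shapes are left as a list that the model must pad itself.

import torch

def batch_like(tensors: list[torch.Tensor]):
    # Stack when every item has the same shape; otherwise return the
    # original list and leave padding to the caller.
    if all(t.shape == tensors[0].shape for t in tensors):
        return torch.stack(tensors)
    return tensors

same = [torch.zeros(4, 3), torch.zeros(4, 3)]
mixed = [torch.zeros(4, 3), torch.zeros(2, 3)]
print(batch_like(same).shape)   # torch.Size([2, 4, 3]) -> one batched tensor
print(type(batch_like(mixed)))  # <class 'list'> -> stays a list of tensors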
@DarkLight1337 @ywang96 I think it is a common problem that images have different sizes, and we need to pad a list of tensors with different shapes into one tensor (see the code comments above for details). Is there a utility function for this in vLLM?
This is usually done as part of the HF processor. If the HF processor doesn't do this, you can apply it manually like in Pixtral-HF.
Thanks! Given that, I think it is OK to implement the unpacking in mllama.py.
Thanks for the fix! I left some comments. Also, do you know why test_models_single_leading_image can pass without unpacking the data, given that it contains a test case with different numbers of tiles?
# Multi-size, batched, including text only
[(512, 512), (1024, 512), (1536, 512), (2048, 512), (512, 1024),
(1024, 1024), (512, 1536), (512, 2028), None],)
vllm/model_executor/models/mllama.py (Outdated)
elif isinstance(image_data[0], torch.Tensor):
    bsz = len(image_data)
    # List[torch.Tensor]
    if image_data[0].ndim == 1:
Some questions:

1. Can you merge the code paths of image_data[0].ndim == 1 and image_data[0].ndim == 2? I think it can be achieved by something like:

import torch

def unpack_data(tensor_list, padding_value=0):
    batch_size = len(tensor_list)
    max_length = max(t.size(0) for t in tensor_list)
    trailing_dims = tensor_list[0].shape[1:]
    padded_tensor = torch.full((batch_size, max_length, *trailing_dims),
                               padding_value,
                               dtype=tensor_list[0].dtype,
                               device=tensor_list[0].device)
    for i, t in enumerate(tensor_list):
        padded_tensor[i, :t.size(0)] = t
    return padded_tensor

2. image_data has the type Union[List[List[torch.Tensor]], List[torch.Tensor], torch.Tensor], but it seems that only List[torch.Tensor] and torch.Tensor are handled.

3. Can you add the return type annotation to the function signature?
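As a quick illustration (an editor's sketch, not part of the PR), the suggested unpack_data handles both the ndim == 1 and ndim == 2 cases through a single code path, since the trailing dimensions are carried along automatically:

# Assumes the unpack_data sketch above is in scope.
import torch

# 1-D per-item tensors: trailing dims are empty, so we pad along dim 0.
ids = [torch.tensor([1, 2, 3]), torch.tensor([4])]
print(unpack_data(ids).shape)    # torch.Size([2, 3])

# 2-D per-item tensors: same call; trailing dims are preserved.
feats = [torch.randn(3, 8), torch.randn(1, 8)]
print(unpack_data(feats).shape)  # torch.Size([2, 3, 8])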
Seems the data is already in a tensor shape, like:
https://github.com/huggingface/transformers/blob/c9d1e5238a752813ba91a8751a638a09b5efbb73/src/transformers/models/mllama/image_processing_mllama.py#L767-L770
So maybe we can assume there will be no type like List[List[torch.Tensor]].

@yma11 Can you help to do the following things?
Updated.
The code is quite clean now! Can you fix the unit tests in …?
Oh I see, I can reproduce this issue in my local env but really have no clue about this, with …
Do you have any insights?
I think you can run with …
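The exact command above is truncated. One common way to localize this kind of CUDA crash (an editor's assumption, not necessarily what was suggested here) is to force synchronous kernel launches so the traceback points at the faulting op:

import os

# Assumption: CUDA_LAUNCH_BLOCKING=1 serializes CUDA kernel launches for
# debugging; it must be set before torch initializes CUDA.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # import only after setting the env var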
@heheda12345 With such settings, it hints that the error happens at an operation on q, and passing a contiguous Tensor like this works, as verified in my env. But I'm curious why the memory of these tensors becomes inaccessible after being passed to the method …
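For background on the contiguity point (an illustrative editor's sketch, not the PR's code): slicing a tensor can produce a non-contiguous view, and some CUDA kernels assume densely packed memory; .contiguous() materializes a compact copy.

import torch

x = torch.randn(4, 8)
q = x[:, :4]                  # a view; its strides skip the dropped columns
print(q.is_contiguous())      # False
q = q.contiguous()            # copy into densely packed memory
print(q.is_contiguous())      # True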
Thanks for your information. After some debugging, I found that the crash comes from …
Verified and rebased.
LGTM! All tests pass in my local environment and the code is very clean now.
@yma11 Can you fix the DCO failure?
@heheda12345 Done. But it seems there is an OOM failure in the V1 test that is not related to this PR.
FIX #14551