
[Bugfix][Model] fix mllama multi-image #14883


Merged 4 commits into vllm-project:main on Apr 1, 2025

Conversation

@yma11 (Contributor) commented Mar 16, 2025

FIX #14551


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

img = pixel_values_unpacked[b][i]
out_images[b, i, :img.shape[0]] = img
return out_images

def _parse_and_validate_image_input(self, **kwargs: object):
# tensor with the same shape will be batched together by
# MultiModalKwargs.batch, so pixel_values here can be:

Collaborator:

@DarkLight1337 @ywang96 I think it is a common problem that images have different sizes and we need to pad a list of tensors with different shapes into one tensor (see these code comments for details). Is there any utility function for this in vLLM?
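
For context, this kind of padding over a ragged leading dimension can be done with a standard PyTorch utility; the snippet below is only an illustration of the pattern being discussed, not necessarily what vLLM uses internally:

import torch
from torch.nn.utils.rnn import pad_sequence

# Per-image tensors whose first dim (e.g. number of tiles) differs
# but whose trailing dims match.
imgs = [torch.randn(2, 4), torch.randn(5, 4)]

# Stacks along a new batch dim, zero-padding the first dim to the max length.
batched = pad_sequence(imgs, batch_first=True)
print(batched.shape)  # torch.Size([2, 5, 4])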

Member:

This is usually done as part of the HF processor. If the HF processor doesn't do this, you can apply it manually like in Pixtral-HF.

Collaborator:

Thanks! Given that, I think it is OK to implement the unpacking in mllama.py.

@heheda12345 (Collaborator) left a comment:

Thanks for the fix! I left some comments. Also, do you know why test_models_single_leading_image can pass without unpacking the data, given that it contains a test with different numbers of tiles?

        # Multi-size, batched, including text only
        [(512, 512), (1024, 512), (1536, 512), (2048, 512), (512, 1024),
         (1024, 1024), (512, 1536), (512, 2028), None],)

elif isinstance(image_data[0], torch.Tensor):
bsz = len(image_data)
# List[torch.Tensor]
if image_data[0].ndim == 1:

Collaborator:

Some questions:

  1. Can you merge the code paths for image_data[0].ndim == 1 and image_data[0].ndim == 2? I think it can be achieved by something like the snippet below (a quick usage check follows after this list):
def unpack_data(tensor_list, padding_value=0):
    batch_size = len(tensor_list)
    max_length = max(t.size(0) for t in tensor_list)
    trailing_dims = tensor_list[0].shape[1:]
    padded_tensor = torch.full((batch_size, max_length, *trailing_dims),
                               padding_value,
                               dtype=tensor_list[0].dtype,
                               device=tensor_list[0].device)
    for i, t in enumerate(tensor_list):
        padded_tensor[i, :t.size(0)] = t
    return padded_tensor
  2. image_data has type Union[List[List[torch.Tensor]], List[torch.Tensor], torch.Tensor], but it seems that only List[torch.Tensor] and torch.Tensor are handled.
  3. Can you add a return type annotation to the function signature?
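
A quick sanity check of how the suggested unpack_data snippet above behaves on made-up shapes (illustrative only; it assumes the definition above is in scope):

import torch

# Two inputs whose first dimension differs but whose trailing dims match,
# as in the multi-image case discussed here.
a = torch.randn(2, 4)
b = torch.randn(5, 4)

padded = unpack_data([a, b])  # unpack_data as defined in the snippet above
print(padded.shape)  # torch.Size([2, 5, 4]); `a` is zero-padded along dim 1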

@yma11 (Contributor, Author) commented Mar 21, 2025

Thanks for the fix! I left some comments. Also, do you know why test_models_single_leading_image can pass without unpacking the data, given that it contains a test with different numbers of tiles?

        # Multi-size, batched, including text only
        [(512, 512), (1024, 512), (1536, 512), (2048, 512), (512, 1024),
         (1024, 1024), (512, 1536), (512, 2028), None],)

It seems the data is already batched into a single tensor, like:

aspect_ratio_ids: tensor([[6, 6],
        [1, 6]], device='cuda:0')
aspect_ratio_mask: tensor([[[1, 1, 1, 1],
         [1, 1, 1, 1]],

        [[1, 0, 0, 0],
         [1, 1, 1, 1]]], device='cuda:0')

aspect_ratio_ids: tensor([[6, 7, 5]], device='cuda:0')
aspect_ratio_mask: tensor([[[1, 1, 1, 1],
         [1, 1, 1, 0],
         [1, 1, 0, 0]]], device='cuda:0')

So maybe we can assume there will be no List[List[torch.Tensor]] type for these data? At least, I never triggered this path, so I have no idea what kind of padding would be correct.

@heheda12345 (Collaborator):

https://github.com/huggingface/transformers/blob/c9d1e5238a752813ba91a8751a638a09b5efbb73/src/transformers/models/mllama/image_processing_mllama.py#L767-L770
I just noticed that num_tiles is always padded to 4 regardless of the real max_num_tiles of the images inside the request (see the code above). Therefore, the code path for List[List[torch.Tensor]] should never be triggered.

@yma11 Can you help with the following?

  1. Simplify the code to only handle torch.Tensor & List[torch.Tensor], and add an assert that the input is not List[List[torch.Tensor]].
  2. Merge unpack_data and unpack_pixel_values into one function, which should be possible now because we no longer need to pad over the num_tiles dimension (a sketch follows after this list).
  3. For the List[torch.Tensor] case, verify that the trailing_dims are the same for all tensors.
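
For illustration, a minimal sketch of what such a merged helper could look like (names and exact checks are assumptions; the actual implementation in mllama.py may differ):

from typing import List, Union

import torch


def unpack_data(image_data: Union[torch.Tensor, List[torch.Tensor]],
                padding_value: int = 0) -> torch.Tensor:
    """Pad tensors that differ only in their first dim into one batched tensor."""
    if isinstance(image_data, torch.Tensor):
        # Already batched into a single tensor by MultiModalKwargs.batch.
        return image_data
    # Only torch.Tensor and List[torch.Tensor] are expected here;
    # List[List[torch.Tensor]] should never be produced because num_tiles
    # is always padded to 4 by the HF image processor.
    assert isinstance(image_data[0], torch.Tensor), \
        "unexpected List[List[torch.Tensor]] input"
    trailing_dims = image_data[0].shape[1:]
    # All tensors must agree on every dimension except the first.
    assert all(t.shape[1:] == trailing_dims for t in image_data)
    max_len = max(t.size(0) for t in image_data)
    padded = torch.full((len(image_data), max_len, *trailing_dims),
                        padding_value,
                        dtype=image_data[0].dtype,
                        device=image_data[0].device)
    for i, t in enumerate(image_data):
        padded[i, :t.size(0)] = t
    return padded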

@yma11 (Contributor, Author) commented Mar 24, 2025

https://github.com/huggingface/transformers/blob/c9d1e5238a752813ba91a8751a638a09b5efbb73/src/transformers/models/mllama/image_processing_mllama.py#L767-L770 I just noticed that num_tiles is always padded to 4 regardless of the real max_num_tiles of the images inside the request (see the code above). Therefore, the code path for List[List[torch.Tensor]] should never be triggered.

@yma11 Can you help with the following?

  1. Simplify the code to only handle torch.Tensor & List[torch.Tensor], and add an assert that the input is not List[List[torch.Tensor]].
  2. Merge unpack_data and unpack_pixel_values into one function, which should be possible now because we no longer need to pad over the num_tiles dimension.
  3. For the List[torch.Tensor] case, verify that the trailing_dims are the same for all tensors.

Updated.

@heheda12345 (Collaborator):

The code is quite clean now! Can you fix the unit tests in tests/models/encoder_decoder/vision_language/test_mllama.py?

@yma11 (Contributor, Author) commented Mar 25, 2025

The code is quite clean now! Can you fix the unit tests in tests/models/encoder_decoder/vision_language/test_mllama.py?

Do you mean a unit test failure? I didn't observe one; do you have a link? Or do you mean reverting the changes in the UT?
[screenshot]

@heheda12345 (Collaborator):

The unit tests in this file fail on my local environment.
[screenshot]

@yma11 (Contributor, Author) commented Mar 26, 2025

The unit tests in this file fail on my local environment. [screenshot]

Oh I see, I can reproduce this issue in my local env but really have no clue about it. With export CUDA_LAUNCH_BLOCKING=1, I got a log like the following:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <vllm.worker.enc_dec_model_runner.EncoderDecoderModelRunner object at 0x7f19f88f04c0>
model_input = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n') raised in repr()] EncoderDecoderModelInput object at 0x7f19cd9ef220>
kv_caches = [<[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with `TORCH_USE_CUDA_DSA` to enable dev...ith `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f18abffb180>, ...]
intermediate_tensors = None, num_steps = 1

    @torch.inference_mode()
    def execute_model(
        self,
        model_input: EncoderDecoderModelInput,
        kv_caches: List[torch.Tensor],
        intermediate_tensors: Optional[IntermediateTensors] = None,
        num_steps: int = 1,
    ) -> Optional[List[PoolerOutput]]:
        if num_steps > 1:
            raise ValueError("num_steps > 1 is not supported in "
                             "EncoderDecoderModelRunner")

        if (model_input.attn_metadata is not None
                and model_input.attn_metadata.prefill_metadata is None
                and model_input.attn_metadata.decode_metadata.use_cuda_graph):
            assert model_input.input_tokens is not None
            graph_batch_size = model_input.input_tokens.shape[0]
            model_executable = self.graph_runners[
                model_input.virtual_engine][graph_batch_size]
        else:
            model_executable = self.model

        seqlen_agnostic_kwargs = {
            "finished_requests_ids": model_input.finished_requests_ids,
            "request_ids_to_seq_ids": model_input.request_ids_to_seq_ids,
        } if self.has_inner_state else {}

        multi_modal_kwargs = model_input.multi_modal_kwargs or {}
        with set_forward_context(model_input.attn_metadata, self.vllm_config,
                                 model_input.virtual_engine):
>           hidden_or_intermediate_states = model_executable(
                input_ids=model_input.input_tokens,
                positions=model_input.input_positions,
                encoder_input_ids=model_input.encoder_input_tokens,
                encoder_positions=model_input.encoder_input_positions,
                intermediate_tensors=intermediate_tensors,
                **MultiModalKwargs.as_kwargs(multi_modal_kwargs,
                                             device=self.device),
                **seqlen_agnostic_kwargs)

Do you have any insights?

@heheda12345 (Collaborator):

I think you can run with enforce_eager=True to disable cuda graph. Then, export CUDA_LAUNCH_BLOCKING=1 should help you to find the line that crashes.
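
A minimal sketch of that debugging setup (the model name is illustrative; any mllama checkpoint that reproduces the crash would do):

import os

# Must be set before CUDA is initialized so kernel launches are serialized
# and the Python traceback points at the op that actually faults.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from vllm import LLM

# enforce_eager=True disables CUDA graph capture, which would otherwise
# hide the failing line behind graph replay.
llm = LLM(model="meta-llama/Llama-3.2-11B-Vision-Instruct",
          enforce_eager=True)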

@yma11 (Contributor, Author) commented Mar 29, 2025

I think you can run with enforce_eager=True to disable cuda graph. Then, export CUDA_LAUNCH_BLOCKING=1 should help you to find the line that crashes.

@heheda12345 With such settings, it hints that the error happens at an operation on q, and passing a contiguous Tensor like this works, as verified in my env. But I'm curious why the memory of these tensors becomes inaccessible after being passed to _attention_with_mask? Please help merge this PR if you think the change is okay.

@heheda12345 (Collaborator):

Thanks for your information. After some debugging, I found that the crash comes from torch.ops._C_cache_ops.reshape_and_cache_flash and PagedAttention.write_to_paged_cache. The reason is that kv_range_for_decode and attn_metadata do not match. This bug has been fixed by #15564. Can you wait until that PR is merged into the main branch and then rebase yours? After manually merging that PR, I can pass all tests without the added .contiguous() in my local environment.

@yma11 (Contributor, Author) commented Mar 31, 2025

Verified and rebased.

@heheda12345 (Collaborator) left a comment:

LGTM! All tests pass in my local environment and the code is very clean now.

@heheda12345 (Collaborator):

@yma11 Can you fix the DCO failure?

@yma11 (Contributor, Author) commented Mar 31, 2025

@yma11 Can you fix the DCO failure?

@heheda12345 Done. But there seems to be an OOM in a V1 test that is not related to this PR.

heheda12345 enabled auto-merge (squash) March 31, 2025 05:33
The github-actions bot added the ready label Mar 31, 2025
yma11 added 4 commits March 31, 2025 16:29
Signed-off-by: yan ma <[email protected]>
Signed-off-by: yan ma <[email protected]>
This reverts commit 8f9a1ce.

Signed-off-by: yan ma <[email protected]>
vllm-bot merged commit ff64739 into vllm-project:main Apr 1, 2025
41 of 43 checks passed
yma11 deleted the mllama-fix branch May 27, 2025 02:06
Labels
ready (ONLY add when PR is ready to merge/full CI is needed)

Development

Successfully merging this pull request may close these issues.

[Bug]: Requests with different num_images can't be proceeded by Llama3.2
4 participants