
[V1] Scatter and gather placeholders in the model runner #15712


Merged

merged 26 commits into vllm-project:main from v1-is-embed on Apr 4, 2025

Conversation

DarkLight1337 (Member) commented on Mar 28, 2025

This PR is an attempt to move scatter_patch_features and gather_patch_feature into the model runner (outside of the model) to avoid interfering with TPU graph compilation.
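
In rough terms, the runner can now take the boolean is_embed mask produced by the multimodal processor and overwrite only the masked positions of the text embeddings with the encoder outputs. A minimal sketch of that scatter step, with made-up function and tensor names rather than the actual vLLM code:

```python
import torch

def scatter_mm_embeddings(
    inputs_embeds: torch.Tensor,  # (num_tokens, hidden_size) text embeddings
    mm_embeds: torch.Tensor,      # (num_embed_tokens, hidden_size) encoder outputs
    is_embed: torch.Tensor,       # (num_tokens,) bool mask from the processor
) -> torch.Tensor:
    # Only the positions flagged by is_embed receive multimodal embeddings;
    # placeholder tokens that carry no embedding (e.g. row-break tokens) keep
    # their text embeddings, so the model itself needs no scatter logic.
    inputs_embeds[is_embed] = mm_embeds
    return inputs_embeds
```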

Breaking change for model developers:

  • PromptUpdateDetails.features has been replaced with PromptUpdateDetails.is_embed. You can use the newly added factories PromptUpdateDetails.select_text and PromptUpdateDetails.select_token_id to generate is_embed based on the target text/token ID.
  • BaseProcessingInfo.get_num_image_tokens should now return the equivalent of PromptUpdateDetails.is_embed.sum() instead of the number of tokens in PromptUpdateDetails.features.
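
A minimal migration sketch, assuming an image placeholder made of repeated image tokens followed by an end token; the token IDs and patch count below are made up for illustration:

```python
from vllm.multimodal.processing import PromptUpdateDetails

image_token_id = 32000      # hypothetical patch-token ID
image_end_token_id = 32001  # hypothetical end-of-image token ID
num_patches = 576

full = [image_token_id] * num_patches + [image_end_token_id]

# Before: PromptUpdateDetails(full=full, features=full[:-1])
# After: derive is_embed from the token that actually carries embeddings.
details = PromptUpdateDetails.select_token_id(full, image_token_id)

# get_num_image_tokens should now report the number of embedding positions
# (num_patches here), i.e. the equivalent of is_embed.sum(), not len(full).
```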


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the documentation, multi-modality (#4194), v1, and tpu labels Mar 28, 2025
njhill (Member) left a comment

Thanks as always for the great work @DarkLight1337! I'm not an expert in most of what's changed here but did notice one thing.

mgoin (Member) left a comment

This looks great and even removes a lot of complex code. It should fix the immediate issue we have with llava on TPU since we get to skip the logic now in the non-Pixtral case. Even if we are stuck with dynamic creation, we can isolate it in a smaller graph with this refactor.

Does this still have many issues to resolve or do you think it could be tractable this week?

mergify bot commented on Mar 31, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @DarkLight1337.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 31, 2025
DarkLight1337 (Member, Author) commented on Mar 31, 2025

I am working on making sure all of our existing multi-modal models on V1 return a sequence of 2D embeddings. Turns out quite a few models don't follow this... Done in #15816
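
For reference, "a sequence of 2D embeddings" means roughly one (num_patches_i, hidden_size) tensor per multimodal item rather than a single batched 3D tensor; the shapes below are made up for illustration:

```python
import torch

hidden_size = 4096  # illustrative value

# One 2D tensor per image; items may have different lengths, which is why a
# single padded/batched 3D tensor is not what the runner expects.
image_embeds = [
    torch.randn(576, hidden_size),
    torch.randn(1024, hidden_size),
]
assert all(e.ndim == 2 for e in image_embeds)
```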

DarkLight1337 (Member, Author) commented on Mar 31, 2025

@mgoin if you have time, can you help check the following models locally? I have verified a few models on my end but there are still many to go:

tests/models/decoder_only/vision_language/test_models.py:

  • aya_vision
  • chameleon
  • fuyu
  • gemma3
  • h2ovl
  • idefics3
  • internvl
  • llava
  • minicpmo
  • minicpmv
  • molmo
  • nvlm_d
  • phi3v
  • qwen_vl
  • skywork_r1v
  • (Upcoming) qwen2_5_omni

Run the example scripts:

  • pixtral_hf
  • mistral3
  • pixtral (Mistral format)
  • qwen2_audio

mgoin (Member) commented on Apr 1, 2025

  • Fuyu fails as it still has from .vision import scatter_patch_features, select_patch_features at the top of its model definition. Tested with pytest tests/models/decoder_only/vision_language/test_models.py -k "[fuyu-"

  • Llava works fine, passes pytest tests/models/decoder_only/vision_language/test_models.py -k "[llava-"

  • NVLM-D failed using the example script python examples/offline_inference/vision_language.py -m NVLM_D

ERROR 04-01 12:12:46 [core.py:377] Traceback (most recent call last):
ERROR 04-01 12:12:46 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/executor/multiproc_executor.py", line 376, in worker_busy_loop
ERROR 04-01 12:12:46 [core.py:377]     output = func(*args, **kwargs)
ERROR 04-01 12:12:46 [core.py:377]              ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 12:12:46 [core.py:377]   File "/home/mgoin/venvs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-01 12:12:46 [core.py:377]     return func(*args, **kwargs)
ERROR 04-01 12:12:46 [core.py:377]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 12:12:46 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
ERROR 04-01 12:12:46 [core.py:377]     self.model_runner.profile_run()
ERROR 04-01 12:12:46 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/worker/gpu_model_runner.py", line 1504, in profile_run
ERROR 04-01 12:12:46 [core.py:377]     dummy_mm_kwargs = self.mm_registry.get_decoder_dummy_data(
ERROR 04-01 12:12:46 [core.py:377]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 12:12:46 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/registry.py", line 470, in get_decoder_dummy_data
ERROR 04-01 12:12:46 [core.py:377]     dummy_data = profiler.get_decoder_dummy_data(seq_len, mm_counts)
ERROR 04-01 12:12:46 [core.py:377]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 12:12:46 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/profiling.py", line 224, in get_decoder_dummy_data
ERROR 04-01 12:12:46 [core.py:377]     ) = self.get_and_validate_mm_inputs(seq_len, mm_counts)
ERROR 04-01 12:12:46 [core.py:377]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 12:12:46 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/profiling.py", line 179, in get_and_validate_mm_inputs
ERROR 04-01 12:12:46 [core.py:377]     mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
ERROR 04-01 12:12:46 [core.py:377]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 12:12:46 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/profiling.py", line 154, in _get_dummy_mm_inputs
ERROR 04-01 12:12:46 [core.py:377]     return self.processor.apply(
ERROR 04-01 12:12:46 [core.py:377]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 12:12:46 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/processing.py", line 1639, in apply
ERROR 04-01 12:12:46 [core.py:377]     self._validate_mm_placeholders(mm_placeholders, mm_item_counts)
ERROR 04-01 12:12:46 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/processing.py", line 1562, in _validate_mm_placeholders
ERROR 04-01 12:12:46 [core.py:377]     raise RuntimeError(
ERROR 04-01 12:12:46 [core.py:377] RuntimeError: Expected there to be 1 prompt updates corresponding to 1 image items, but instead found 0 prompt updates! Either the prompt text has missing/incorrect tokens for multi-modal inputs, or there is a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_prompt_updates`).

  • Pixtral/Mistral Small runs into an error with the mistral-small.py example script:

python examples/offline_inference/mistral-small.py simple

INFO 04-01 11:29:43 [gpu_model_runner.py:1498] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 2 image items of the maximum feature size.
ERROR 04-01 11:29:43 [core.py:377] EngineCore hit an exception: Traceback (most recent call last):
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/engine/core.py", line 365, in run_engine_core
ERROR 04-01 11:29:43 [core.py:377]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 04-01 11:29:43 [core.py:377]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/engine/core.py", line 306, in __init__
ERROR 04-01 11:29:43 [core.py:377]     super().__init__(vllm_config, executor_class, log_stats)
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/engine/core.py", line 69, in __init__
ERROR 04-01 11:29:43 [core.py:377]     num_gpu_blocks, num_cpu_blocks = self._initialize_kv_caches(
ERROR 04-01 11:29:43 [core.py:377]                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/engine/core.py", line 130, in _initialize_kv_caches
ERROR 04-01 11:29:43 [core.py:377]     available_gpu_memory = self.model_executor.determine_available_memory()
ERROR 04-01 11:29:43 [core.py:377]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/executor/abstract.py", line 66, in determine_available_memory
ERROR 04-01 11:29:43 [core.py:377]     output = self.collective_rpc("determine_available_memory")
ERROR 04-01 11:29:43 [core.py:377]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-01 11:29:43 [core.py:377]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 04-01 11:29:43 [core.py:377]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/utils.py", line 2329, in run_method
ERROR 04-01 11:29:43 [core.py:377]     return func(*args, **kwargs)
ERROR 04-01 11:29:43 [core.py:377]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/venvs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-01 11:29:43 [core.py:377]     return func(*args, **kwargs)
ERROR 04-01 11:29:43 [core.py:377]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
ERROR 04-01 11:29:43 [core.py:377]     self.model_runner.profile_run()
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/worker/gpu_model_runner.py", line 1504, in profile_run
ERROR 04-01 11:29:43 [core.py:377]     dummy_mm_kwargs = self.mm_registry.get_decoder_dummy_data(
ERROR 04-01 11:29:43 [core.py:377]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/registry.py", line 470, in get_decoder_dummy_data
ERROR 04-01 11:29:43 [core.py:377]     dummy_data = profiler.get_decoder_dummy_data(seq_len, mm_counts)
ERROR 04-01 11:29:43 [core.py:377]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/profiling.py", line 224, in get_decoder_dummy_data
ERROR 04-01 11:29:43 [core.py:377]     ) = self.get_and_validate_mm_inputs(seq_len, mm_counts)
ERROR 04-01 11:29:43 [core.py:377]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:29:43 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/profiling.py", line 191, in get_and_validate_mm_inputs
ERROR 04-01 11:29:43 [core.py:377]     raise AssertionError(
ERROR 04-01 11:29:43 [core.py:377] AssertionError: The processed dummy data has a total of {'image': 3025} placeholder tokens, which is not the expected {'image': 3080} tokens.
ERROR 04-01 11:29:43 [core.py:377] 
CRITICAL 04-01 11:29:43 [core_client.py:343] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.

  • Pixtral HF runs into a similar error:

python examples/offline_inference/vision_language.py -m pixtral_hf

INFO 04-01 11:50:46 [gpu_model_runner.py:1498] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 2 image items of the maximum feature size.
ERROR 04-01 11:50:48 [core.py:377] EngineCore hit an exception: Traceback (most recent call last):
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/engine/core.py", line 365, in run_engine_core
ERROR 04-01 11:50:48 [core.py:377]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 04-01 11:50:48 [core.py:377]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/engine/core.py", line 306, in __init__
ERROR 04-01 11:50:48 [core.py:377]     super().__init__(vllm_config, executor_class, log_stats)
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/engine/core.py", line 69, in __init__
ERROR 04-01 11:50:48 [core.py:377]     num_gpu_blocks, num_cpu_blocks = self._initialize_kv_caches(
ERROR 04-01 11:50:48 [core.py:377]                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/engine/core.py", line 130, in _initialize_kv_caches
ERROR 04-01 11:50:48 [core.py:377]     available_gpu_memory = self.model_executor.determine_available_memory()
ERROR 04-01 11:50:48 [core.py:377]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/executor/abstract.py", line 66, in determine_available_memory
ERROR 04-01 11:50:48 [core.py:377]     output = self.collective_rpc("determine_available_memory")
ERROR 04-01 11:50:48 [core.py:377]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 04-01 11:50:48 [core.py:377]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 04-01 11:50:48 [core.py:377]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/utils.py", line 2329, in run_method
ERROR 04-01 11:50:48 [core.py:377]     return func(*args, **kwargs)
ERROR 04-01 11:50:48 [core.py:377]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/venvs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-01 11:50:48 [core.py:377]     return func(*args, **kwargs)
ERROR 04-01 11:50:48 [core.py:377]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
ERROR 04-01 11:50:48 [core.py:377]     self.model_runner.profile_run()
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/v1/worker/gpu_model_runner.py", line 1504, in profile_run
ERROR 04-01 11:50:48 [core.py:377]     dummy_mm_kwargs = self.mm_registry.get_decoder_dummy_data(
ERROR 04-01 11:50:48 [core.py:377]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/registry.py", line 470, in get_decoder_dummy_data
ERROR 04-01 11:50:48 [core.py:377]     dummy_data = profiler.get_decoder_dummy_data(seq_len, mm_counts)
ERROR 04-01 11:50:48 [core.py:377]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/profiling.py", line 224, in get_decoder_dummy_data
ERROR 04-01 11:50:48 [core.py:377]     ) = self.get_and_validate_mm_inputs(seq_len, mm_counts)
ERROR 04-01 11:50:48 [core.py:377]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-01 11:50:48 [core.py:377]   File "/home/mgoin/code/vllm/vllm/multimodal/profiling.py", line 191, in get_and_validate_mm_inputs
ERROR 04-01 11:50:48 [core.py:377]     raise AssertionError(
ERROR 04-01 11:50:48 [core.py:377] AssertionError: The processed dummy data has a total of {'image': 4096} placeholder tokens, which is not the expected {'image': 4160} tokens.
ERROR 04-01 11:50:48 [core.py:377] 
CRITICAL 04-01 11:50:48 [core_client.py:343] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
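
One way to read the placeholder-count mismatches above (an interpretation, not something stated in the tracebacks): the expected counts look like the full placeholder length including the per-row break/end tokens, while the processed counts cover only the embedding positions, consistent with the new is_embed accounting. A quick back-of-the-envelope check:

```python
# Assumed 64x64 patch grid for the Pixtral HF case:
rows = cols = 64
embed_positions = rows * cols        # 4096 -> the "processed" count in the error
with_row_breaks = rows * (cols + 1)  # 4160 -> the "expected" count in the error

# The Mistral Small numbers fit the same pattern with a 55x55 grid:
assert 55 * 55 == 3025 and 55 * 56 == 3080
```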

Roger Wang added 2 commits on April 3, 2025
ywang96 (Member) left a comment

Looks like increasing the audio fetch timeout indeed fixes the test, so I assume it's probably just a cold start issue?

Anyways LGTM :shipit:

@ywang96 ywang96 enabled auto-merge (squash) April 4, 2025 06:21
ywang96 (Member) commented on Apr 4, 2025

pytest -v -s -x models/decoder_only/vision_language/test_pixtral.py::test_chat is failing on CI in the extended multimodal test so I'm currently looking into fixing it.

@ywang96 ywang96 removed the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 4, 2025
@ywang96 ywang96 merged commit f5722a5 into vllm-project:main Apr 4, 2025
47 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Multi-modality Core Apr 4, 2025
ywang96 added a commit that referenced this pull request Apr 4, 2025
mergify bot commented on Apr 4, 2025

⚠️ The sha of the head commit of this PR conflicts with #16076. Mergify cannot evaluate rules on this PR. ⚠️

@DarkLight1337 DarkLight1337 deleted the v1-is-embed branch April 5, 2025 10:37
Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
nishith-fujitsu pushed a commit to nishith-fujitsu/vllm that referenced this pull request Apr 9, 2025