Commit 86ba159

DarkLight1337 authored and shreyankg committed
[Doc] [3/N] Refer code examples for common cases in dev multimodal processor (vllm-project#14278)
Signed-off-by: DarkLight1337 <[email protected]>
1 parent 730fd03 commit 86ba159


docs/source/contributing/model/multimodal.md

Lines changed: 33 additions & 1 deletion
````diff
@@ -859,7 +859,7 @@ prompt_tokens, prompts_length = _tokenize_prompts_with_image_and_batch(
 )
 ```
 
-To accommodate this, instead of a string you can return an instance of `PromptUpdateDetails`
+To accommodate this, instead of a string you can return an instance of {class}`~vllm.multimodal.processing.PromptUpdateDetails`
 with different `full` and `feature` attributes:
 
 ```python
````
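The doc's own example is cut off by the hunk. As a rough illustrative sketch only (the token IDs and grid layout below are invented, and the `features` attribute name follows the class definition in `vllm.multimodal.processing`), a return value whose `full` sequence carries an extra token beyond its feature tokens might be built like this:

```python
from vllm.multimodal.processing import PromptUpdateDetails

# Invented token IDs, for illustration only.
_IMAGE_TOKEN_ID = 71011
_NEWLINE_TOKEN_ID = 71019
_BOS_TOKEN_ID = 1


def get_replacement(item_idx: int) -> PromptUpdateDetails:
    # Placeholder tokens laid out as a 4x2 grid with newline separators.
    image_tokens = ([_IMAGE_TOKEN_ID] * 4 + [_NEWLINE_TOKEN_ID]) * 2

    # `full` is everything written into the prompt, while `features` is
    # the subset whose positions are filled with multi-modal embeddings;
    # here `full` carries a trailing BOS token that is not a feature.
    return PromptUpdateDetails(
        full=image_tokens + [_BOS_TOKEN_ID],
        features=image_tokens,
    )
```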
````diff
@@ -948,3 +948,35 @@ to register them to the multi-modal registry:
 + dummy_inputs=YourDummyInputsBuilder)
  class YourModelForImage2Seq(nn.Module, SupportsMultiModal):
  ```
````

The remainder of the hunk appends a new section to the document:

## Notes

### Inserting feature tokens without replacement

Some HF processors directly insert feature tokens without replacing anything in the original prompt. In that case, you can use {class}`~vllm.multimodal.processing.PromptInsertion` instead of {class}`~vllm.multimodal.processing.PromptReplacement` inside {meth}`~vllm.multimodal.processing.BaseMultiModalProcessor._get_prompt_updates`.

Examples:

- BLIP-2 (insert at start of prompt): <gh-file:vllm/model_executor/models/blip2.py>
- Florence2 (insert at start of prompt): <gh-file:vllm/model_executor/models/florence2.py>
- Molmo (insert after `<|endoftext|>` token): <gh-file:vllm/model_executor/models/molmo.py>
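As an illustrative sketch of this pattern, not code from the linked files (the `PromptIndexTargets.start()` target is assumed to be available alongside `PromptInsertion`, and the `get_num_image_tokens`/`image_token_index` helpers are hypothetical), a BLIP-2-style insertion might look like:

```python
from collections.abc import Mapping, Sequence

from vllm.multimodal.processing import PromptIndexTargets, PromptInsertion


# A method of your BaseMultiModalProcessor subclass.
def _get_prompt_updates(
    self,
    mm_items,
    hf_processor_mm_kwargs: Mapping[str, object],
    out_mm_kwargs,
) -> Sequence[PromptInsertion]:
    # Hypothetical helpers: a fixed-size image placeholder whose token ID
    # comes from the model config.
    image_token_id = self.info.get_hf_config().image_token_index
    num_image_tokens = self.info.get_num_image_tokens()

    return [
        PromptInsertion(
            modality="image",
            # Insert at the very start of the prompt rather than
            # replacing an existing placeholder token.
            target=PromptIndexTargets.start(),
            insertion=[image_token_id] * num_image_tokens,
        )
    ]
```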
### Handling prompt updates unrelated to multi-modal data

{meth}`~vllm.multimodal.processing.BaseMultiModalProcessor._get_prompt_updates` assumes that each application of a prompt update corresponds to one multi-modal item. If the HF processor performs additional processing regardless of how many multi-modal items there are, you should override {meth}`~vllm.multimodal.processing.BaseMultiModalProcessor._apply_hf_processor_tokens_only` so that the processed token inputs are consistent with the result of applying the HF processor on text inputs. This is because token inputs bypass the HF processor according to [our design](#mm-processing).

Examples:

- Chameleon (appends `sep_token`): <gh-file:vllm/model_executor/models/chameleon.py>
- Fuyu (appends `boa_token`): <gh-file:vllm/model_executor/models/fuyu.py>
- Molmo (applies chat template which is not defined elsewhere): <gh-file:vllm/model_executor/models/molmo.py>
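The override is typically a short method that mirrors whatever the HF processor appends to text inputs. A minimal sketch, assuming a hypothetical `<sep>` special token rather than the `sep_token`/`boa_token` of the real examples above:

```python
# A method of your BaseMultiModalProcessor subclass.
def _apply_hf_processor_tokens_only(
    self,
    prompt_tokens: list[int],
) -> list[int]:
    # Suppose the HF processor appends a separator to every text prompt;
    # token inputs bypass the HF processor, so replicate that here.
    tokenizer = self.info.get_tokenizer()
    sep_token_id = tokenizer.vocab["<sep>"]  # hypothetical token

    return prompt_tokens + [sep_token_id]
```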
### Custom HF processor

Some models don't define an HF processor class on HF Hub. In that case, you can define a custom HF processor that has the same call signature as HF processors and pass it to {meth}`~vllm.multimodal.processing.BaseMultiModalProcessor._call_hf_processor`.

Examples:

- DeepSeek-VL2: <gh-file:vllm/model_executor/models/deepseek_vl2.py>
- InternVL: <gh-file:vllm/model_executor/models/internvl.py>
- Qwen-VL: <gh-file:vllm/model_executor/models/qwen_vl.py>
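One way this pattern can look, sketched under assumptions (the processor class and `_preprocess_images` helper are hypothetical and not taken from the linked models; `BatchFeature` and the `BaseProcessingInfo` accessors are the only real APIs assumed):

```python
import torch
from transformers import BatchFeature

from vllm.multimodal.processing import BaseProcessingInfo


def _preprocess_images(images) -> torch.Tensor:
    # Placeholder preprocessing; a real implementation would resize and
    # normalize according to the model's vision config.
    return torch.stack(
        [torch.as_tensor(img, dtype=torch.float32) for img in images])


class YourCustomProcessor:
    """Stand-in for a processor class the model doesn't ship on HF Hub."""

    def __init__(self, config, tokenizer):
        self.config = config
        self.tokenizer = tokenizer

    def __call__(self, text=None, images=None, return_tensors=None, **kwargs):
        # Mirror the (text, images, return_tensors) call signature of
        # HF-provided processors and return a BatchFeature.
        input_ids = self.tokenizer(text).input_ids
        pixel_values = _preprocess_images(images)
        return BatchFeature(
            data={"input_ids": [input_ids], "pixel_values": pixel_values},
            tensor_type=return_tensors,
        )


class YourProcessingInfo(BaseProcessingInfo):

    def get_hf_processor(self, **kwargs) -> YourCustomProcessor:
        # The returned object is what _call_hf_processor ends up invoking.
        return YourCustomProcessor(self.get_hf_config(), self.get_tokenizer())
```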
