To accommodate this, instead of a string you can return an instance of {class}`~vllm.multimodal.processing.PromptUpdateDetails`
with different `full` and `features` attributes:
```python
def get_replacement_fuyu(item_idx: int):
    images = mm_items.get_items("image", ImageProcessorItems)
    image_size = images.get_image_size(item_idx)

    ncols, nrows = self.info.get_image_feature_grid_size(
        image_width=image_size.width,
        image_height=image_size.height,
    )
    image_tokens = ([_IMAGE_TOKEN_ID] * ncols +
                    [_NEWLINE_TOKEN_ID]) * nrows

    # `full` is the sequence that appears in the prompt, while `features`
    # marks the tokens that are assigned the vision embeddings
    return PromptUpdateDetails(
        full=image_tokens + [bos_token_id],
        features=image_tokens,
    )
```

Finally, decorate the model class with `MULTIMODAL_REGISTRY.register_processor` to register them to the multi-modal registry:
```python
@MULTIMODAL_REGISTRY.register_processor(YourMultiModalProcessor,
                                        info=YourProcessingInfo,
                                        dummy_inputs=YourDummyInputsBuilder)
class YourModelForImage2Seq(nn.Module, SupportsMultiModal):
```
## Notes
### Inserting feature tokens without replacement
Some HF processors directly insert feature tokens without replacing anything in the original prompt. In that case, you can use {class}`~vllm.multimodal.processing.PromptInsertion` instead of {class}`~vllm.multimodal.processing.PromptReplacement` inside {meth}`~vllm.multimodal.processing.BaseMultiModalProcessor._get_prompt_updates`.

Examples:
- BLIP-2 (insert at start of prompt): <gh-file:vllm/model_executor/models/blip2.py>
- Florence2 (insert at start of prompt): <gh-file:vllm/model_executor/models/florence2.py>
- Molmo (insert after `<|endoftext|>` token): <gh-file:vllm/model_executor/models/molmo.py>
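
For illustration, here is a rough sketch of returning a {class}`~vllm.multimodal.processing.PromptInsertion` from {meth}`~vllm.multimodal.processing.BaseMultiModalProcessor._get_prompt_updates` to prepend feature tokens to the prompt; the placeholder token ID and the `get_num_image_tokens` helper are hypothetical stand-ins, not part of any real model:

```python
from vllm.multimodal.processing import PromptIndexTargets, PromptInsertion

# Hypothetical placeholder token ID; a real model would read this
# from its HF config or tokenizer
_IMAGE_TOKEN_ID = 50265

def _get_prompt_updates(self, mm_items, hf_processor_mm_kwargs, out_mm_kwargs):
    def get_insertion(item_idx: int):
        # Hypothetical helper returning the number of feature tokens per image
        num_image_tokens = self.info.get_num_image_tokens()
        return [_IMAGE_TOKEN_ID] * num_image_tokens

    return [
        # Insert the feature tokens at the start of the prompt,
        # rather than replacing an existing placeholder token
        PromptInsertion(
            modality="image",
            target=PromptIndexTargets.start(),
            insertion=get_insertion,
        )
    ]
```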
### Handling prompt updates unrelated to multi-modal data
{meth}`~vllm.multimodal.processing.BaseMultiModalProcessor._get_prompt_updates` assumes that each application of prompt update corresponds to one multi-modal item. If the HF processor performs additional processing regardless of how many multi-modal items there are, you should override {meth}`~vllm.multimodal.processing.BaseMultiModalProcessor._apply_hf_processor_tokens_only` so that the processed token inputs are consistent with the result of applying the HF processor on text inputs. This is because token inputs bypass the HF processor according to [our design](#mm-processing).

Examples:

- Molmo (applies chat template which is not defined elsewhere): <gh-file:vllm/model_executor/models/molmo.py>
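
As a minimal sketch of such an override, assuming (hypothetically) that the HF processor always appends a separator token to text prompts:

```python
def _apply_hf_processor_tokens_only(
    self,
    prompt_tokens: list[int],
) -> list[int]:
    # Assumption for this sketch: the HF processor appends a separator
    # token to every text prompt, so we mirror that behavior for token
    # inputs, which bypass the HF processor
    tokenizer = self.info.get_tokenizer()
    sep_token_id: int = tokenizer.vocab[tokenizer.sep_token]

    return prompt_tokens + [sep_token_id]
```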
### Custom HF processor
Some models don't define an HF processor class on HF Hub. In that case, you can define a custom HF processor that has the same call signature as HF processors and pass it to {meth}`~vllm.multimodal.processing.BaseMultiModalProcessor._call_hf_processor`.
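
A minimal sketch of such a custom processor is shown below; all names are hypothetical and the image preprocessing is a placeholder. The key point is that it accepts the same keyword arguments as an HF processor and returns a `BatchFeature`:

```python
import torch
from transformers import BatchFeature


class YourCustomProcessor:
    """Hypothetical stand-in for an HF processor class missing from HF Hub."""

    def __init__(self, tokenizer, image_size: int = 336):
        self.tokenizer = tokenizer
        self.image_size = image_size

    def __call__(self, text=None, images=None, return_tensors="pt", **kwargs):
        # Tokenize the text prompt, mirroring what an HF processor would do
        data = dict(self.tokenizer(text, return_tensors=return_tensors))

        if images:
            # Placeholder preprocessing: a real implementation would resize
            # and normalize the images instead of emitting zero tensors
            data["pixel_values"] = torch.stack([
                torch.zeros(3, self.image_size, self.image_size)
                for _ in images
            ])

        return BatchFeature(data)
```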