@@ -322,7 +322,7 @@ See [this page](#generative-models) for more information on how to use generativ
- ✅︎
- ✅︎
* - `Qwen2ForCausalLM`
- - Qwen2
+ - QwQ, Qwen2
- `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc.
- ✅︎
- ✅︎
@@ -436,7 +436,7 @@ loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/t
```

If your model is not in the above list, we will try to automatically convert the model using
- {func}`vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
+ {func}`~vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
of the whole prompt are extracted from the normalized hidden state corresponding to the last token.
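For example, a generative model from the list above can be loaded in this converted form for embedding extraction. A minimal offline sketch (the model name is only an example; the `LLM.embed()` helper is assumed to be available in your vLLM version, otherwise `LLM.encode()` exposes the same pooling path):

```python
from vllm import LLM

# Loading with `task="embed"` triggers the automatic conversion described above.
llm = LLM(model="Qwen/Qwen2-7B-Instruct", task="embed")

# One embedding per prompt, pooled from the normalized hidden state of the last token.
(output,) = llm.embed("Hello, my name is")
print(f"Embedding dimension: {len(output.outputs.embedding)}")
```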

#### Reward Modeling (`--task reward`)
@@ -468,7 +468,7 @@ of the whole prompt are extracted from the normalized hidden state corresponding
```

If your model is not in the above list, we will try to automatically convert the model using
- {func}`vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.
+ {func}`~vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.
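As a rough sketch of this default behaviour (the model name is just a placeholder, and the exact attribute that carries the returned hidden states may differ between vLLM versions):

```python
from vllm import LLM

# Loading with `task="reward"` converts an unlisted architecture via `as_reward_model`.
llm = LLM(model="Qwen/Qwen2-7B-Instruct", task="reward")

# By default, one hidden-state vector is returned per prompt token.
(output,) = llm.encode("The answer is 42.")
print(output.outputs.data.shape)  # assumed shape: (num_prompt_tokens, hidden_size)
```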

```{important}
For process-supervised reward models such as `peiyi9979/math-shepherd-mistral-7b-prm`, the pooling config should be set explicitly,
@@ -499,7 +499,7 @@ e.g.: `--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 123, "r
```

If your model is not in the above list, we will try to automatically convert the model using
- {func}`vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
+ {func}`~vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
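A minimal offline sketch (the model name is taken from the table above; the `LLM.classify()` convenience method is assumed to be available in your vLLM version, otherwise `LLM.encode()` can be used instead):

```python
from vllm import LLM

# `task="classify"` selects the classification pooler; unlisted architectures
# are converted via `as_classification_model` as described above.
llm = LLM(model="jason9693/Qwen2.5-1.5B-apeach", task="classify")

(output,) = llm.classify("vLLM is a high-throughput inference engine.")
print(f"Class probabilities: {output.outputs.probs!r}")
```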

#### Sentence Pair Scoring (`--task score`)

@@ -550,6 +550,28 @@ On the other hand, modalities separated by `/` are mutually exclusive.
See [this page](#multimodal-inputs) on how to pass multi-modal inputs to the model.

+ ````{important}
+ To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
+ or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
+
+ Offline inference:
+ ```python
+ llm = LLM(
+     model="Qwen/Qwen2-VL-7B-Instruct",
+     limit_mm_per_prompt={"image": 4},
+ )
+ ```
+
+ Online inference:
+ ```bash
+ vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
+ ```
+ ````
+
+ ```{note}
+ vLLM currently only supports adding LoRA to the language backbone of multimodal models.
+ ```
+

### Generative Models

See [this page](#generative-models) for more information on how to use generative models.
@@ -689,14 +711,14 @@ See [this page](#generative-models) for more information on how to use generativ
* - `Phi3VForCausalLM`
- Phi-3-Vision, Phi-3.5-Vision
- T + I<sup>E+</sup>
- - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct` etc.
+ - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct`, etc.
-
- ✅︎
- ✅︎
* - `PixtralForConditionalGeneration`
- Pixtral
- T + I<sup>+</sup>
- - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` etc.
+ - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` (see note), etc.
-
- ✅︎
- ✅︎
@@ -715,7 +737,7 @@ See [this page](#generative-models) for more information on how to use generativ
- ✅︎
- ✅︎
* - `Qwen2VLForConditionalGeneration`
- - Qwen2-VL
+ - QVQ, Qwen2-VL
- T + I<sup>E+</sup> + V<sup>E+</sup>
- `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc.
- ✅︎
@@ -733,26 +755,6 @@ See [this page](#generative-models) for more information on how to use generativ
<sup>E</sup> Pre-computed embeddings can be inputted for this modality.
<sup>+</sup> Multiple items can be inputted per text prompt for this modality.

- ````{important}
- To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
- or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
-
- ```python
- llm = LLM(
-     model="Qwen/Qwen2-VL-7B-Instruct",
-     limit_mm_per_prompt={"image": 4},
- )
- ```
-
- ```bash
- vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
- ```
- ````
-
- ```{note}
- vLLM currently only supports adding LoRA to the language backbone of multimodal models.
- ```
-
```{note}
To use `TIGER-Lab/Mantis-8B-siglip-llama3`, you have to pass `--hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'` when running vLLM.
```
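For offline inference, the override from the note above can be sketched as follows (assuming your vLLM version exposes the flag as an `hf_overrides` keyword argument on `LLM`):

```python
from vllm import LLM

# Offline counterpart of `--hf_overrides`: force the custom Mantis architecture name.
llm = LLM(
    model="TIGER-Lab/Mantis-8B-siglip-llama3",
    hf_overrides={"architectures": ["MantisForConditionalGeneration"]},
)
```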
@@ -762,6 +764,11 @@ The official `openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (`
For more details, please see: <gh-pr:4087#issuecomment-2250397630>
```

+ ```{note}
+ The chat template for Pixtral-HF is incorrect (see [discussion](https://huggingface.co/mistral-community/pixtral-12b/discussions/22)).
+ A corrected version is available at <gh-file:examples/template_pixtral_hf.jinja>.
+ ```
+
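For offline chatting, the corrected template from the note above can be supplied explicitly. A sketch (the template path assumes a vLLM source checkout, and the `chat_template` argument of `LLM.chat()` is assumed to be available in your version):

```python
from vllm import LLM

llm = LLM(model="mistral-community/pixtral-12b")

# Read the corrected Jinja template and use it in place of the model's built-in one.
with open("examples/template_pixtral_hf.jinja") as f:
    corrected_template = f.read()

outputs = llm.chat(
    [{"role": "user", "content": "Describe vLLM in one sentence."}],
    chat_template=corrected_template,
)
print(outputs[0].outputs[0].text)
```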
### Pooling Models

See [this page](pooling-models) for more information on how to use pooling models.