
Commit 9a22834

[Misc] Provide correct Pixtral-HF chat template (#11891)
Signed-off-by: DarkLight1337 <[email protected]>
1 parent bd82872 commit 9a22834

3 files changed (+73, -27 lines)

docs/source/models/supported_models.md

Lines changed: 34 additions & 27 deletions
@@ -322,7 +322,7 @@ See [this page](#generative-models) for more information on how to use generativ
   - ✅︎
   - ✅︎
 * - `Qwen2ForCausalLM`
-  - Qwen2
+  - QwQ, Qwen2
   - `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc.
   - ✅︎
   - ✅︎
@@ -436,7 +436,7 @@ loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/t
 ```

 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
+{func}`~vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
 of the whole prompt are extracted from the normalized hidden state corresponding to the last token.

 #### Reward Modeling (`--task reward`)
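
As a hedged aside (not part of this commit): the auto-conversion this paragraph describes can be exercised by loading a generative checkpoint with the embedding task. The model name and the output inspection below are illustrative assumptions, not taken from the diff.

```python
# Illustrative sketch only: a generative checkpoint loaded with task="embed"
# is expected to be wrapped by as_embedding_model() and to return one pooled
# embedding per prompt (normalized last-token hidden state).
from vllm import LLM

llm = LLM(model="Qwen/Qwen2-7B-Instruct", task="embed")
(output,) = llm.encode("What is the capital of France?")
print(output.outputs)  # pooled embedding for the whole prompt
```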
@@ -468,7 +468,7 @@ of the whole prompt are extracted from the normalized hidden state corresponding
 ```

 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.
+{func}`~vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.

 ```{important}
 For process-supervised reward models such as `peiyi9979/math-shepherd-mistral-7b-prm`, the pooling config should be set explicitly,
@@ -499,7 +499,7 @@ e.g.: `--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 123, "r
 ```

 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
+{func}`~vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.

 #### Sentence Pair Scoring (`--task score`)

@@ -550,6 +550,28 @@ On the other hand, modalities separated by `/` are mutually exclusive.

 See [this page](#multimodal-inputs) on how to pass multi-modal inputs to the model.

+````{important}
+To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
+or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
+
+Offline inference:
+```python
+llm = LLM(
+    model="Qwen/Qwen2-VL-7B-Instruct",
+    limit_mm_per_prompt={"image": 4},
+)
+```
+
+Online inference:
+```bash
+vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
+```
+````
+
+```{note}
+vLLM currently only supports adding LoRA to the language backbone of multimodal models.
+```
+
 ### Generative Models

 See [this page](#generative-models) for more information on how to use generative models.
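
A hedged client-side sketch (not part of this commit) of what the block moved above enables: once the server is started with `--limit-mm-per-prompt image=4`, a single chat request may carry several images. The endpoint address and image URLs below are placeholders.

```python
# Send two images in one request to a vLLM server launched with:
#   vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What differs between these two photos?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},  # placeholder URL
            {"type": "image_url", "image_url": {"url": "https://example.com/b.jpg"}},  # placeholder URL
        ],
    }],
)
print(response.choices[0].message.content)
```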
@@ -689,14 +711,14 @@ See [this page](#generative-models) for more information on how to use generativ
 * - `Phi3VForCausalLM`
   - Phi-3-Vision, Phi-3.5-Vision
   - T + I<sup>E+</sup>
-  - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct` etc.
+  - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct`, etc.
   -
   - ✅︎
   - ✅︎
 * - `PixtralForConditionalGeneration`
   - Pixtral
   - T + I<sup>+</sup>
-  - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` etc.
+  - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` (see note), etc.
   -
   - ✅︎
   - ✅︎
@@ -715,7 +737,7 @@ See [this page](#generative-models) for more information on how to use generativ
   - ✅︎
   - ✅︎
 * - `Qwen2VLForConditionalGeneration`
-  - Qwen2-VL
+  - QVQ, Qwen2-VL
   - T + I<sup>E+</sup> + V<sup>E+</sup>
   - `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc.
   - ✅︎
@@ -733,26 +755,6 @@ See [this page](#generative-models) for more information on how to use generativ
 <sup>E</sup> Pre-computed embeddings can be inputted for this modality.
 <sup>+</sup> Multiple items can be inputted per text prompt for this modality.

-````{important}
-To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
-or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
-
-```python
-llm = LLM(
-    model="Qwen/Qwen2-VL-7B-Instruct",
-    limit_mm_per_prompt={"image": 4},
-)
-```
-
-```bash
-vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
-```
-````
-
-```{note}
-vLLM currently only supports adding LoRA to the language backbone of multimodal models.
-```
-
 ```{note}
 To use `TIGER-Lab/Mantis-8B-siglip-llama3`, you have pass `--hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'` when running vLLM.
 ```
@@ -762,6 +764,11 @@ The official `openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (`
 For more details, please see: <gh-pr:4087#issuecomment-2250397630>
 ```

+```{note}
+The chat template for Pixtral-HF is incorrect (see [discussion](https://huggingface.co/mistral-community/pixtral-12b/discussions/22)).
+A corrected version is available at <gh-file:examples/template_pixtral_hf.jinja>.
+```
+
 ### Pooling Models

 See [this page](pooling-models) for more information on how to use pooling models.
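
A hedged usage sketch for the note added above (illustrative, not taken from this commit): the corrected template can be supplied explicitly when running the HF-format Pixtral checkpoint, e.g. via `--chat-template examples/template_pixtral_hf.jinja` when serving, or as shown below for offline chat. The image URL is a placeholder, and passing the template string through `LLM.chat(..., chat_template=...)` is an assumption about the offline API.

```python
# Offline sketch: load the HF-format Pixtral checkpoint and override its chat
# template with the corrected file from this commit.
from vllm import LLM

llm = LLM(model="mistral-community/pixtral-12b")

with open("examples/template_pixtral_hf.jinja") as f:
    chat_template = f.read()

messages = [
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder URL
        {"type": "text", "text": "Describe this image."},
    ]},
]
outputs = llm.chat(messages, chat_template=chat_template)
print(outputs[0].outputs[0].text)
```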

examples/template_pixtral_hf.jinja

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+{%- if messages[0]["role"] == "system" %}
+    {%- set system_message = messages[0]["content"] %}
+    {%- set loop_messages = messages[1:] %}
+{%- else %}
+    {%- set loop_messages = messages %}
+{%- endif %}
+
+{{- bos_token }}
+{%- for message in loop_messages %}
+    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
+        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
+    {%- endif %}
+    {%- if message["role"] == "user" %}
+        {%- if loop.last and system_message is defined %}
+            {{- "[INST]" + system_message + "\n" }}
+        {%- else %}
+            {{- "[INST]" }}
+        {%- endif %}
+        {%- if message["content"] is not string %}
+            {%- for chunk in message["content"] %}
+                {%- if chunk["type"] == "text" %}
+                    {{- chunk["text"] }}
+                {%- elif chunk["type"] == "image" %}
+                    {{- "[IMG]" }}
+                {%- else %}
+                    {{- raise_exception("Unrecognized content type!") }}
+                {%- endif %}
+            {%- endfor %}
+        {%- else %}
+            {{- message["content"] }}
+        {%- endif %}
+        {{- "[/INST]" }}
+    {%- elif message["role"] == "assistant" %}
+        {{- message["content"] + eos_token}}
+    {%- else %}
+        {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
+    {%- endif %}
+{%- endfor %}
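
To inspect what this template produces, it can be rendered standalone with Jinja2. This is a hedged sketch for inspection only: `raise_exception` is not a Jinja2 builtin (chat-template runners normally inject it), and the sample messages and special tokens are assumptions.

```python
# Standalone render of the template above for a system + one user (image + text) turn.
import jinja2


def raise_exception(message):
    # Stand-in for the helper that chat-template runners provide.
    raise ValueError(message)


env = jinja2.Environment()
env.globals["raise_exception"] = raise_exception

with open("examples/template_pixtral_hf.jinja") as f:
    template = env.from_string(f.read())

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]},
]
print(template.render(messages=messages, bos_token="<s>", eos_token="</s>"))
# Expected: <s>[INST]You are a helpful assistant.\n[IMG]Describe this image.[/INST]
```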

tests/entrypoints/test_chat_utils.py

Lines changed: 1 addition & 0 deletions
@@ -758,6 +758,7 @@ def test_resolve_content_format_hf_defined(model, expected_format):
     ("template_falcon.jinja", "string"),
     ("template_inkbot.jinja", "string"),
     ("template_llava.jinja", "string"),
+    ("template_pixtral_hf.jinja", "openai"),
     ("template_vlm2vec.jinja", "openai"),
     ("tool_chat_template_granite_20b_fc.jinja", "string"),
     ("tool_chat_template_hermes.jinja", "string"),
