Commit 2c8ed8e

More informative error when using Transformers backend (#16988)
Signed-off-by: Harry Mellor <[email protected]>
1 parent ed50f46 commit 2c8ed8e

2 files changed: +38 -34 lines

docs/source/models/supported_models.md

Lines changed: 26 additions & 22 deletions

@@ -40,33 +40,37 @@ You can force the use of `TransformersForCausalLM` by setting `model_impl="trans
 vLLM may not fully optimise the Transformers implementation so you may see degraded performance if comparing a native model to a Transformers model in vLLM.
 :::

-#### Supported features
+#### Custom models

-The Transformers modeling backend explicitly supports the following features:
+If a model is neither supported natively by vLLM nor Transformers, it can still be used in vLLM!

-- <project:#quantization-index> (except GGUF)
-- <project:#lora-adapter>
-- <project:#distributed-serving>
+For a model to be compatible with the Transformers backend for vLLM, it must:

-#### Remote Code
+- be a Transformers compatible custom model (see [Transformers - Customizing models](https://huggingface.co/docs/transformers/en/custom_models)):
+    * The model directory must have the correct structure (e.g. `config.json` is present).
+    * `config.json` must contain `auto_map.AutoModel`.
+- be a Transformers backend for vLLM compatible model (see <project:#writing-custom-models>):
+    * Customisation should be done in the base model (e.g. in `MyModel`, not `MyModelForCausalLM`).

-If your model is neither supported natively by vLLM or Transformers, you can still run it in vLLM!
+If the compatible model is:

-Simply set `trust_remote_code=True` and vLLM will run any model on the Model Hub that is compatible with Transformers.
-Provided that the model writer implements their model in a compatible way, this means that you can run new models before they are officially supported in Transformers or vLLM!
+- on the Hugging Face Model Hub, simply set `trust_remote_code=True` for <project:#offline-inference> or `--trust-remote-code` for the <project:#openai-compatible-server>.
+- in a local directory, simply pass the directory path to `model=<MODEL_DIR>` for <project:#offline-inference> or `vllm serve <MODEL_DIR>` for the <project:#openai-compatible-server>.

-:::{tip}
-If you have not yet created your custom model, you can follow this guide on [customising models in Transformers](https://huggingface.co/docs/transformers/en/custom_models).
-:::
+This means that, with the Transformers backend for vLLM, new models can be used before they are officially supported in Transformers or vLLM!

-```python
-from vllm import LLM
-llm = LLM(model=..., task="generate", trust_remote_code=True) # Name or path of your model
-llm.apply_model(lambda model: print(model.__class__))
-```
+(writing-custom-models)=
+
+#### Writing custom models
+
+This section details the modifications needed to make a Transformers compatible custom model compatible with the Transformers backend for vLLM. (We assume that a Transformers compatible custom model has already been created; see [Transformers - Customizing models](https://huggingface.co/docs/transformers/en/custom_models).)

 To make your model compatible with the Transformers backend, it needs:

+1. `kwargs` passed down through all modules from `MyModel` to `MyAttention`.
+2. `MyAttention` must use `ALL_ATTENTION_FUNCTIONS` to call attention.
+3. `MyModel` must contain `_supports_attention_backend = True`.
+
 ```{code-block} python
 :caption: modeling_my_model.py

@@ -75,7 +79,7 @@ from torch import nn

 class MyAttention(nn.Module):

-    def forward(self, hidden_states, **kwargs): # <- kwargs are required
+    def forward(self, hidden_states, **kwargs):
         ...
         attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
         attn_output, attn_weights = attention_interface(
@@ -91,11 +95,11 @@ class MyModel(PreTrainedModel):
     _supports_attention_backend = True
 ```

-Here is what happens in the background:
+Here is what happens in the background when this model is loaded:

-1. The config is loaded
-2. `MyModel` Python class is loaded from the `auto_map`, and we check that the model `_supports_attention_backend`.
-3. The `TransformersForCausalLM` backend is used. See <gh-file:vllm/model_executor/models/transformers.py>, which leverage `self.config._attn_implementation = "vllm"`, thus the need to use `ALL_ATTENTION_FUNCTION`.
+1. The config is loaded.
+2. The `MyModel` Python class is loaded from the `auto_map` in the config, and we check that the model `is_backend_compatible()`.
+3. `MyModel` is loaded into `TransformersForCausalLM` (see <gh-file:vllm/model_executor/models/transformers.py>) which sets `self.config._attn_implementation = "vllm"` so that vLLM's attention layer is used.

 That's it!
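
For reference, here is a minimal offline-inference sketch of the two loading paths the updated docs describe. The repository name and local path are placeholders, and the final line comes from the snippet this commit removes; it simply prints which implementation vLLM loaded.

```python
from vllm import LLM

# Hub case: trust_remote_code lets vLLM fetch the custom modeling code that
# `auto_map` in the model's config.json points to (placeholder repo name).
llm = LLM(model="your-org/your-custom-model", trust_remote_code=True)

# Local case: point vLLM at the model directory instead (placeholder path).
# llm = LLM(model="/path/to/my_model_dir")

# For the OpenAI-compatible server, the equivalents are:
#   vllm serve your-org/your-custom-model --trust-remote-code
#   vllm serve /path/to/my_model_dir

# Print the loaded class; with the Transformers backend this is
# TransformersForCausalLM.
llm.apply_model(lambda model: print(model.__class__))
```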

vllm/model_executor/model_loader/utils.py

Lines changed: 12 additions & 12 deletions

@@ -30,15 +30,6 @@ def set_default_torch_dtype(dtype: torch.dtype):
     torch.set_default_dtype(old_dtype)


-def is_transformers_impl_compatible(
-        arch: str,
-        module: Optional["transformers.PreTrainedModel"] = None) -> bool:
-    mod = module or getattr(transformers, arch, None)
-    if mod is None:
-        return False
-    return mod.is_backend_compatible()
-
-
 def resolve_transformers_arch(model_config: ModelConfig,
                               architectures: list[str]):
     for i, arch in enumerate(architectures):
@@ -61,17 +52,26 @@ def resolve_transformers_arch(model_config: ModelConfig,
                                               revision=model_config.revision)
             for name, module in sorted(auto_map.items(), key=lambda x: x[0])
         }
-        custom_model_module = auto_modules.get("AutoModel")
+        model_module = getattr(transformers, arch, None)
+        if model_module is None:
+            if "AutoModel" not in auto_map:
+                raise ValueError(
+                    f"Cannot find model module. '{arch}' is not a registered "
+                    "model in the Transformers library (only relevant if the "
+                    "model is meant to be in Transformers) and 'AutoModel' is "
+                    "not present in the model config's 'auto_map' (relevant "
+                    "if the model is custom).")
+            model_module = auto_modules["AutoModel"]
         # TODO(Isotr0py): Further clean up these raises.
         # perhaps handled them in _ModelRegistry._raise_for_unsupported?
         if model_config.model_impl == ModelImpl.TRANSFORMERS:
-            if not is_transformers_impl_compatible(arch, custom_model_module):
+            if not model_module.is_backend_compatible():
                 raise ValueError(
                     f"The Transformers implementation of {arch} is not "
                     "compatible with vLLM.")
             architectures[i] = "TransformersForCausalLM"
         if model_config.model_impl == ModelImpl.AUTO:
-            if not is_transformers_impl_compatible(arch, custom_model_module):
+            if not model_module.is_backend_compatible():
                 raise ValueError(
                     f"{arch} has no vLLM implementation and the Transformers "
                     "implementation is not compatible with vLLM. Try setting "
