Commit 562f4dd

DarkLight1337 authored and rasmith committed

[Doc] Add documentation for specifying model architecture (vllm-project#12105)

1 parent 6a42e09 commit 562f4dd

1 file changed: docs/source/serving/offline_inference.md (+53, -0)
@@ -31,6 +31,59 @@ Please refer to the above pages for more details about each API.

This section lists the most common options for running the vLLM engine.
For a full list, refer to the [Engine Arguments](#engine-args) page.

### Model resolution

vLLM loads HuggingFace-compatible models by inspecting the `architectures` field in the `config.json` of the model repository
and finding the corresponding implementation that is registered to vLLM.
However, model resolution may fail for any of the following reasons:

- The `config.json` of the model repository lacks the `architectures` field.
- Unofficial repositories refer to a model using alternative names which are not recorded in vLLM.
- The same architecture name is used for multiple models, creating ambiguity as to which model should be loaded.
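The resolution step can be sketched as follows. This is illustrative only, not vLLM's actual code: `SUPPORTED` and `resolve_architecture` are hypothetical stand-ins for vLLM's internal model registry and lookup logic.

```python
# Hypothetical sketch of the architecture lookup vLLM performs on config.json.
# SUPPORTED stands in for vLLM's internal registry of model implementations.
SUPPORTED = {"GPT2LMHeadModel", "LlamaForCausalLM"}

def resolve_architecture(config: dict) -> str:
    """Return the first architecture in config.json known to the registry."""
    architectures = config.get("architectures")
    if architectures is None:
        # Corresponds to the missing-field failure mode described above.
        raise ValueError("config.json lacks the 'architectures' field")
    for arch in architectures:
        if arch in SUPPORTED:
            return arch
    raise ValueError(f"Model architectures {architectures} are not supported")

print(resolve_architecture({"architectures": ["GPT2LMHeadModel"]}))  # GPT2LMHeadModel
```

If no listed architecture matches the registry, the failure modes above are the result.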

In those cases, vLLM may throw an error like:

```text
Traceback (most recent call last):
...
  File "vllm/model_executor/models/registry.py", line xxx, in inspect_model_cls
    for arch in architectures:
TypeError: 'NoneType' object is not iterable
```

or:

```text
  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['<arch>'] are not supported for now. Supported architectures: [...]
```

:::{note}
The above errors are distinct from the following similar but different error:

```text
  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
    raise ValueError(
ValueError: Model architectures ['<arch>'] failed to be inspected. Please check the logs for more details.
```

This error means that vLLM failed to import the model file, usually because of missing dependencies or outdated
binaries in the vLLM build. Please read the logs carefully to determine the real cause of the error.
:::

To fix this, explicitly specify the model architecture by passing `config.json` overrides to the `hf_overrides` option.
For example:

```python
from vllm import LLM

model = LLM(
    model="cerebras/Cerebras-GPT-1.3B",
    hf_overrides={"architectures": ["GPT2LMHeadModel"]},  # GPT-2
)
```
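Conceptually, `hf_overrides` patches the dictionary loaded from `config.json` before resolution runs. A minimal sketch of that merge, assuming a hypothetical `apply_hf_overrides` helper (the real merge happens inside vLLM's config loading):

```python
# Hypothetical sketch: hf_overrides entries replace or add keys in the
# config.json dictionary before vLLM inspects the 'architectures' field.
def apply_hf_overrides(config: dict, hf_overrides: dict) -> dict:
    patched = dict(config)          # leave the original config untouched
    patched.update(hf_overrides)    # overrides win over config.json values
    return patched

# As if loaded from a config.json that lacks the 'architectures' field:
config = {"model_type": "gpt2"}
patched = apply_hf_overrides(config, {"architectures": ["GPT2LMHeadModel"]})
print(patched["architectures"])  # ['GPT2LMHeadModel']
```

After the override, resolution sees a well-formed `architectures` list and can find the registered implementation.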

Our [list of supported models](#supported-models) shows the model architectures that are recognized by vLLM.

### Reducing memory usage

Large models might cause your machine to run out of memory (OOM). Here are some options that help alleviate this problem.
