
Commit d07efb3

[Doc] Troubleshooting errors during model inspection (#12351)
Signed-off-by: DarkLight1337 <[email protected]>
1 parent 978b45f commit d07efb3

File tree

2 files changed: +40 -33 lines changed

docs/source/getting_started/troubleshooting.md

Lines changed: 38 additions & 2 deletions
@@ -22,9 +22,9 @@ It'd be better to store the model in a local disk. Additionally, have a look at
To isolate the model downloading and loading issue, you can use the `--load-format dummy` argument to skip loading the model weights. This way, you can check if the model downloading and loading is the bottleneck.
```

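For illustration, a minimal sketch of the dummy-load check described above (editor's sketch, not part of the diff; the model name is a placeholder):

```python
# Editor's sketch: start vLLM with random ("dummy") weights so that model
# downloading/loading is skipped. If this initializes quickly, the slow part
# is weight downloading/loading rather than vLLM itself.
from vllm import LLM

llm = LLM(model="facebook/opt-125m", load_format="dummy")  # placeholder model
```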
-## Model is too large
+## Out of memory

-If the model is too large to fit in a single GPU, you might want to [consider tensor parallelism](#distributed-serving) to split the model across multiple GPUs. In that case, every process will read the whole model and split it into chunks, which makes the disk reading time even longer (proportional to the size of tensor parallelism). You can convert the model checkpoint to a sharded checkpoint using <gh-file:examples/offline_inference/save_sharded_state.py>. The conversion process might take some time, but later you can load the sharded checkpoint much faster. The model loading time should remain constant regardless of the size of tensor parallelism.
+If the model is too large to fit in a single GPU, you will get an out-of-memory (OOM) error. Consider [using tensor parallelism](#distributed-serving) to split the model across multiple GPUs. In that case, every process will read the whole model and split it into chunks, which makes the disk reading time even longer (proportional to the size of tensor parallelism). You can convert the model checkpoint to a sharded checkpoint using <gh-file:examples/offline_inference/save_sharded_state.py>. The conversion process might take some time, but later you can load the sharded checkpoint much faster. The model loading time should remain constant regardless of the size of tensor parallelism.
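As a rough illustration of the tensor-parallel fallback described in the added paragraph (editor's sketch, not from the diff; the model name and GPU count are placeholders):

```python
# Editor's sketch: shard a model that does not fit on a single GPU across
# 4 GPUs with tensor parallelism. tensor_parallel_size must not exceed the
# number of GPUs that are actually available.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder large model
    tensor_parallel_size=4,
)
```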

## Enable more logging

@@ -218,6 +218,42 @@ print(f(x))

If it raises errors from `torch/_inductor` directory, usually it means you have a custom `triton` library that is not compatible with the version of PyTorch you are using. See [this issue](https://github.com/vllm-project/vllm/issues/12219) for example.

+## Model failed to be inspected
+
+If you see an error like:
+
+```text
+  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
+    raise ValueError(
+ValueError: Model architectures ['<arch>'] failed to be inspected. Please check the logs for more details.
+```
+
+It means that vLLM failed to import the model file.
+Usually, it is related to missing dependencies or outdated binaries in the vLLM build.
+Please read the logs carefully to determine the root cause of the error.
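One way to surface the underlying import error is to import the offending model module directly (editor's sketch, not from the diff; the module path is only an example, substitute the file that defines the reported architecture):

```python
# Editor's sketch: import the model's implementation module directly so the
# real error (e.g. a missing optional dependency) is shown instead of the
# generic "failed to be inspected" message.
import importlib
import traceback

try:
    importlib.import_module("vllm.model_executor.models.llama")  # example module
except Exception:
    traceback.print_exc()
```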
+
+## Model not supported
+
+If you see an error like:
+
+```text
+Traceback (most recent call last):
+...
+  File "vllm/model_executor/models/registry.py", line xxx, in inspect_model_cls
+    for arch in architectures:
+TypeError: 'NoneType' object is not iterable
+```
+
+or:
+
+```text
+  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
+    raise ValueError(
+ValueError: Model architectures ['<arch>'] are not supported for now. Supported architectures: [...]
+```
+
+But if you are sure that the model is in the [list of supported models](#supported-models), there may be some issue with vLLM's model resolution. In that case, please follow [these steps](#model-resolution) to explicitly specify the vLLM implementation for the model.
+
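Those steps amount to overriding the `architectures` field via the `hf_overrides` option described in `docs/source/serving/offline_inference.md` below; a minimal sketch (editor's sketch, not from the diff; the model and architecture names are placeholders):

```python
# Editor's sketch: pin the architecture that vLLM should use by overriding
# the `architectures` field of the model's config.json.
from vllm import LLM

llm = LLM(
    model="cerebras/Cerebras-GPT-1.3B",                   # placeholder repo
    hf_overrides={"architectures": ["GPT2LMHeadModel"]},  # placeholder architecture
)
```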
## Known Issues

- In `v0.5.2`, `v0.5.3`, and `v0.5.3.post1`, there is a bug caused by [zmq](https://github.com/zeromq/pyzmq/issues/2000), which can occasionally cause vLLM to hang depending on the machine configuration. The solution is to upgrade to the latest version of `vllm` to include the [fix](gh-pr:6759).

docs/source/serving/offline_inference.md

Lines changed: 2 additions & 31 deletions
@@ -31,6 +31,8 @@ Please refer to the above pages for more details about each API.
This section lists the most common options for running the vLLM engine.
For a full list, refer to the [Engine Arguments](#engine-args) page.

+(model-resolution)=
+
### Model resolution

vLLM loads HuggingFace-compatible models by inspecting the `architectures` field in `config.json` of the model repository
@@ -41,37 +43,6 @@ Nevertheless, our model resolution may fail for the following reasons:
- Unofficial repositories refer to a model using alternative names which are not recorded in vLLM.
- The same architecture name is used for multiple models, creating ambiguity as to which model should be loaded.

-In those cases, vLLM may throw an error like:
-
-```text
-Traceback (most recent call last):
-...
-  File "vllm/model_executor/models/registry.py", line xxx, in inspect_model_cls
-    for arch in architectures:
-TypeError: 'NoneType' object is not iterable
-```
-
-or:
-
-```text
-  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
-    raise ValueError(
-ValueError: Model architectures ['<arch>'] are not supported for now. Supported architectures: [...]
-```
-
-:::{note}
-The above error is distinct from the following similar but different error:
-
-```text
-  File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
-    raise ValueError(
-ValueError: Model architectures ['<arch>'] failed to be inspected. Please check the logs for more details.
-```
-
-This error means that vLLM failed to import the model file. Usually, it is related to missing dependencies or outdated
-binaries in the vLLM build. Please read the logs carefully to determine the real cause of the error.
-:::
-
To fix this, explicitly specify the model architecture by passing `config.json` overrides to the `hf_overrides` option.
For example:
