@@ -31,6 +31,59 @@ Please refer to the above pages for more details about each API.
This section lists the most common options for running the vLLM engine.
For a full list, refer to the [Engine Arguments](#engine-args) page.

+ ### Model resolution
+
+ vLLM loads HuggingFace-compatible models by inspecting the `architectures` field in `config.json` of the model repository
+ and finding the corresponding implementation that is registered to vLLM.
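+
+ You can check this field yourself before loading the model. A minimal sketch using `transformers` (the model name is only an example):
+
+ ```python
+ from transformers import AutoConfig
+
+ # Download and parse config.json from the HuggingFace Hub.
+ config = AutoConfig.from_pretrained("cerebras/Cerebras-GPT-1.3B")
+
+ # vLLM matches these names against its model registry.
+ print(getattr(config, "architectures", None))  # e.g. ["GPT2LMHeadModel"], or None if missing
+ ```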
+
+ However, our model resolution may fail for the following reasons:
+
+ - The `config.json` of the model repository lacks the `architectures` field.
+ - Unofficial repositories refer to a model using alternative names that are not recorded in vLLM.
+ - The same architecture name is used for multiple models, creating ambiguity as to which model should be loaded.
+
+ In these cases, vLLM may throw an error like:
+
+ ```text
+ Traceback (most recent call last):
+ ...
+   File "vllm/model_executor/models/registry.py", line xxx, in inspect_model_cls
+     for arch in architectures:
+ TypeError: 'NoneType' object is not iterable
+ ```
+
+ or:
+
+ ```text
+   File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
+     raise ValueError(
+ ValueError: Model architectures ['<arch>'] are not supported for now. Supported architectures: [...]
+ ```
+
+ :::{note}
+ The above error is distinct from the following similar-looking error:
+
+ ```text
+   File "vllm/model_executor/models/registry.py", line xxx, in _raise_for_unsupported
+     raise ValueError(
+ ValueError: Model architectures ['<arch>'] failed to be inspected. Please check the logs for more details.
+ ```
+
+ This error means that vLLM failed to import the model file, usually because of missing dependencies or outdated
+ binaries in the vLLM build. Please read the logs carefully to determine the real cause of the error.
+ :::
+
+ To fix this, explicitly specify the model architecture by passing `config.json` overrides to the `hf_overrides` option.
+ For example:
+
+ ```python
+ from vllm import LLM
+
+ model = LLM(
+     model="cerebras/Cerebras-GPT-1.3B",
+     hf_overrides={"architectures": ["GPT2LMHeadModel"]},  # GPT-2
+ )
+ ```
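+
+ Once the architecture is pinned, the model loads like any other. Continuing from the snippet above, a quick smoke test might look like this (the prompt is arbitrary):
+
+ ```python
+ # The override is applied at load time, so inference works as usual.
+ outputs = model.generate("Hello, my name is")
+ print(outputs[0].outputs[0].text)
+ ```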
+
+ Our [list of supported models](#supported-models) shows the model architectures that are recognized by vLLM.
+
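+
+ If you would rather check programmatically, the registry behind the errors above can list the architectures it knows about. A sketch, assuming `ModelRegistry.get_supported_archs()` is available in your vLLM version:
+
+ ```python
+ from vllm import ModelRegistry
+
+ # Architectures vLLM can resolve; compare against the "architectures"
+ # field in your model's config.json.
+ print(sorted(ModelRegistry.get_supported_archs()))
+ ```
+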
### Reducing memory usage

Large models might cause your machine to run out of memory (OOM). Here are some options that help alleviate this problem.