docs/source/contributing/model/multimodal.md (+34 -28)
@@ -128,11 +128,9 @@ HF processing as well as memory profiling.
 
 ### For memory profiling
 
-Override the abstract method {meth}`~vllm.multimodal.profiling.BaseDummyInputsBuilder.get_dummy_processor_inputs`
-to construct dummy inputs for memory profiling. This dummy input should result in the worst-case memory usage of
-the model so that vLLM can reserve the correct amount of memory for it.
+Override the abstract methods {meth}`~vllm.multimodal.profiling.BaseDummyInputsBuilder.get_dummy_text` and {meth}`~vllm.multimodal.profiling.BaseDummyInputsBuilder.get_dummy_mm_data` to construct dummy inputs for memory profiling. These dummy inputs should result in the worst-case memory usage of the model so that vLLM can reserve the correct amount of memory for it.
 
-Assuming that the memory usage increases with the number of tokens, the dummy input can be constructed to maximize the number of output embeddings, which is the same number as placeholder feature tokens.
+Assuming that the memory usage increases with the number of tokens, the dummy inputs can be constructed to maximize the number of output embeddings, which is the same number as placeholder feature tokens.
 
 ::::{tab-set}
 :::{tab-item} Basic example: LLaVA
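For reference, a minimal sketch of what these two overrides can look like for a LLaVA-style model, assuming vLLM's `BaseDummyInputsBuilder` interface; `self.info` and the `_get_dummy_images` helper follow its usual conventions and are assumptions here, not verbatim upstream code:

```python
# Hedged sketch, not the exact upstream implementation.
from collections.abc import Mapping

from vllm.multimodal.inputs import MultiModalDataDict
from vllm.multimodal.profiling import BaseDummyInputsBuilder


class LlavaDummyInputsBuilder(BaseDummyInputsBuilder):

    def get_dummy_text(self, mm_counts: Mapping[str, int]) -> str:
        # One placeholder per image; the processor later expands each
        # placeholder into the model's image feature tokens.
        num_images = mm_counts.get("image", 0)
        return "<image>" * num_images

    def get_dummy_mm_data(
        self,
        seq_len: int,
        mm_counts: Mapping[str, int],
    ) -> MultiModalDataDict:
        num_images = mm_counts.get("image", 0)

        # Use the image size that yields the most feature tokens so the
        # profile is an upper bound on real memory usage.
        target_width, target_height = \
            self.info.get_image_size_with_most_features()

        return {
            "image":
            self._get_dummy_images(width=target_width,
                                   height=target_height,
                                   num_images=num_images),
        }
```

Maximizing the placeholder feature tokens this way keeps the reserved memory a true worst case even for models whose token count varies with image resolution.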
@@ -244,38 +242,45 @@ def get_num_image_tokens(
 ```
 
 Notice that the number of image tokens doesn't depend on the image width and height.
-We can simply use a dummy `image_size`:
+We can simply use a dummy `image_size` to calculate the multimodal profiling data:
 
 ```python
+# NOTE: In actuality, this is usually implemented as part of the
+# model's subclass of `BaseProcessingInfo`, but we show it as is
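The hunk is truncated at this point. Conceptually, because the token count is size-independent for LLaVA, any fixed size is a valid worst case; a hedged sketch of such a helper, with `ImageSize`, `get_hf_config`, and the `vision_config.image_size` attribute path assumed from vLLM's processing-info conventions:

```python
# Sketch only: shown standalone for brevity, but this would normally be
# a method on the model's `BaseProcessingInfo` subclass. `ImageSize`,
# `get_hf_config`, and `vision_config.image_size` are assumptions.
from vllm.multimodal.parse import ImageSize


def get_image_size_with_most_features(self) -> ImageSize:
    # Any size works here, since the number of image tokens is fixed.
    image_size = self.get_hf_config().vision_config.image_size
    return ImageSize(width=image_size, height=image_size)
```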
docs/source/design/mm_processing.md (+1 -1)
@@ -47,7 +47,7 @@ Moreover, since the tokenized text has not passed through the HF processor, we h
 
 ### Dummy text
 
-We work around the first issue by requiring each model to define how to generate dummy text based on the number of multi-modal inputs, via {meth}`~vllm.multimodal.profiling.BaseDummyInputsBuilder.get_dummy_processor_inputs`. This lets us generate dummy text corresponding to the multi-modal inputs and input them together to obtain the processed multi-modal data.
+We work around the first issue by requiring each model to define how to generate dummy text based on the number of multi-modal inputs, via {meth}`~vllm.multimodal.profiling.BaseDummyInputsBuilder.get_dummy_text`. This lets us generate dummy text corresponding to the multi-modal inputs and input them together to obtain the processed multi-modal data.
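To make that flow concrete, a hedged sketch of how the two hooks could be combined; the function below is illustrative, not vLLM's actual profiling internals:

```python
# Illustrative only: pairing dummy text with dummy multi-modal data.
from collections.abc import Mapping

from vllm.multimodal.profiling import BaseDummyInputsBuilder


def build_dummy_inputs(builder: BaseDummyInputsBuilder, seq_len: int,
                       mm_counts: Mapping[str, int]):
    # Dummy text whose placeholders match the multi-modal item counts...
    dummy_text = builder.get_dummy_text(mm_counts)
    # ...and dummy data generated with the same counts.
    dummy_mm_data = builder.get_dummy_mm_data(seq_len=seq_len,
                                              mm_counts=mm_counts)
    # Both are then passed through the HF processor together, yielding
    # the processed multi-modal data used for profiling.
    return dummy_text, dummy_mm_data
```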