
Commit 94ab451

squash

Signed-off-by: Kyle Sayers <[email protected]>

1 parent 08c4c91

File tree

2 files changed: +32 −0 lines changed


README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -39,6 +39,7 @@ Applying quantization with `llmcompressor`:
 * [Activation quantization to `fp8`](examples/quantization_w8a8_fp8)
 * [Weight only quantization to `int4`](examples/quantization_w4a16)
 * [Quantizing MoE LLMs](examples/quantizing_moe)
+* [Quantizing Multimodal VLMs](examples/multimodal_vision)
 
 ### User Guides
 Deep dives into advanced usage of `llmcompressor`:
```

examples/multimodal_vision/README.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# Quantizing Multimodal Vision-Language Models #
2+
This directory contains example scripts for quantizing a variety of vision-language models using the GPTQ W4A16 quantization scheme.
3+
4+
## Using your own models ##
5+
6+
```python3
7+
recipe = [
8+
GPTQModifier(
9+
targets="Linear",
10+
scheme="W4A16",
11+
sequential_targets=["MistralDecoderLayer"],
12+
ignore=["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"],
13+
),
14+
]
15+
```
### Sequential Targets ###
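The recipe above sets `sequential_targets=["MistralDecoderLayer"]`, which names the layer class whose instances are calibrated and quantized one at a time. As an illustration of that grouping (a stdlib-only sketch with hypothetical module names, not llmcompressor's internal API):

```python
import re

# Hypothetical fully qualified module names, as found in a Mistral-style VLM.
module_names = [
    "vision_tower.encoder.layers.0.self_attn.q_proj",
    "language_model.model.layers.0.self_attn.q_proj",
    "language_model.model.layers.0.mlp.down_proj",
    "language_model.model.layers.1.self_attn.q_proj",
]

def group_by_decoder_layer(names):
    """Group module names by the decoder layer that owns them, mirroring
    how sequential calibration walks one decoder layer at a time."""
    groups = {}
    for name in names:
        match = re.search(r"language_model\.model\.layers\.(\d+)\.", name)
        if match:
            groups.setdefault(int(match.group(1)), []).append(name)
    return groups

groups = group_by_decoder_layer(module_names)
```

Modules outside the named layer class (here, the vision tower) fall outside every sequential group.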
### Ignore ###
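The `ignore` list in the recipe above uses `re:`-prefixed patterns to exclude modules from quantization, here the LM head, the vision tower, and the multimodal projector. A minimal sketch of that matching behavior, assuming patterns are anchored at the start of the fully qualified module name (as with `re.match`):

```python
import re

ignore = ["re:.*lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"]

def is_ignored(module_name, patterns):
    """Return True if any `re:`-prefixed pattern matches the module name
    from its start (an assumption mirroring `re.match` semantics)."""
    for pattern in patterns:
        if pattern.startswith("re:") and re.match(pattern[3:], module_name):
            return True
    return False

assert is_ignored("vision_tower.encoder.layers.0.mlp.fc1", ignore)
assert is_ignored("multi_modal_projector.linear_1", ignore)
assert is_ignored("language_model.lm_head", ignore)
assert not is_ignored("language_model.model.layers.0.self_attn.q_proj", ignore)
```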
### Tracing Errors ###

Because the architectures of vision-language models are often more complex than those of typical decoder-only text models, you may encounter `torch.fx.TraceError`s when attempting to quantize your model. For more information on `torch.fx.TraceError`s, why they occur, and how to resolve them, please see the [Model Tracing Guide](/src/llmcompressor/transformers/tracing/README.md).
### Adding Smoothquant Mappings ###
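A SmoothQuant modifier can be added to a recipe alongside GPTQ; its mappings pair the linear layers to be smoothed with the activation source that precedes them. A hypothetical sketch, assuming `SmoothQuantModifier` with a `mappings` argument and Mistral-style module names (verify both against your installed version and your model):

```python3
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Hypothetical mappings for a Mistral-style decoder: each entry pairs the
# linear layers to be smoothed with the layernorm that feeds them.
recipe = [
    SmoothQuantModifier(
        smoothing_strength=0.8,
        mappings=[
            [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
            [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"],
        ],
    ),
]
```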
### Adding Data Collator ###

* TODO: create a default "multimodal" collator
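Multimodal processors often return nested features (for example, pixel values alongside input ids) that a default text collator cannot stack. As a stdlib-only sketch of the collator shape, where the key names are assumptions and a real collator would wrap values in torch tensors:

```python
def data_collator(batch):
    """Collate a calibration batch of size one by passing each feature
    through unchanged (a real collator would wrap values in tensors)."""
    assert len(batch) == 1, "calibration batches are assumed to have size one"
    return {key: value for key, value in batch[0].items()}

# Hypothetical processor output for one calibration sample.
sample = [{"input_ids": [[1, 2, 3]], "pixel_values": [[[0.1, 0.2]]]}]
collated = data_collator(sample)
```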
## Customizing Dataset and Quantization Scheme ##

For a detailed walkthrough of customizing datasets and quantization for W4A16, see the [Quantization Guide](/examples/quantization_w4a16/README.md).
