
Commit 071e073

Xia-Weiwen and svekars authored

Add dynamic quant to the tutorial of PT2E quantization with X86Inductor (#2819)

* Add dynamic quant to the tutorial of PT2E quantization with X86Inductor --------- Co-authored-by: Svetlana Karslioglu <[email protected]>

1 parent f2e2a6d commit 071e073

File tree

1 file changed: +14 −2 lines changed

prototype_source/pt2e_quant_x86_inductor.rst

Lines changed: 14 additions & 2 deletions
@@ -21,7 +21,10 @@ The pytorch 2 export quantization flow uses the torch.export to capture the model
 This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
 TorchInductor is the new compiler backend that compiles the FX Graphs generated by TorchDynamo into optimized C++/Triton kernels.
 
-This flow of quantization 2 with Inductor mainly includes three steps:
+This quantization flow with Inductor supports both static and dynamic quantization. Static quantization works best for CNN models, such as ResNet-50, while dynamic quantization is more suitable for NLP models, such as RNN and BERT.
+For the difference between the two quantization types, please refer to the `following page <https://pytorch.org/docs/stable/quantization.html#quantization-mode-support>`__.
+
+The quantization flow mainly includes three steps:
 
 - Step 1: Capture the FX Graph from the eager Model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
 - Step 2: Apply the Quantization flow based on the captured FX Graph, including defining the backend-specific quantizer, generating the prepared model with observers,
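
To make the three steps concrete, here is a minimal end-to-end sketch of the static PTQ path this tutorial builds up, assuming a torchvision ResNet-50 and the PT2E APIs ``torch.export.export``, ``prepare_pt2e``, and ``convert_pt2e``. The exact graph-capture API has varied across PyTorch releases, so treat this as an illustration rather than the tutorial's verbatim code.

.. code-block:: python

    import torch
    import torchvision.models as models
    import torch.ao.quantization.quantizer.x86_inductor_quantizer as xiq
    from torch.ao.quantization.quantizer.x86_inductor_quantizer import X86InductorQuantizer
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e

    model = models.resnet50().eval()
    example_inputs = (torch.randn(1, 3, 224, 224),)

    # Step 1: capture the FX Graph from the eager model
    exported_model = torch.export.export(model, example_inputs).module()

    # Step 2: apply the quantization flow with the backend-specific quantizer
    quantizer = X86InductorQuantizer()
    quantizer.set_global(xiq.get_default_x86_inductor_quantization_config())
    prepared_model = prepare_pt2e(exported_model, quantizer)
    prepared_model(*example_inputs)  # calibration (static quantization only)
    converted_model = convert_pt2e(prepared_model)

    # Step 3: lower the quantized model into Inductor
    with torch.no_grad():
        optimized_model = torch.compile(converted_model)
        optimized_model(*example_inputs)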
@@ -134,14 +137,22 @@ quantize the model.
 `multiplications are 7-bit x 8-bit <https://oneapi-src.github.io/oneDNN/dev_guide_int8_computations.html#inputs-of-mixed-type-u8-and-s8>`_. In other words, potential
 numeric saturation and accuracy issues may happen when running on CPU without Vector Neural Network Instruction.
 
+The quantization config is for static quantization by default. To apply dynamic quantization, add the argument ``is_dynamic=True`` when getting the config.
+
+.. code-block:: python
+
+    quantizer = X86InductorQuantizer()
+    quantizer.set_global(xiq.get_default_x86_inductor_quantization_config(is_dynamic=True))
+
 After we import the backend-specific Quantizer, we will prepare the model for post-training quantization.
 ``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.
 
 ::
 
     prepared_model = prepare_pt2e(exported_model, quantizer)
 
-Now, we will calibrate the ``prepared_model`` after the observers are inserted in the model.
+Now, we will calibrate the ``prepared_model`` after the observers are inserted in the model. This step is needed for static quantization only.
 
 ::
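
Putting the two additions above together, here is a short sketch of how the static and dynamic PTQ paths diverge. ``exported_model`` is the captured graph from the earlier step, and ``calibration_data`` is a hypothetical iterable of sample input batches.

.. code-block:: python

    is_dynamic = True  # flip to False for static quantization

    quantizer = X86InductorQuantizer()
    quantizer.set_global(
        xiq.get_default_x86_inductor_quantization_config(is_dynamic=is_dynamic)
    )
    prepared_model = prepare_pt2e(exported_model, quantizer)

    if not is_dynamic:
        # Calibration is needed for static quantization only: feed sample
        # batches so the observers can record activation ranges.
        for inputs in calibration_data:
            prepared_model(*inputs)

    converted_model = convert_pt2e(prepared_model)

Dynamic quantization computes activation quantization parameters at runtime from each input, which is why the offline calibration pass applies to the static path only.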

@@ -268,6 +279,7 @@ The PyTorch 2 Export QAT flow is largely similar to the PTQ flow:
 
     # Step 2. quantization-aware training
     # Use Backend Quantizer for X86 CPU
+    # To apply dynamic quantization, add an argument ``is_dynamic=True`` when getting the config.
     quantizer = X86InductorQuantizer()
     quantizer.set_global(xiq.get_default_x86_inductor_quantization_config(is_qat=True))
     prepared_model = prepare_qat_pt2e(exported_model, quantizer)
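
For context, here is a sketch of the surrounding QAT flow this snippet belongs to, assuming ``exported_model`` from the capture step; ``train_loader``, ``loss_fn``, and ``optimizer`` are hypothetical placeholders. Per the comment added in this commit, ``is_dynamic=True`` can be passed alongside ``is_qat=True``.

.. code-block:: python

    from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e

    quantizer = X86InductorQuantizer()
    # Pass is_dynamic=True here as well to make the QAT config dynamic.
    quantizer.set_global(
        xiq.get_default_x86_inductor_quantization_config(is_qat=True)
    )
    prepared_model = prepare_qat_pt2e(exported_model, quantizer)

    # Fine-tune with fake-quant ops inserted (placeholder training loop).
    for inputs, target in train_loader:
        loss = loss_fn(prepared_model(inputs), target)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    converted_model = convert_pt2e(prepared_model)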
