prototype_source/pt2e_quant_x86_inductor.rst (+14 −2: 14 additions & 2 deletions)
@@ -21,7 +21,10 @@ The pytorch 2 export quantization flow uses the torch.export to capture the mode
 This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
 TorchInductor is the new compiler backend that compiles the FX Graphs generated by TorchDynamo into optimized C++/Triton kernels.
 
-This flow of quantization 2 with Inductor mainly includes three steps:
+This flow of quantization 2 with Inductor supports both static and dynamic quantization. Static quantization works best for CNN models, such as ResNet-50, while dynamic quantization is more suitable for NLP models, such as RNNs and BERT.
+For the difference between the two quantization types, please refer to the `quantization mode support page <https://pytorch.org/docs/stable/quantization.html#quantization-mode-support>`__.
+
+The quantization flow mainly includes three steps:
 
 - Step 1: Capture the FX Graph from the eager Model based on the `torch export mechanism <https://pytorch.org/docs/main/export.html>`_.
 - Step 2: Apply the Quantization flow based on the captured FX Graph, including defining the backend-specific quantizer, generating the prepared model with observers,
@@ -134,14 +137,22 @@ quantize the model.
 `multiplications are 7-bit x 8-bit <https://oneapi-src.github.io/oneDNN/dev_guide_int8_computations.html#inputs-of-mixed-type-u8-and-s8>`_. In other words, potential
 numeric saturation and accuracy issues may happen when running on a CPU without Vector Neural Network Instructions (VNNI).
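A small arithmetic sketch of why this matters: without VNNI, oneDNN multiplies u8 activations by s8 weights with an instruction that accumulates adjacent product pairs into a signed 16-bit value, which can overflow at the full 8-bit activation range. The numbers below are illustrative worst-case values, not taken from the tutorial.

```python
# Worst-case pair of u8 x s8 products accumulated into a signed 16-bit value,
# as the pmaddubsw instruction does on CPUs without VNNI.
a0, a1 = 255, 255                              # unsigned 8-bit activations at maximum
w0, w1 = 127, 127                              # signed 8-bit weights at maximum
pair_sum = a0 * w0 + a1 * w1                   # 64770, above the int16 maximum 32767
saturated = max(-32768, min(32767, pair_sum))  # the hardware saturates to 32767
print(pair_sum, saturated)                     # 64770 32767
```

Restricting activations to 7 bits (at most 127) keeps the worst-case pair sum at 127 * 127 * 2 = 32258, within the int16 range, which is why the multiplications are effectively 7-bit x 8-bit on such CPUs.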
 
+The quantization config is for static quantization by default. To apply dynamic quantization, add the argument ``is_dynamic=True`` when getting the config.