
Commit 01ecbb0

adrianlizarraga authored and ankitm3k committed
[Quant Tool] Introduce get_qdq_config() helper to get QDQ configurations (microsoft#22677)
### Description Introduces the `get_qdq_config()` function to get a quantization configuration for a full integer QDQ model. This function provides an easier way of specifying commonly used options and sets convenient defaults. Specifically: - Instead of requiring the user to pass a dictionary of `extra_options`, the new interface adds function parameters for common settings: - All calibrator settings - Whether activations/weights are symmetric - Whether to keep or fuse relu/clip into Q - Minimum real range for quantization - Dictionary of tensor quantization overrides. - Automatically scans the input floating-point model and fills out the operator types to quantize. Otherwise, only a limited number of operator types would be quantized by default. - Detects if the input model uses external data. If so, ensures that the generated QDQ model also uses external data. - Detects if the model will use newly introduced quantization types (int4/int16) with an older opset. If so, forces the use of the `com.microsoft` domain for Q/DQ ops, which support all types. - Automatically enables the "extra option" called `ForceQuantizeNoInputCheck` to ensure data movement operators (e.g., Transpose) are always quantized. - User can pass a function to indicate which nodes to exclude from quantization. - The user can still pass their own `extra_options` to override any of the above if necessary. ```python from onnxruntime.quantization import get_int_qdq_config, quantize # , ... # Get QDQ configuration qdq_config = get_int_qdq_config( float_model, data_reader, calibrate_method=CalibrationMethod.Percentile, calibrate_args={"percentile": 99.98}, # Converted to extra_options activation_type=QuantType.QUInt8, weight_type=QuantType.QInt8, per_channel=True, nodes_to_exclude=["Mul"], # Could also be a function. Ex: `lambda model, node: node.op_type == "Softmax"` # Other options converted to extra_options: min_real_range=0.0001, keep_removable_activations=True, activation_symmetric=True, weight_symmetric=True, ) # Quantize model quantize(float_model_path, qdq_model_path, qdq_config) ``` ### Motivation and Context Need a version of `get_qnn_qdq_config()` that is not EP-specific.
1 parent 6e5d9b8 · commit 01ecbb0
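For context, `data_reader` in the example above is any `onnxruntime.quantization.CalibrationDataReader`. A minimal sketch of one, assuming a single-input model; the input name, shape, and random data below are placeholders, not part of this commit:

```python
import numpy as np

from onnxruntime.quantization import CalibrationDataReader


class RandomDataReader(CalibrationDataReader):
    """Feeds a fixed number of random samples to the calibrator."""

    def __init__(self, input_name: str, input_shape: tuple, num_samples: int = 8):
        # Pre-generate the calibration batches as {input_name: ndarray} dicts.
        self._samples = iter(
            [{input_name: np.random.rand(*input_shape).astype(np.float32)} for _ in range(num_samples)]
        )

    def get_next(self):
        # The calibrator calls this repeatedly until it returns None.
        return next(self._samples, None)


data_reader = RandomDataReader("input", (1, 3, 224, 224))  # hypothetical input name/shape
```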

File tree: 2 files changed (+1 −7 lines)


onnxruntime/python/tools/quantization/quantize.py

+1 −5
```diff
@@ -231,7 +231,6 @@ def get_qdq_config(
     activation_symmetric: bool = False,
     weight_symmetric: bool | None = None,
     per_channel: bool = False,
-    reduce_range: bool = False,
     keep_removable_activations: bool = False,
     min_real_range: float | None = None,
     tensor_quant_overrides: dict[str, list[dict[str, Any]]] | None = None,
@@ -246,7 +245,7 @@ def get_qdq_config(
         calibration_data_reader: Calibration data reader.
         calibrate_method: The calibration method. Defaults to MinMax.
         activation_type: The default activation quantization type. Defaults to QUInt8.
-        weight_type: The default weight quantization type. Defaults to QInt8.
+        weight_type: The default weight quantization type. Defaults to QUInt8.
         activation_symmetric: True if activations should be quantized symmetrically (i.e., rmax == -rmin) by default.
             Defaults to false. For int8 and int16, this results in zero-point values of 0. For uint8 and uint16,
             the zero-point values are 127 and 32,767, respectively.
@@ -255,8 +254,6 @@ def get_qdq_config(
         per_channel: Global option that determines if a fixed set of operator types should be quantized per-channel.
             Defaults to false. Alternatively, use the tensor-level `tensor_quant_overrides` to select individual operators
             and their quantization axes.
-        reduce_range: quantize weights with 1 less bit of precision (e.g., 7 bits for QInt8). Defaults to false.
-            May improve the accuracy for some models running on non-VNNI machine, especially for per-channel mode.
         keep_removable_activations: Defaults to false. If true, "removable" activations (e.g., Clip or Relu) will not
             be removed, and will be explicitly represented in the QDQ model. If false, these activations
             are automatically removed if activations are asymmetrically quantized. Keeping these activations
@@ -376,7 +373,6 @@ def get_qdq_config(
         op_types_to_quantize=list(op_types.difference(op_types_to_exclude)),
         nodes_to_exclude=final_nodes_to_exclude,
         per_channel=per_channel,
-        reduce_range=reduce_range,
         use_external_data_format=(model_has_external_data or model.ByteSize() >= MODEL_SIZE_THRESHOLD),
         extra_options=final_extra_options,
     )
```
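The `use_external_data_format` argument visible in the last hunk is where the external-data detection from the description lands. A standalone sketch of the same check, assuming the model is given as a file path (`model_uses_external_data` is an illustrative helper, not ORT API):

```python
import onnx
from onnx.external_data_helper import uses_external_data


def model_uses_external_data(model_path: str) -> bool:
    # Load only the graph structure; leave external tensor payloads on disk.
    model = onnx.load(model_path, load_external_data=False)
    # True if any initializer stores its data in an external file.
    return any(uses_external_data(init) for init in model.graph.initializer)
```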

onnxruntime/test/python/quantization/test_get_qdq_config.py

+0 −2
```diff
@@ -93,7 +93,6 @@ def test_basic_args(self):
             activation_type=QuantType.QUInt16,
             weight_type=QuantType.QInt16,
             per_channel=True,
-            reduce_range=True,
             nodes_to_exclude=["Mul"],
             # Other options converted to extra_options:
             min_real_range=0.0001,
@@ -105,7 +104,6 @@ def test_basic_args(self):
         self.assertEqual(qdq_config.activation_type, QuantType.QUInt16)
         self.assertEqual(qdq_config.weight_type, QuantType.QInt16)
         self.assertTrue(qdq_config.per_channel)
-        self.assertTrue(qdq_config.reduce_range)
         self.assertEqual(set(qdq_config.nodes_to_exclude), {"Mul"})
         self.assertEqual(set(qdq_config.op_types_to_quantize), {"Add"})
 
```
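The test passes `nodes_to_exclude` as a list of node names; per the description, a predicate over `(model, node)` also works. A sketch reusing the hypothetical `RandomDataReader` from earlier (model paths are placeholders):

```python
from onnxruntime.quantization import QuantType, get_qdq_config, quantize

data_reader = RandomDataReader("input", (1, 3, 224, 224))

# Exclude every Softmax node instead of listing node names one by one.
qdq_config = get_qdq_config(
    "model.fp32.onnx",
    data_reader,
    weight_type=QuantType.QInt8,
    nodes_to_exclude=lambda model, node: node.op_type == "Softmax",
)
quantize("model.fp32.onnx", "model.qdq.onnx", qdq_config)
```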
