[Quant Tool] Introduce get_qdq_config() helper to get QDQ configurations (microsoft#22677)
### Description
Introduces the `get_qdq_config()` function to get a quantization
configuration for a full integer QDQ model. This function provides an
easier way of specifying commonly used options and sets convenient
defaults. Specifically:
- Instead of requiring the user to pass a dictionary of `extra_options`,
the new interface adds function parameters for common settings:
  - All calibrator settings
  - Whether activations/weights are symmetric
  - Whether to keep or fuse Relu/Clip into the Q op
  - Minimum real range for quantization
  - Dictionary of tensor quantization overrides
- Automatically scans the input floating-point model and fills out the
operator types to quantize. Otherwise, only a limited set of operator
types would be quantized by default.
- Detects whether the input model uses external data. If so, ensures that
the generated QDQ model also uses external data.
- Detects whether the model uses the newly introduced quantization types
(int4/int16) with an older opset. If so, forces the use of the
`com.microsoft` domain for Q/DQ ops, which supports all quantization
types. (These model checks are sketched after the example below.)
- Automatically enables the `ForceQuantizeNoInputCheck` extra option to
ensure data-movement operators (e.g., Transpose) are always quantized.
- The user can pass a function to indicate which nodes to exclude from
quantization (demonstrated in the second sketch below).
- The user can still pass their own `extra_options` to override any of
the above if necessary.
```python
from onnxruntime.quantization import (
    CalibrationMethod,
    QuantType,
    get_qdq_config,
    quantize,
)

# Get the QDQ configuration.
qdq_config = get_qdq_config(
    float_model,  # Loaded onnx.ModelProto (a model path also works)
    data_reader,  # CalibrationDataReader that yields calibration inputs
    calibrate_method=CalibrationMethod.Percentile,
    calibrate_args={"percentile": 99.98},  # Converted to extra_options
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
    per_channel=True,
    nodes_to_exclude=["Mul"],  # Could also be a function, e.g., `lambda model, node: node.op_type == "Softmax"`
    # Other options converted to extra_options:
    min_real_range=0.0001,
    keep_removable_activations=True,
    activation_symmetric=True,
    weight_symmetric=True,
)

# Quantize the model.
quantize(float_model_path, qdq_model_path, qdq_config)
```
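
As a rough, hedged sketch of the automatic model inspection described in the bullets above (not the helper's actual implementation; `model.onnx` is a placeholder path, and the opset-21 cutoff is an assumption based on when the standard `QuantizeLinear`/`DequantizeLinear` ops gained int4/int16 support):

```python
import onnx
from onnx.external_data_helper import uses_external_data

# Placeholder path; the initializers' raw data is not needed for inspection.
model = onnx.load("model.onnx", load_external_data=False)

# 1) Gather every operator type in the graph so all of them can be listed
#    in the configuration, rather than a limited default set.
op_types_to_quantize = sorted({node.op_type for node in model.graph.node})

# 2) Detect whether any initializer is stored as external data; if so, the
#    generated QDQ model should also be saved with external data.
model_uses_external_data = any(
    uses_external_data(initializer) for initializer in model.graph.initializer
)

# 3) Find the ai.onnx opset version. If int4/int16 quantization types are
#    requested with an opset older than 21 (assumed cutoff), Q/DQ ops from
#    the com.microsoft domain would be needed, since they support all types.
onnx_opset_version = next(
    (entry.version for entry in model.opset_import if entry.domain in ("", "ai.onnx")),
    None,
)
```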
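
The `nodes_to_exclude` parameter also accepts a callable, and per-tensor overrides can be supplied as a dictionary. A hedged sketch follows; the override format shown (tensor name mapped to a list of override dicts) is an assumption mirroring the `TensorQuantOverrides` extra option, and `layer1_output` is a made-up tensor name:

```python
import onnx
from onnxruntime.quantization import QuantType, get_qdq_config, quantize


def exclude_softmax(model: onnx.ModelProto, node: onnx.NodeProto) -> bool:
    # Exclude every Softmax node from quantization.
    return node.op_type == "Softmax"


qdq_config = get_qdq_config(
    float_model,
    data_reader,
    nodes_to_exclude=exclude_softmax,  # Predicate instead of a name list.
    # Assumed override format: quantize the (hypothetical) tensor
    # "layer1_output" to 16 bits instead of the default activation type.
    tensor_quant_overrides={"layer1_output": [{"quant_type": QuantType.QInt16}]},
)
quantize(float_model_path, qdq_model_path, qdq_config)
```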
### Motivation and Context
`get_qnn_qdq_config()` is specific to the QNN execution provider (EP); this change adds an equivalent helper that is not EP-specific.