[QDQ Optimizer] Update WeightBiasQuantization to skip Conv/Gemm if downstream node is not QuantizeLinear (microsoft#24537)
### Description
Updates the WeightBiasQuantization optimizer to skip Conv/Gemm nodes whose
downstream child node is not a QuantizeLinear.
#### Before this PR
Original graph:
```
input_0 -> DQ -> Conv -> graph_output (or non-Q node)
                  ^  ^
                  |  |
weights_f32-------+  |
                     |
bias_f32-------------+
```
Becomes:
```
input_0 -> DQ ------> Conv -> graph_output (or non-Q node)
                       ^  ^
                       |  |
weights_quant -> DQ ---+  |
                          |
bias_quant -> DQ ---------+
```
The above is **NOT** a valid QDQ node unit for Conv because the Conv's
output is not consumed by a QuantizeLinear node.
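For contrast, the minimal shape of a valid QDQ node unit for Conv (simplified, with the weight/bias DQ inputs omitted) would have the Conv's output feed a QuantizeLinear node:
```
input_0 -> DQ -> Conv -> Q -> downstream
```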
#### With this PR
The above example graph remains unchanged after L1 optimizations:
```
input_0 -> DQ -> Conv -> graph_output (or non-Q node)
                  ^  ^
                  |  |
weights_f32-------+  |
                     |
bias_f32-------------+
```
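For illustration, the skip condition can be sketched as below. This is a minimal C++ sketch, not the actual onnxruntime implementation; the `Node` struct and helper names are hypothetical simplifications of the real graph API:
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified node representation for illustration only;
// the real onnxruntime graph API differs.
struct Node {
  std::string op_type;           // e.g. "Conv", "Gemm", "QuantizeLinear"
  std::vector<Node*> consumers;  // nodes consuming this node's output
  bool produces_graph_output = false;
};

// A Conv/Gemm can close a valid QDQ node unit only if its output is
// consumed exclusively by QuantizeLinear nodes (and is not a graph output).
bool AllConsumersAreQuantizeLinear(const Node& node) {
  if (node.produces_graph_output || node.consumers.empty()) {
    return false;
  }
  for (const Node* consumer : node.consumers) {
    if (consumer->op_type != "QuantizeLinear") {
      return false;
    }
  }
  return true;
}

// Sketch of the check this PR adds: bail out before quantizing the
// weights/biases when the downstream node is not a QuantizeLinear.
bool ShouldQuantizeWeightsAndBias(const Node& conv_or_gemm) {
  return AllConsumersAreQuantizeLinear(conv_or_gemm);
}

int main() {
  Node q{"QuantizeLinear", {}, false};
  Node conv_to_q{"Conv", {&q}, false};  // Conv -> Q: optimizer may proceed
  Node conv_to_out{"Conv", {}, true};   // Conv -> graph_output: skip
  assert(ShouldQuantizeWeightsAndBias(conv_to_q));
  assert(!ShouldQuantizeWeightsAndBias(conv_to_out));
  return 0;
}
```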
### Motivation and Context
This behavior caused inaccuracy for a customer model. Automatically
quantizing the weights and biases of a Conv/Gemm is detrimental if the
Conv/Gemm's output is not consumed by a QuantizeLinear node. In that
scenario, the node group is not considered a valid QDQ node unit, so the
EP has to run the Conv/Gemm as float32/float16 anyway. If the Conv/Gemm
runs as float32/float16, quantizing its weights and biases introduces
inaccuracy for no gain.
PR that originally added this optimizer:
microsoft#22969