Commit 5ce797a

centwang authored and snnn committed
Quantize Weight for Gemm/Conv on Quantized Model (#22969)
Some quantized models have QDQ nodes around Conv/Gemm, but the weight and/or bias are not quantized. This PR adds the WeightBiasQuantization optimizer, which quantizes a float weight to an INT8 tensor and a float bias to an INT32 tensor. It does this only when the weight and/or bias is an initializer, so that ConstantFolding can fold the resulting sub-graph into real quantized initializers in the next round of graph optimization.
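The arithmetic behind the description above can be sketched as follows. This is a minimal illustration, not the code added by this PR: the function names are hypothetical, the scale and zero point are passed in as parameters (how the optimizer chooses them is not shown in this section), and the bias scale is assumed to follow the common QDQ convention of input_scale * weight_scale with a zero point of 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: quantize float weights to INT8 using
// q = clamp(round(w / scale) + zero_point, -128, 127).
std::vector<std::int8_t> QuantizeWeightToInt8(const std::vector<float>& weights,
                                              float scale,
                                              std::int8_t zero_point) {
  assert(scale > 0.0f);
  std::vector<std::int8_t> quantized;
  quantized.reserve(weights.size());
  for (float w : weights) {
    // std::nearbyint rounds ties to even under the default rounding mode.
    float v = std::nearbyint(w / scale) + static_cast<float>(zero_point);
    v = std::clamp(v, -128.0f, 127.0f);
    quantized.push_back(static_cast<std::int8_t>(v));
  }
  return quantized;
}

// Hypothetical helper: quantize a float bias to INT32.
// Assumption: bias_scale = input_scale * weight_scale, zero point = 0.
std::vector<std::int32_t> QuantizeBiasToInt32(const std::vector<float>& bias,
                                              float input_scale,
                                              float weight_scale) {
  const double bias_scale = static_cast<double>(input_scale) * weight_scale;
  assert(bias_scale > 0.0);
  const double lo = static_cast<double>(std::numeric_limits<std::int32_t>::min());
  const double hi = static_cast<double>(std::numeric_limits<std::int32_t>::max());
  std::vector<std::int32_t> quantized;
  quantized.reserve(bias.size());
  for (float b : bias) {
    double v = std::nearbyint(static_cast<double>(b) / bias_scale);
    v = std::clamp(v, lo, hi);
    quantized.push_back(static_cast<std::int32_t>(v));
  }
  return quantized;
}
```

According to the description, the optimizer itself leaves the final computation to ConstantFolding; the sketch only shows the kind of values the folded initializers would hold under the stated assumptions.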
1 parent 6f73744 commit 5ce797a

6 files changed: +404 -183 lines changed

onnxruntime/core/optimizer/graph_transformer_utils.cc

+2 -2
@@ -63,7 +63,7 @@
 #ifdef MLAS_TARGET_AMD64_IX86
 #include "core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.h"
 #endif
-#include "core/optimizer/qdq_transformer/bias_quantization.h"
+#include "core/optimizer/qdq_transformer/weight_bias_quantization.h"
 #include "core/optimizer/qdq_transformer/clip_quantizelinear.h"
 #include "core/optimizer/qdq_transformer/ensure_unique_dq_for_node_unit.h"
 #include "core/optimizer/qdq_transformer/qdq_propagation.h"
@@ -245,7 +245,7 @@ InlinedVector<std::unique_ptr<GraphTransformer>> GenerateTransformers(

 if (!disable_quant_qdq) {
   transformers.emplace_back(std::make_unique<QDQPropagationTransformer>());
-  transformers.emplace_back(std::make_unique<BiasQuantization>());
+  transformers.emplace_back(std::make_unique<WeightBiasQuantization>());

   // EnsureUniqueDQForNodeUnit is actually a required graph transformation. The unique DQ per QDQ node unit input
   // condition that it ensures is important for the partitioning that happens after Level1 optimizers are run.
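The registration pattern in the hunk above can be illustrated with a toy, self-contained sketch. The `Toy*` classes and `BuildToyQdqTransformers` are hypothetical stand-ins, not onnxruntime's `GraphTransformer` interface; the sketch only shows that the renamed transformer keeps its position after QDQ propagation and is skipped when the QDQ passes are disabled.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a graph transformer; it only carries a name.
class ToyTransformer {
 public:
  explicit ToyTransformer(std::string name) : name_(std::move(name)) {}
  virtual ~ToyTransformer() = default;
  const std::string& Name() const { return name_; }

 private:
  std::string name_;
};

class ToyQDQPropagation : public ToyTransformer {
 public:
  ToyQDQPropagation() : ToyTransformer("QDQPropagationTransformer") {}
};

class ToyWeightBiasQuantization : public ToyTransformer {
 public:
  ToyWeightBiasQuantization() : ToyTransformer("WeightBiasQuantization") {}
};

// Mirrors the conditional registration order shown in the diff (toy version).
std::vector<std::unique_ptr<ToyTransformer>> BuildToyQdqTransformers(bool disable_quant_qdq) {
  std::vector<std::unique_ptr<ToyTransformer>> transformers;
  if (!disable_quant_qdq) {
    transformers.emplace_back(std::make_unique<ToyQDQPropagation>());
    transformers.emplace_back(std::make_unique<ToyWeightBiasQuantization>());
  }
  return transformers;
}
```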

onnxruntime/core/optimizer/qdq_transformer/bias_quantization.cc

-149
This file was deleted.

onnxruntime/core/optimizer/qdq_transformer/bias_quantization.h

-27
This file was deleted.
