Note: TensorFlow Quantization development has transitioned to the TensorRT Model Optimizer. All developers are encouraged to use the TensorRT Model Optimizer to benefit from the latest advancements in quantization and compression. While the TensorFlow Quantization code will remain available, it will no longer receive further development.
This TensorFlow 2.x Quantization toolkit quantizes (inserts Q/DQ nodes) TensorFlow 2.x Keras models for Quantization-Aware Training (QAT). We follow NVIDIA's QAT recipe, which leads to optimal model acceleration with TensorRT on NVIDIA GPUs and hardware accelerators.
- Implements NVIDIA's quantization recipe.
- Supports fully automated or manual insertion of Quantization and DeQuantization (QDQ) nodes in the TensorFlow 2.x model with minimal code.
- Makes it easy to add support for new layers.
- Quantization behavior can be set programmatically.
- Implements automatic tests for popular architecture blocks such as residual and inception.
- Offers utilities for TensorFlow 2.x to TensorRT conversion via ONNX.
- Includes example workflows.
Python >= 3.8
TensorFlow >= 2.8
tf2onnx >= 1.10.1
onnx-graphsurgeon
pytest
pytest-html
TensorRT >= 8.4 GA (optional)
Latest TensorFlow 2.x docker image from NGC is recommended.
$ cd ~/
$ git clone https://github.com/NVIDIA/TensorRT.git
$ docker pull nvcr.io/nvidia/tensorflow:22.03-tf2-py3
$ docker run -it --runtime=nvidia --gpus all --net host -v ~/TensorRT/tools/tensorflow-quantization:/home/tensorflow-quantization nvcr.io/nvidia/tensorflow:22.03-tf2-py3 /bin/bash
After the last command, you will be placed in the /workspace directory inside the running Docker container, while the tensorflow-quantization repository is mounted in the /home directory.
$ cd /home/tensorflow-quantization
$ ./install.sh
$ cd tests
$ python3 -m pytest quantize_test.py -rP
If all tests pass, installation is successful.
$ cd ~/
$ git clone https://github.com/NVIDIA/TensorRT.git
$ cd TensorRT/tools/tensorflow-quantization
$ ./install.sh
$ cd tests
$ python3 -m pytest quantize_test.py -rP
If all tests pass, installation is successful.
TensorFlow 2.x Quantization toolkit user guide.
- Only Quantization Aware Training (QAT) is supported as a quantization method.
- Only Functional and Sequential Keras models are supported. Original Keras layers are wrapped into quantized layers using TensorFlow's clone_model method, which doesn't support subclassed models.
- Saving the quantized version of a few layers may not be supported in TensorFlow < 2.8:
  - `DepthwiseConv2D`: saving support was added in TF 2.8.
  - `Conv2DTranspose`: not yet supported by TF (see the open bug here). However, there's a workaround if you do not need the TF2 SavedModel file and just the ONNX file:
    - Implement `Conv2DTransposeQuantizeWrapper`. See our user guide for more information on how to do that.
    - Convert the quantized Keras model to ONNX using our provided utility function `convert_keras_model_to_onnx`.
- GTC 2022 talk
- Quantization Basics whitepaper