Adds documentation for default_n_bit quantization scheme. #1045

This directory is a modified copy of default_8bit that lets you manually set the
number of bits used for weights and activations in quantization-aware training (QAT).

Code example for quantizing a Keras float `model`:

```python
# Imports.
import tensorflow_model_optimization as tfmot

from tensorflow_model_optimization.python.core.quantization.keras.quantize import quantize_annotate_model
from tensorflow_model_optimization.python.core.quantization.keras.quantize import quantize_apply

from tensorflow_model_optimization.python.core.quantization.keras.experimental.default_n_bit import default_n_bit_quantize_scheme


# TODO(user): define Keras float model.

# Specify a quantization scheme with 4-bit weights and 8-bit activations.
qat_scheme_4w8a = default_n_bit_quantize_scheme.DefaultNBitQuantizeScheme(
    num_bits_weight=4,
    num_bits_activation=8,
)

# Prepare the model for quantization-aware training.
with tfmot.quantization.keras.quantize_scope():
  quantized_aware_model = quantize_apply(
      quantize_annotate_model(model),
      qat_scheme_4w8a,
  )

# TODO(user): compile and train quantized_aware_model using standard Keras methods.
```

For TF Lite conversion, an activation precision of 8 bits is recommended.
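
As a minimal sketch of the conversion step (assuming the `quantized_aware_model` from the example above has already been compiled and trained), the standard TF Lite conversion path applies:

```python
import tensorflow as tf

# Convert the trained quantization-aware model to TF Lite.
converter = tf.lite.TFLiteConverter.from_keras_model(quantized_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Write the flatbuffer to disk.
with open('model_4w8a.tflite', 'wb') as f:
  f.write(tflite_model)
```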

Before TF 2.11.0, the TF Lite converter stores each weight value in its own byte of the weight tensor, so a 4-bit weight produced with the default_n_bit scheme is an integer in [-7, 7] that still occupies a full byte. TF 2.11.0 (and the TF 2.12.0 release candidate) adds weight packing for 4-bit weights for selected operators; in TF 2.11.0, for example, two 4-bit weights are packed per byte for the regular convolution operator.
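
To make the storage difference concrete, here is an illustrative sketch of packing two 4-bit signed weights into one byte; the nibble order shown is an assumption for illustration, not the converter's actual bit layout:

```python
def pack_two_4bit_weights(w0, w1):
  """Packs two 4-bit signed weights in [-7, 7] into a single byte.

  Illustration only; the TF Lite converter's actual bit layout may differ.
  """
  assert -7 <= w0 <= 7 and -7 <= w1 <= 7
  # Low nibble holds w0, high nibble holds w1 (two's complement per nibble).
  return (w0 & 0x0F) | ((w1 & 0x0F) << 4)

# Two weights that previously occupied two bytes now fit in one.
print(hex(pack_two_4bit_weights(-3, 5)))  # 0x5d
```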

To improve task quality, it may be necessary to use higher weight precision, such as 8-bit, for the first and last layers of the model. This can be achieved with per-layer wrapper code; a code example is available in the [kws_streaming](https://github.com/google-research/google-research/commit/c87bac8133e00dc4fe646c182072676146312e0f) framework in the Google Research repository.
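
As one hedged sketch of the idea (using the public `QuantizeConfig` and `quantize_annotate_layer` APIs; the `Keep8BitConfig` class and the layer names are hypothetical, and the actual kws_streaming code differs), selected layers can be annotated individually so they keep 8-bit weights while the rest of the model uses the 4-bit scheme:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantizers = tfmot.quantization.keras.quantizers


class Keep8BitConfig(tfmot.quantization.keras.QuantizeConfig):
  """Hypothetical per-layer config keeping 8-bit weights and activations."""

  def get_weights_and_quantizers(self, layer):
    return [(layer.kernel, quantizers.LastValueQuantizer(
        num_bits=8, per_axis=False, symmetric=True, narrow_range=False))]

  def get_activations_and_quantizers(self, layer):
    return [(layer.activation, quantizers.MovingAverageQuantizer(
        num_bits=8, per_axis=False, symmetric=False, narrow_range=False))]

  def set_quantize_weights(self, layer, quantize_weights):
    layer.kernel = quantize_weights[0]

  def set_quantize_activations(self, layer, quantize_activations):
    layer.activation = quantize_activations[0]

  def get_output_quantizers(self, layer):
    return []

  def get_config(self):
    return {}


def annotate(layer):
  # 'first_conv' and 'last_dense' are hypothetical layer names.
  if layer.name in ('first_conv', 'last_dense'):
    return tfmot.quantization.keras.quantize_annotate_layer(
        layer, quantize_config=Keep8BitConfig())
  return tfmot.quantization.keras.quantize_annotate_layer(layer)


annotated_model = tf.keras.models.clone_model(model, clone_function=annotate)
with tfmot.quantization.keras.quantize_scope({'Keep8BitConfig': Keep8BitConfig}):
  mixed_precision_model = quantize_apply(annotated_model, qat_scheme_4w8a)
```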