
QAT model saving bug: KeyError: '__inference_depthwise_conv2d_layer_call_fn_126' #868

Open
peri044 opened this issue Oct 22, 2021 · 19 comments
Labels: bug

peri044 commented Oct 22, 2021

Describe the bug
Please download the scripts to reproduce from: https://drive.google.com/drive/folders/15cajAZ9sAZ2Uyix8sDVSYku6QCqDCec7?usp=sharing

Command to run: python sample_qat.py

I have a simple model with an input layer and a DepthwiseConv2D layer. I quantize this model by adding quantize_and_dequantize nodes at the input of the DepthwiseConv2D layer (commented in the code). When I save the model and load it back, I see the following error:

  File "/home/dperi/Downloads/py3/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 544, in <lambda>
    "function": lambda: self._recreate_function(proto.function),
  File "/home/dperi/Downloads/py3/lib/python3.6/site-packages/tensorflow/python/saved_model/load.py", line 586, in _recreate_function
    proto, self._concrete_functions), setattr
  File "/home/dperi/Downloads/py3/lib/python3.6/site-packages/tensorflow/python/saved_model/function_deserialization.py", line 295, in recreate_function
    concrete_function_objects.append(concrete_functions[concrete_function_name])
KeyError: '__inference_depthwise_conv2d_layer_call_and_return_conditional_losses_117'
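
For reference, a minimal sketch of this kind of repro, using tfmot.quantization.keras.quantize_model as a stand-in for the custom quantize_and_dequantize insertion described above (layer shapes and names are illustrative; the actual scripts are in the linked Drive folder):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Tiny model: an input layer followed by a single DepthwiseConv2D layer.
inputs = tf.keras.Input(shape=(32, 32, 3))
outputs = tf.keras.layers.DepthwiseConv2D(kernel_size=3)(inputs)
model = tf.keras.Model(inputs, outputs)

# Insert quantize/dequantize nodes via TFMOT, then round-trip through SavedModel.
q_model = tfmot.quantization.keras.quantize_model(model)
q_model.save('export_dir')  # SavedModel format

# Loading back is where the KeyError surfaces on affected TF versions; the
# quantize wrappers are custom objects, hence quantize_scope.
with tfmot.quantization.keras.quantize_scope():
    restored = tf.keras.models.load_model('export_dir')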

System information

TensorFlow version (installed from source or binary): 2.5 (tried with 2.6 as well)

TensorFlow Model Optimization version (installed from source or binary):

SavedModel loading fails specifically for depthwise convolution; it works fine for a regular convolution.

peri044 added the bug label on Oct 22, 2021
Jia-HongHenryLee commented Oct 25, 2021

Hi @Xhark,
I hit the same bug when trying to quantize MobileNetV2.

System information

TensorFlow 2.5.0 (binary) with TensorFlow Model Optimization 0.6.0
TensorFlow 2.5.1 (binary) with TensorFlow Model Optimization 0.7.0
TensorFlow 2.4.0 (binary) with TensorFlow Model Optimization 0.7.0

Python version: 3.8.12

Jia-HongHenryLee commented:

Hi @Xhark and @peri044,

I worked around the problem with the following environment:
System information
TensorFlow version (installed from binary): tf-nightly-gpu 2.5.0.dev20201202 (https://www.cnpython.com/pypi/tf-nightly-gpu/download)
TensorFlow Model Optimization version (installed from binary): 0.6.0
Python version: 3.8.12
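
For reference, pinning that environment would look roughly like this (a sketch; old nightly wheels may no longer be available from the standard index):

pip install tf-nightly-gpu==2.5.0.dev20201202 tensorflow-model-optimization==0.6.0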

daverim (Collaborator) commented Nov 1, 2021

Hi @peri044 and @Jia-HongHenryLee,

I'm looking into it now, but there are a couple of workarounds.
First, it seems to save correctly if you use

model.save('export_dir', save_format='h5')
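
Loading that H5 file back needs tfmot's quantize_scope, since the quantize wrapper layers are custom Keras objects; roughly:

import tensorflow_model_optimization as tfmot

# Deserialize the quantize wrappers by loading inside quantize_scope.
with tfmot.quantization.keras.quantize_scope():
    restored = tf.keras.models.load_model('export_dir')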

I think this is caused by incorrect shape handling for the depthwise kernel quantization parameters, which results in functions not being traced/merged correctly.

Thanks for reporting this.

peri044 (Author) commented Nov 7, 2021

Thank you @daverim for addressing this.
Can you let me know when this might be resolved, or whether there's an active PR for it?
I haven't tried the H5 format, since I'm using the SavedModel format to pass the model through TF2ONNX (with custom utilities) for processing.

peri044 (Author) commented Nov 15, 2021

Hello @daverim, could you suggest some pointers on how to fix this locally (using the SavedModel format)? Which files/functions should I look at? Thanks!

ChanZou commented Nov 15, 2021

Hey @peri044. If your ultimate goal is to convert the model to TFLite, you can pass a ConcreteFunction around instead; TFLiteConverter.from_concrete_functions works just fine for me.
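
A sketch of that route, assuming q_model is the quantized Keras model (the input TensorSpec is illustrative):

# Trace a ConcreteFunction from the model and hand it to the TFLite converter,
# bypassing the SavedModel serialization path entirely.
concrete_fn = tf.function(q_model).get_concrete_function(
    tf.TensorSpec([1, 32, 32, 3], tf.float32))
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_fn])
tflite_model = converter.convert()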

peri044 (Author) commented Nov 15, 2021

Hello @ChanZou. My ultimate goal is to use the SavedModel format (if it works) and pass it through TF2ONNX to convert it into an ONNX graph; TF2ONNX currently accepts the SavedModel format.
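
For context, the usual TF2ONNX invocation on a SavedModel directory looks like this (the opset value is illustrative):

python -m tf2onnx.convert --saved-model export_dir --output model.onnx --opset 13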

peri044 (Author) commented Jan 6, 2022

Hello @daverim, any suggestions on how to resolve this would be appreciated. Thanks!

daverim (Collaborator) commented Jan 6, 2022

Hi, sorry for the delay.

I just tested your sample code and it seems to be resolved now; there are some warnings about untraced functions.

Using tf==2.8.0-dev20210930 and tensorflow-model-optimization==0.7.0.

Please try and see if it works for you.
Thanks,
David

peri044 (Author) commented Jan 26, 2022

Thanks @daverim. That works now.

gcunhase commented:

@daverim I encountered a similar error for SeparableConv2D using TF 2.8.0 (no error with DepthwiseConv2D in that TF version):

...
Traceback (most recent call last):
  File "/home/PycharmProjects/tensorrt_qat/examples/mobilenet/run_qat_workflow.py", line 156, in <module>
    main(verbose=True)
  File "/home/PycharmProjects/tensorrt_qat/examples/mobilenet/run_qat_workflow.py", line 142, in main
    tf.keras.models.save_model(q_model, os.path.join(qat_save_finetuned_weights, "saved_model"))
  File "/home/PycharmProjects/tensorrt_qat/venv38_tf2.8_newPR/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/PycharmProjects/tensorrt_qat/venv38_tf2.8_newPR/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 403, in map_resources
    raise ValueError(
ValueError: Unable to save function b'__inference_block2_sepconv1_layer_call_fn_670910' because it captures graph tensor Tensor("xception/quant_block2_sepconv1/LastValueQuant_1/QuantizeAndDequantizeV4:0", shape=(3, 3, 64, 1), dtype=float32) from a parent function which cannot be converted to a constant with `tf.get_static_value`.

Do you have any idea what caused the error for DepthwiseConv2D, and whether the same fix would work for SeparableConv2D?
Thank you!

k-w-w (Contributor) commented May 18, 2022

The best way to avoid this issue is to disable layer tracing when creating the SavedModel, but you'll then have to define the serving_default function manually (this is the default signature name that TF2ONNX looks for).

# Wrap the model call in a tf.function so a single signature can be exported.
@tf.function
def predict(*args, **kwargs):
  return model(*args, **kwargs)

# save_spec() returns the (args, kwargs) TensorSpecs the model was called with.
arg_spec, kwarg_spec = model.save_spec()
model.save(path, save_traces=False, signatures={
  "serving_default": predict.get_concrete_function(*arg_spec, **kwarg_spec)
})
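
To sanity-check the exported signature (for example, before handing the SavedModel to TF2ONNX), it can be loaded back directly; roughly:

# The signature registered above should appear under "serving_default".
loaded = tf.saved_model.load(path)
print(loaded.signatures["serving_default"].structured_input_signature)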

gcunhase commented:

Hi @k-w-w, thank you for your feedback! This specific issue (for DepthwiseConv2D) has been solved, as mentioned in the Jan 26 comment above, but the same issue persists for SeparableConv2D here.

I tried your suggestion, but it did not solve my issue, since the problem is not with tf2onnx but with saving the TF model. Do you have any additional suggestions?
Thank you!

k-w-w (Contributor) commented May 19, 2022

@gcunhase Are you getting the same error even with save_traces=False?

gcunhase commented:

@k-w-w yes

k-w-w (Contributor) commented May 19, 2022

@gcunhase can you paste the error trace?

gcunhase commented:

@k-w-w:

...
Traceback (most recent call last):
  File "/home/nvidia/PycharmProjects/nvbugs/internal_filed/tf_key_inference_bug/TF_bug_separableconv2d/sample.py", line 24, in <module>
    model.save(model_save_path)
  File "/home/nvidia/PycharmProjects/nvbugs/venv38_trt_regression/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/nvidia/PycharmProjects/nvbugs/venv38_trt_regression/lib/python3.8/site-packages/tensorflow/python/saved_model/save.py", line 403, in map_resources
    raise ValueError(
ValueError: Unable to save function b'__inference_separable_conv2d_layer_call_fn_961' because it captures graph tensor Tensor("model/quant_separable_conv2d/LastValueQuant_1/QuantizeAndDequantizeV4:0", shape=(3, 3, 3, 1), dtype=float32) from a parent function which cannot be converted to a constant with `tf.get_static_value`.

gcunhase commented:

This bug also has the reproducible code, so we can move our discussion there if you agree.

gcunhase commented:

This bug can be closed for DepthwiseConv2D.
For Conv2DTranspose and SeparableConv2D, please move the discussion here.
Thank you!
