
🐛 [Bug] Encountered bug when using Torch-TensorRT #1687


Closed
zshn25 opened this issue Feb 21, 2023 · 9 comments


@zshn25
Contributor

zshn25 commented Feb 21, 2023

Compiling a scripted model fails at torch.nn.functional.interpolate(x, scale_factor=2.0, mode="nearest"):

RuntimeError                              Traceback (most recent call last)
Cell In[14], line 8
      2 scripted_model = torch.jit.script(model.to("cuda"), input_image_pytorch.to("cuda"))
      4 # trt_encoder = torch_tensorrt.compile(torch.jit.script(encoder).to("cuda"), 
      5 #     inputs= [input_image_pytorch.to("cuda")],
      6 #     enabled_precisions= { torch.float32} # Run with FP16
      7 # )
----> 8 trt_model = torch_tensorrt.compile(scripted_model.to("cuda"), 
      9     inputs=  [input_image_pytorch.to("cuda")], # [torch_tensorrt.Input([1,3,480,768])], # [input_image_pytorch.detach().to("cuda")],
     10     enabled_precisions= { torch.float32} # Run with FP16
     11 # trt_decoder = torch_tensorrt.compile(torch.jit.script(depth_decoder).to("cuda"), 
     12 #     inputs= [features],
     13     # enabled_precisions= { torch.float32} # Run with FP16
     14 )

File ~/miniconda3/envs/inference/lib/python3.8/site-packages/torch_tensorrt/_compile.py:125, in compile(module, ir, inputs, enabled_precisions, **kwargs)
    120         logging.log(
    121             logging.Level.Info,
    122             "Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript",
    123         )
    124         ts_mod = torch.jit.script(module)
--> 125     return torch_tensorrt.ts.compile(
    126         ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs
    127     )
    128 elif target_ir == _IRType.fx:
    129     if (
    130         torch.float16 in enabled_precisions
    131         or torch_tensorrt.dtype.half in enabled_precisions
    132     ):

File ~/miniconda3/envs/inference/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py:136, in compile(module, inputs, input_signature, device, disable_tf32, sparse_weights, enabled_precisions, refit, debug, capability, num_avg_timing_iters, workspace_size, dla_sram_size, dla_local_dram_size, dla_global_dram_size, calibrator, truncate_long_and_double, require_full_compilation, min_block_size, torch_executed_ops, torch_executed_modules)
    110     raise ValueError(
    111         f"require_full_compilation is enabled however the list of modules and ops to run in torch is not empty. Found: torch_executed_ops: {torch_executed_ops}, torch_executed_modules: {torch_executed_modules}"
    112     )
    114 spec = {
    115     "inputs": inputs,
    116     "input_signature": input_signature,
   (...)
    133     },
    134 }
--> 136 compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
    137 compiled_module = torch.jit._recursive.wrap_cpp_module(compiled_cpp_mod)
    138 return compiled_module

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/home/user/miniconda3/envs/inference/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 21, in forward
        x = self.quant(x)
        x = self.encoder(x)
        x = self.decoder(x)
            ~~~~~~~~~~~~ <--- HERE
        x = self.dequant(x[0])
        return x
  File "/home/user/playground/networks/depth_decoder_for_conversion.py", line 116, in forward
    
            # upsample and horizontal connections
            x = torch.nn.functional.interpolate(x, scale_factor=2.0, mode="nearest")
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            if self.use_skips and i > 0:
                x_concat = input_features[i - 1]
  File "/home/user/miniconda3/envs/inference/lib/python3.8/site-packages/torch/nn/functional.py", line 3922, in interpolate
        return torch._C._nn.upsample_nearest1d(input, output_size, scale_factors)
    if input.dim() == 4 and mode == "nearest":
        return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    if input.dim() == 5 and mode == "nearest":
        return torch._C._nn.upsample_nearest3d(input, output_size, scale_factors)
RuntimeError: Expected static_cast<int64_t>(scale_factors->size()) == spatial_dimensions to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

To Reproduce

Steps to reproduce the behavior:

import torch
import torch_tensorrt

# `model` is the network from this report (encoder + decoder); its definition is not shown here.
input_image_pytorch = torch.randn((1, 3, 480, 768)).to("cuda")

# torch.jit.script takes only the module; the extra tensor argument in the original call is not an example input.
scripted_model = torch.jit.script(model.to("cuda"))
trt_model = torch_tensorrt.compile(
    scripted_model,
    inputs=[input_image_pytorch],
    enabled_precisions={torch.float32},
)

Expected behavior

No error.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 1.3.0
  • PyTorch Version (e.g. 1.0): 1.13.1
  • CPU Architecture: x86
  • OS (e.g., Linux): Ubuntu 18.04 LTS
  • How you installed PyTorch (conda, pip, libtorch, source): conda
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version: 3.8
  • CUDA version: 11.7
  • GPU models and configuration:
  • Any other relevant information:

Additional context

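The error is raised at the torch.nn.functional.interpolate(x, scale_factor=2.0, mode="nearest") line in the network definition. Scripting the model with torch.jit.script on its own works; the failure only appears during torch_tensorrt.compile.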
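Because the failure happens on the scale_factors path, one workaround worth trying (not verified here against Torch-TensorRT 1.3.0) is to pass an explicit output size instead. In eager PyTorch 1.13, when size is given, F.interpolate forwards scale_factors as None, so the length check in the error message is never reached. For mode="nearest" the output size is floor(input_size * scale_factor) per spatial dimension, so an integer factor of 2 gives exactly twice the height and width. The helper below is hypothetical and only computes those sizes:

```python
import math
from typing import Tuple


def nearest_output_size(height: int, width: int, scale_factor: float) -> Tuple[int, int]:
    """Hypothetical helper: the (H, W) that nearest-neighbour interpolation with a
    scalar scale_factor produces, i.e. floor(dim * scale_factor) per spatial dim."""
    return (math.floor(height * scale_factor), math.floor(width * scale_factor))


# With an integer factor of 2 the result is exactly double, so passing
# size=[2 * H, 2 * W] should select the same output shape as scale_factor=2.0.
print(nearest_output_size(240, 384, 2.0))  # (480, 768)
print(nearest_output_size(15, 15, 1.5))    # (22, 22): non-integer factors round down
```

Inside the decoder's forward, the equivalent change would be something like x = torch.nn.functional.interpolate(x, size=[x.shape[-2] * 2, x.shape[-1] * 2], mode="nearest"); a list keeps the TorchScript type of size as List[int]. Whether Torch-TensorRT compiles this form without error has not been confirmed in this thread.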

@gcuendet
Contributor

gcuendet commented Apr 4, 2023

@narendasan @bowang007 any update on this? I am observing the same behaviour.
Could it be related to this older issue? Perhaps the fix for that one broke support for passing a single float as the scale factor, at least at the interpolate level (torch.nn.functional.interpolate(x, scale_factor=2.0, mode="nearest")). It's hard to tell what changed for that older issue, since no PR is linked to it.
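For context, the failing check compares the length of scale_factors with the number of spatial dimensions (input rank minus 2, so 2 for an NCHW tensor). In eager PyTorch 1.13, F.interpolate expands a scalar scale_factor into one factor per spatial dimension before calling the upsample kernel. The pure-Python sketch below mirrors that contract, assuming the normalisation in torch/nn/functional.py and the kernel-side check behave as described; both function names are hypothetical. If some compiler pass forwarded the scalar as a one-element list, the check would fail exactly as in the traceback. That would be consistent with the hypothesis above, but it is not confirmed.

```python
from typing import List, Sequence, Union


def expand_scale_factors(input_rank: int, scale_factor: Union[float, Sequence[float]]) -> List[float]:
    """Hypothetical helper mirroring how F.interpolate normalises scale_factor:
    one factor per spatial dimension (every dim except N and C)."""
    spatial_dims = input_rank - 2
    if isinstance(scale_factor, (list, tuple)):
        if len(scale_factor) != spatial_dims:
            raise ValueError(f"expected {spatial_dims} scale factors, got {len(scale_factor)}")
        return [float(s) for s in scale_factor]
    # A scalar is broadcast to every spatial dimension.
    return [float(scale_factor)] * spatial_dims


def check_kernel_scale_factors(input_rank: int, scale_factors: List[float]) -> None:
    """Hypothetical helper mirroring the kernel-side check from the traceback."""
    spatial_dims = input_rank - 2
    if len(scale_factors) != spatial_dims:
        raise RuntimeError(
            "Expected static_cast<int64_t>(scale_factors->size()) == "
            "spatial_dimensions to be true, but got false."
        )


# Eager path for an NCHW tensor: 2.0 becomes [2.0, 2.0] and passes the check.
factors = expand_scale_factors(4, 2.0)
check_kernel_scale_factors(4, factors)
print(factors)  # [2.0, 2.0]

# Suspected (unconfirmed) failure mode: the scalar reaches the kernel as a one-element list.
try:
    check_kernel_scale_factors(4, [2.0])
except RuntimeError as err:
    print("RuntimeError:", err)
```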

@gcuendet
Contributor

gcuendet commented Apr 4, 2023

I also see some deprecation warnings when compiling Torch-TensorRT, which seem to come from interpolate_plugin (among other files):

In file included from [...]/core/plugins/impl/interpolate_plugin.cpp:1:0:
[...]/core/plugins/impl/interpolate_plugin.h:127:107: warning: 'IPluginV2' is deprecated [-Wdeprecated-declarations]
    nvinfer1::IPluginV2* createPlugin(const char* name, const nvinfer1::PluginFieldCollection* fc) noexcept override;
                                                                                                            ^~~~~~~~
In file included from [...]/include/NvInferLegacyDims.h:16:0,
                 from [...]/include/NvInfer.h:16,
                 from [...]/core/plugins/impl/interpolate_plugin.h:12,
                 from [...]/core/plugins/impl/interpolate_plugin.cpp:1:
[...]/include/NvInferRuntimeCommon.h:393:22: note: declared here
  class TRT_DEPRECATED IPluginV2

Could that explain why interpolate behaves oddly? Are these warnings expected?
I am using TensorRT 8.5.3.1, which as far as I understand is the supported version for Torch-TensorRT 1.3, right?

philippewarren added a commit to introlab/t-top that referenced this issue May 16, 2023
@github-actions

github-actions bot commented Jul 4, 2023

This issue has not seen activity for 90 days, Remove stale label or comment or this will be closed in 10 days

@philippewarren

I'd like to see this issue fixed as it is preventing us from using the software for our use case currently.

@philippewarren

@narendasan or @bowang007, is anyone working on this? If not, how could we help?

@bowang007
Collaborator

Hi @philippewarren, I looked into this several weeks ago, but I don't think we have seen this issue before.
Could you please provide a small reproducer? A more detailed log would also help.
Thanks!

@philippewarren

philippewarren commented Jul 24, 2023

This seems to be caused by the combination of an upsampling layer and a forward function that returns multiple values.
Here is a minimal reproducible example:

import torch
import torch.nn as nn
import torch_tensorrt


class MRE(nn.Module):
    def __init__(self):
        super(MRE, self).__init__()

        self._upsample = nn.Upsample(scale_factor=2, mode='nearest')

    def forward(self, x):
        y = self._upsample(x)

        return [x, y]


model = MRE()
x = torch.ones((1, 3, 416, 416))

model.eval()

device = torch.device('cuda')
model = model.to(device)

trt_module = torch_tensorrt.compile(
    model,
    inputs=[x.to(device)],
    enabled_precisions={torch.float},
)

torch.jit.save(trt_module, "mre.trt.pth")

The same error occurs if the returned list ([x, y]) is replaced with a tuple ((x, y)).

This is the associated output:

WARNING: [Torch-TensorRT] - For input x.1, found user specified input dtype as Float32. The compiler is going to use the user setting Float32
Traceback (most recent call last):
  File "./torch-tensort-mre.py", line 31, in <module>
    trt_module = torch_tensorrt.compile(
  File "/home/philippe/.local/lib/python3.8/site-packages/torch_tensorrt/_compile.py", line 125, in compile
    return torch_tensorrt.ts.compile(
  File "/home/philippe/.local/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py", line 136, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "./torch-tensort-mre.py", line 18, in forward
    def forward(self, x):
        y = self._upsample(x)
            ~~~~~~~~~~~~~~ <--- HERE
    
        return [x, y]
  File "/home/philippe/.local/lib/python3.8/site-packages/torch/nn/modules/upsampling.py", line 156, in forward
    def forward(self, input: Tensor) -> Tensor:
        return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners,
               ~~~~~~~~~~~~~ <--- HERE
                             recompute_scale_factor=self.recompute_scale_factor)
  File "/home/philippe/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 3922, in interpolate
        return torch._C._nn.upsample_nearest1d(input, output_size, scale_factors)
    if input.dim() == 4 and mode == "nearest":
        return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    if input.dim() == 5 and mode == "nearest":
        return torch._C._nn.upsample_nearest3d(input, output_size, scale_factors)
RuntimeError: Expected static_cast<int64_t>(scale_factors->size()) == spatial_dimensions to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)

torch.__version__ is 1.13.1+cu117
torch_tensorrt.__version__ is 1.3.0
CUDA version is 12.1 with driver version 530.30.02 on Ubuntu 20.04.
CUDNN version is 8.9.3.28-1+cuda12.1.

@github-actions

This issue has not seen activity for 90 days, Remove stale label or comment or this will be closed in 10 days

@domef

domef commented Nov 28, 2024

Any update?

Projects
None yet
Development

No branches or pull requests

7 participants