
❓ [Question] Internal Error-given invalid tensor name #1844


Closed
DanielLevi6 opened this issue Apr 20, 2023 · 4 comments · Fixed by #1846
Labels
question Further information is requested

Comments


DanielLevi6 commented Apr 20, 2023

❓ Question

I want to convert a Torch model (from Python) to a runtime model (in C++) using the torch.fx capabilities. That would allow me to accelerate a model that isn't fully supported by TensorRT.
I understand that this flow is experimental, so I used the examples given in this repository.

By using this example-
https://github.com/pytorch/TensorRT/blob/main/examples/fx/fx2trt_example_next.py

I got some internal errors while running this code section (and also while running inference afterwards, but the error messages are identical, so I guess they're related)-
trt_mod = TRTModule(
    name="my_module",
    serialized_engine=engine_str,
    input_binding_names=r.input_names,
    output_binding_names=r.output_names,
    target_device=Device(f"cuda:{torch.cuda.current_device()}"),
)

The error messages are-
ERROR: [Torch-TensorRT] - 3: [engine.cpp::getProfileObliviousBindingIndex::1386] Error Code 3: Internal Error (getTensorShape given invalid tensor name: input_0)
ERROR: [Torch-TensorRT] - 3: [engine.cpp::getProfileObliviousBindingIndex::1386] Error Code 3: Internal Error (getTensorDataType given invalid tensor name: input_0)
ERROR: [Torch-TensorRT] - 3: [engine.cpp::getProfileObliviousBindingIndex::1386] Error Code 3: Internal Error (getTensorShape given invalid tensor name: output_0)
ERROR: [Torch-TensorRT] - 3: [engine.cpp::getProfileObliviousBindingIndex::1386] Error Code 3: Internal Error (getTensorDataType given invalid tensor name: output_0)
What can cause these errors?
I tried to find another way to define the model inputs and outputs (which might affect the input and output names in some way, as hinted by the error messages), but I don't see one in the examples.

What you have already tried

I have already tried the notebook I linked above, as well as another flow I found on the PyTorch forum-
https://discuss.pytorch.org/t/using-torchtrt-fx-backend-on-c/170639/6

The code for this flow is-
model_fx = model_fx.cuda()
inputs_fx = [i.cuda() for i in inputs_fx]
trt_fx_module_f16 = torch_tensorrt.compile(
    model_fx,
    ir="fx",
    inputs=inputs_fx,
    enabled_precisions={torch.float16},
    use_experimental_fx_rt=True,
    explicit_batch_dimension=True,
)
torch.save(trt_fx_module_f16, "trt.pt")
reload_trt_mod = torch.load("trt.pt")
scripted_fx_module = torch.jit.trace(trt_fx_module_f16, example_inputs=inputs_fx)
scripted_fx_module.save("/tmp/scripted_fx_module.ts")
scripted_fx_module = torch.jit.load("/tmp/scripted_fx_module.ts") #This can also be loaded in C++

The error is the same while running the torch_tensorrt.compile method with the use_experimental_fx_rt=True flag.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • PyTorch Version (e.g., 1.0): 1.13.1
  • CPU Architecture: x86-64
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source): -
  • Are you using local sources or building from archives: I used the pre-built version of Torch-TensorRT 1.3.0 release
  • Python version: 3.8.10
  • CUDA version: 11.8
  • GPU models and configuration: NVIDIA T1000
  • Any other relevant information: -
@DanielLevi6 DanielLevi6 added the question Further information is requested label Apr 20, 2023
Collaborator

gs-olive commented Apr 20, 2023

Hello - I am able to reproduce the error on the current main branch, and will look further into it. It could be related to input/output name mishandling.

@narendasan - do you have any suggestions on this?

For additional context, it seems that the input/output binding names here are not standard. Specifically, they are x and output0, respectively, whereas it seems input_0 and output_0 are expected.
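
The mismatch described above can be illustrated with a small, self-contained sketch (the helper and names here are hypothetical, not Torch-TensorRT API): TensorRT looks up tensors by name, so binding names handed to TRTModule that are absent from the engine's actual tensor-name list trigger "given invalid tensor name" errors.

```python
# Hypothetical sketch of the failure mode: name-based tensor lookups fail
# when the given binding names are not among the engine's tensor names.

def unknown_bindings(engine_tensor_names, input_bindings, output_bindings):
    """Return the binding names the engine does not recognize."""
    known = set(engine_tensor_names)
    return [n for n in input_bindings + output_bindings if n not in known]

# The engine in the example actually names its tensors "x" and "output0",
# while TRTModule is handed "input_0" and "output_0":
missing = unknown_bindings(["x", "output0"], ["input_0"], ["output_0"])
print(missing)  # -> ['input_0', 'output_0']
```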

Author

DanielLevi6 commented Apr 24, 2023

@gs-olive
Thank you!!

Can you tell me whether there are pre-built binaries I can use?
Or, otherwise, when an updated version will be released?

I'm trying to build from source, but I can't get it to work for some reason.
I ran the bazel build //:libtorchtrt --compilation_mode opt command, and it completed successfully after some path changes in the bazel files. I'm using the cxx11-abi version of libtorch, and as written in the WORKSPACE file, I set the same path for both libtorch fields-
new_local_repository(
    name = "libtorch",
    path = "/home/ThirdParties/libtorch",
    build_file = "third_party/libtorch/BUILD"
)
new_local_repository(
    name = "libtorch_pre_cxx11_abi",
    path = "/home/ThirdParties/libtorch",
    build_file = "third_party/libtorch/BUILD"
)

But when I try the next step, installing the torch-tensorrt Python version-
cd py && python3 setup.py install
some symbols turn out to be undefined for some reason. For example-
ld.lld: error: undefined symbol: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&)

I tried switching libtorch to the pre-cxx11-abi version, and the Python torch-tensorrt installed successfully, but another error appeared-
ImportError: /home/ThirdParties/torch_tensorrt/lib/libtorchtrt.so: undefined symbol: _ZN3c106Symbol14fromQualStringERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

I guess the problem is a conflict between the different ABI settings, but I don't understand how to solve it.
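
One way to confirm the suspicion (a diagnostic sketch, not a fix): Itanium-mangled symbols built against the cxx11 ABI embed the __cxx11 inline namespace in their std::string parameters, so the undefined symbol itself reveals which ABI the caller expected. (If torch is importable, torch.compiled_with_cxx11_abi() reports how the installed wheel was built.)

```python
# The undefined symbol from the error message above; the "__cxx11"
# substring means the caller was compiled with _GLIBCXX_USE_CXX11_ABI=1,
# so the libtorch it links against must be the cxx11-abi build too.
symbol = (
    "_ZN3c106Symbol14fromQualStringERKNSt7__cxx11"
    "12basic_stringIcSt11char_traitsIcESaIcEEE"
)
needs_cxx11_abi = "__cxx11" in symbol
print(needs_cxx11_abi)  # -> True
```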

Collaborator

Currently, there are no pre-built binaries of this, but I can offer some tips for compiling from scratch. If you are solely installing the Python Torch-TensorRT (not C++), then python3 setup.py install would suffice, as it runs the required bazel commands for you. Some other things to try which could be contributing to the error you are seeing:

  1. Ensure that the Torch installation being used is in agreement with the Torch version Torch-TRT is being compiled against. Specifically, ensure the Torch installation reflected when you run import torch can be found at the path /home/ThirdParties/libtorch. If you installed with pip, then run pip show torch and copy the Location path to the "path" sections of the WORKSPACE. For example, the path returned by pip might be something like /lib/python3.8/site-packages, so I would set path="/lib/python3.8/site-packages/torch"
  2. From the root of this repository, run export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/bazel-TensorRT/external/libtorch/lib, to set the LD_LIBRARY_PATH.
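
Step 1 above can be sketched as follows (the Location value here is hypothetical; substitute whatever pip show torch actually prints on your machine):

```python
import os

# Suppose `pip show torch` reported this Location (hypothetical path):
location = "/lib/python3.8/site-packages"

# The WORKSPACE "path" fields should point at the torch package inside it;
# both new_local_repository rules then use this same directory:
libtorch_path = os.path.join(location, "torch")
print(libtorch_path)  # -> /lib/python3.8/site-packages/torch
```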

If neither of these work, another option is to build a Docker container with the latest main, by following the steps in the README:
https://github.com/pytorch/TensorRT/blob/main/docker/README.md#building-a-torch-tensorrt-container

@gs-olive gs-olive reopened this Apr 24, 2023
Author

@gs-olive
I cleaned up the environment and installed the library according to your response, and now it works as expected.
Thank you!
