
❓ [Question] Internal Error-given invalid tensor name #1844


Closed
DanielLevi6 opened this issue Apr 20, 2023 · 4 comments · Fixed by #1846
Labels
question Further information is requested

Comments


DanielLevi6 commented Apr 20, 2023

❓ Question

I want to convert a Torch model (from Python) to a runtime model (in C++) using the torch.fx capabilities. That would allow me to accelerate a model that isn't fully supported by TensorRT.
I understand that this flow is experimental, so I used the examples given in this repository.

By using this example-
https://github.com/pytorch/TensorRT/blob/main/examples/fx/fx2trt_example_next.py

I got some internal errors while running this code section (and also while running inference afterwards, but the error messages are identical, so I guess they're related)-
trt_mod = TRTModule(
    name="my_module",
    serialized_engine=engine_str,
    input_binding_names=r.input_names,
    output_binding_names=r.output_names,
    target_device=Device(f"cuda:{torch.cuda.current_device()}"),
)

The error messages are-
ERROR: [Torch-TensorRT] - 3: [engine.cpp::getProfileObliviousBindingIndex::1386] Error Code 3: Internal Error (getTensorShape given invalid tensor name: input_0)
ERROR: [Torch-TensorRT] - 3: [engine.cpp::getProfileObliviousBindingIndex::1386] Error Code 3: Internal Error (getTensorDataType given invalid tensor name: input_0)
ERROR: [Torch-TensorRT] - 3: [engine.cpp::getProfileObliviousBindingIndex::1386] Error Code 3: Internal Error (getTensorShape given invalid tensor name: output_0)
ERROR: [Torch-TensorRT] - 3: [engine.cpp::getProfileObliviousBindingIndex::1386] Error Code 3: Internal Error (getTensorDataType given invalid tensor name: output_0)
What can cause these errors?
I tried to find another way to define the model inputs and outputs (which might affect the input and output names in some way, as hinted by the error messages), but I don't see one in the examples.

What you have already tried

I have already tried the notebook I linked above, as well as another flow I found on the PyTorch forum-
https://discuss.pytorch.org/t/using-torchtrt-fx-backend-on-c/170639/6

The code for this flow is-
model_fx = model_fx.cuda()
inputs_fx = [i.cuda() for i in inputs_fx]
trt_fx_module_f16 = torch_tensorrt.compile(
    model_fx,
    ir="fx",
    inputs=inputs_fx,
    enabled_precisions={torch.float16},
    use_experimental_fx_rt=True,
    explicit_batch_dimension=True,
)
torch.save(trt_fx_module_f16, "trt.pt")
reload_trt_mod = torch.load("trt.pt")
scripted_fx_module = torch.jit.trace(trt_fx_module_f16, example_inputs=inputs_fx)
scripted_fx_module.save("/tmp/scripted_fx_module.ts")
scripted_fx_module = torch.jit.load("/tmp/scripted_fx_module.ts") #This can also be loaded in C++

The error is the same while running the torch_tensorrt.compile method with the use_experimental_fx_rt=True flag.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • PyTorch Version (e.g., 1.0): 1.13.1
  • CPU Architecture: x86-64
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source): -
  • Are you using local sources or building from archives: I used the pre-built version of Torch-TensorRT 1.3.0 release
  • Python version: 3.8.10
  • CUDA version: 11.8
  • GPU models and configuration: NVIDIA T1000
  • Any other relevant information: -
@DanielLevi6 DanielLevi6 added the question Further information is requested label Apr 20, 2023
Collaborator

gs-olive commented Apr 20, 2023

Hello - I am able to reproduce the error on the current main branch, and will look further into it. It could be related to input/output name mishandling.

@narendasan - do you have any suggestions on this?

For additional context, it seems that the input/output binding names here are not standard. Specifically, they are x and output0, respectively, whereas it seems input_0 and output_0 are expected.
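
The mismatch described above can be illustrated with a small, self-contained sketch (the helper and names here are hypothetical, not Torch-TensorRT API): TensorRT looks up tensors by name, so binding names handed to TRTModule that are absent from the engine's actual tensor-name list trigger "given invalid tensor name" errors.

```python
# Hypothetical sketch of the failure mode: name-based tensor lookups fail
# when the given binding names are not among the engine's tensor names.

def unknown_bindings(engine_tensor_names, input_bindings, output_bindings):
    """Return the binding names the engine does not recognize."""
    known = set(engine_tensor_names)
    return [n for n in input_bindings + output_bindings if n not in known]

# The engine in the example actually names its tensors "x" and "output0",
# while TRTModule is handed "input_0" and "output_0":
missing = unknown_bindings(["x", "output0"], ["input_0"], ["output_0"])
print(missing)  # -> ['input_0', 'output_0']
```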

Author

DanielLevi6 commented Apr 24, 2023

@gs-olive
Thank you!!

Can you tell me whether there are pre-built binaries I can use?
Or, otherwise, when an updated version will be released?

I'm trying to build from source, but I can't get it to work for some reason.
I ran the bazel build //:libtorchtrt --compilation_mode opt command, and it completed successfully after some path changes in the bazel files. I'm using the cxx11-abi version of libtorch, and as written in the WORKSPACE file, I set the same path for both libtorch fields-
new_local_repository(
    name = "libtorch",
    path = "/home/ThirdParties/libtorch",
    build_file = "third_party/libtorch/BUILD"
)
new_local_repository(
    name = "libtorch_pre_cxx11_abi",
    path = "/home/ThirdParties/libtorch",
    build_file = "third_party/libtorch/BUILD"
)

But when I try the next step, installing the torch-tensorrt Python version-
cd py && python3 setup.py install
some symbols turn out to be undefined for some reason. For example-
ld.lld: error: undefined symbol: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::string const&)

I tried switching libtorch to the pre-cxx11-abi version, and the Python torch-tensorrt installed successfully, but another error appeared-
ImportError: /home/ThirdParties/torch_tensorrt/lib/libtorchtrt.so: undefined symbol: _ZN3c106Symbol14fromQualStringERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

I guess the problem is a conflict between the different ABI settings, but I don't understand how to solve it.
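
One way to confirm the suspicion (a diagnostic sketch, not a fix): Itanium-mangled symbols built against the cxx11 ABI embed the __cxx11 inline namespace in their std::string parameters, so the undefined symbol itself reveals which ABI the caller expected. (If torch is importable, torch.compiled_with_cxx11_abi() reports how the installed wheel was built.)

```python
# The undefined symbol from the error message above; the "__cxx11"
# substring means the caller was compiled with _GLIBCXX_USE_CXX11_ABI=1,
# so the libtorch it links against must be the cxx11-abi build too.
symbol = (
    "_ZN3c106Symbol14fromQualStringERKNSt7__cxx11"
    "12basic_stringIcSt11char_traitsIcESaIcEEE"
)
needs_cxx11_abi = "__cxx11" in symbol
print(needs_cxx11_abi)  # -> True
```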

Collaborator

Currently, there are no pre-built binaries of this, but I can offer some tips for compiling from scratch. If you are solely installing the Python Torch-TensorRT (not C++), then python3 setup.py install would suffice, as it runs the required bazel commands for you. Some other things to try which could be contributing to the error you are seeing:

  1. Ensure that the Torch installation being used is in agreement with the Torch version Torch-TRT is being compiled against. Specifically, ensure the Torch installation reflected when you run import torch can be found at the path /home/ThirdParties/libtorch. If you installed with pip, then run pip show torch and copy the Location path to the "path" sections of the WORKSPACE. For example, the path returned by pip might be something like /lib/python3.8/site-packages, so I would set path="/lib/python3.8/site-packages/torch"
  2. From the root of this repository, run export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/bazel-TensorRT/external/libtorch/lib, to set the LD_LIBRARY_PATH.
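
Step 1 above can be sketched as follows (the Location value here is hypothetical; substitute whatever pip show torch actually prints on your machine):

```python
import os

# Suppose `pip show torch` reported this Location (hypothetical path):
location = "/lib/python3.8/site-packages"

# The WORKSPACE "path" fields should point at the torch package inside it;
# both new_local_repository rules then use this same directory:
libtorch_path = os.path.join(location, "torch")
print(libtorch_path)  # -> /lib/python3.8/site-packages/torch
```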

If neither of these work, another option is to build a Docker container with the latest main, by following the steps in the README:
https://github.com/pytorch/TensorRT/blob/main/docker/README.md#building-a-torch-tensorrt-container

@gs-olive gs-olive reopened this Apr 24, 2023
Author

@gs-olive
I cleaned up the environment and installed the library according to your response, and now it works as expected.
Thank you!
