What is the purpose of all_models/inflight_batcher_llm/tensorrt_llm/1/model.py? #538
-
From what I can tell, that file is not used at all when running the inflight batcher with the tensorrtllm backend, since the code clearly states that it runs a decoder-only model, yet the encoder-decoder tutorial uses the file anyway with no modification.

That file aside, has anyone actually gotten that tutorial working? I have been trying to run a flan-t5 model following that tutorial for the past month and have so far been unsuccessful. I can build the engines without any issue, but when I start the server I get an error. To fix this error, I have tried renaming the input_lengths argument to encoder_input_lengths in both all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt and all_models/inflight_batcher_llm/ensemble/config.pbtxt, but the issue persists.

As a side note, I also found that to avoid an error that blocked any prompt longer than 1 token, I had to go against the tutorial and not pass "--max_input_len 1" when building the decoder engine. I am using the 24.06 container with TensorRT-LLM v0.10.
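For reference, the rename I attempted looked roughly like this (a sketch of the relevant input entry in config.pbtxt; the exact data_type and dims here are placeholders and depend on the model):

```
input [
  {
    name: "encoder_input_lengths"   # was "input_lengths"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
```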
-
@owenonline
-
What is the difference between these two options, the tensorrt_llm backend and the python backend?
@owenonline
If you want to use all_models/inflight_batcher_llm/tensorrt_llm/1/model.py, you have to change the backend in config.pbtxt from tensorrt_llm to python. Then, if you change model.py, it will take effect in your program.
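The switch in config.pbtxt looks roughly like this (a sketch; everything else in the file stays as shipped):

```
# all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt

# Before: the C++ TRT-LLM backend, which does not execute 1/model.py
# backend: "tensorrt_llm"

# After: the Python backend, which runs 1/model.py as the model implementation
backend: "python"
```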