What is the purpose of all_models/inflight_batcher_llm/tensorrt_llm/1/model.py? #538
-
From what I can tell, that file is not used at all when running the inflight batcher with the tensorrtllm backend, since the code clearly states that it runs a decoder-only model, yet the encoder-decoder tutorial uses the file anyway with no modification.

That file aside, has anyone actually gotten that tutorial working? I have been trying to run a flan-t5 model following that tutorial for the past month and have so far been unsuccessful. I can build the engines without any issue, but when I start the server I get an error. To fix this error, I have tried renaming the input_lengths argument to encoder_input_lengths in both all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt and all_models/inflight_batcher_llm/ensemble/config.pbtxt, but the issue persists.

As a side note, I also found that to avoid an error that blocked any prompt longer than 1 token, I had to go against the tutorial and not pass "--max_input_len 1" when building the decoder engine. I am using the 24.06 container with TensorRT-LLM v0.10.
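For reference, the rename I attempted looked roughly like this (a sketch of the relevant input entry in config.pbtxt; the exact data_type and dims here are placeholders and depend on the model):

```
input [
  {
    name: "encoder_input_lengths"   # was "input_lengths"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
```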
-
@owenonline
-
What is the difference between these two options, the tensorrt_llm backend and the python backend?
@owenonline
If you want to use all_models/inflight_batcher_llm/tensorrt_llm/1/model.py, you have to change the backend in config.pbtxt from tensorrt_llm to python. Then, if you change model.py, it will take effect in your program.
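The switch in config.pbtxt looks roughly like this (a sketch; everything else in the file stays as shipped):

```
# all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt

# Before: the C++ TRT-LLM backend, which does not execute 1/model.py
# backend: "tensorrt_llm"

# After: the Python backend, which runs 1/model.py as the model implementation
backend: "python"
```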