Failed to build TensorRT-LLM whisper Decoder #707
Comments
I'm getting the same error for the encoder. I'm trying to convert the Whisper large-v3-turbo model:
Error:
So, the problem is that the
Thanks for sharing the .py files, it solves the problem. Just wondering, do you pass trtllm-build the default arguments from the README.md?
System Info
I was following this whisper-doc to run on Triton Inference Server with the TensorRT-LLM backend, and I get the following error after running the command below while building the TensorRT-LLM engine for the decoder; building the encoder works fine.
System specs:
GPU specs:
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Building the TensorRT-LLM engine for the decoder.
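For context, a decoder build command along the lines of the TensorRT-LLM whisper example README looks roughly like the sketch below; every path and value here is an illustrative assumption, not the exact command used in this issue, and the defaults can differ between releases.

```bash
# Sketch only: paths and values are placeholders, not the exact command from
# this issue. The flag names exist in trtllm-build, but the whisper-specific
# values (sequence lengths, beam width, plugin precision) should be taken from
# the whisper example README that matches your TensorRT-LLM version.
INFERENCE_PRECISION=float16
checkpoint_dir=whisper_ckpt      # output of the checkpoint-conversion step
output_dir=whisper_engines

trtllm-build --checkpoint_dir ${checkpoint_dir}/decoder \
             --output_dir ${output_dir}/decoder \
             --max_beam_width 4 \
             --max_batch_size 8 \
             --max_seq_len 114 \
             --max_input_len 14 \
             --max_encoder_input_len 3000 \
             --gemm_plugin ${INFERENCE_PRECISION} \
             --bert_attention_plugin ${INFERENCE_PRECISION} \
             --gpt_attention_plugin ${INFERENCE_PRECISION}
```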
Expected behavior
The trtllm-build command should produce the TensorRT-LLM engine that will be required during inference.
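As a rough sanity check (the file names are assumptions based on typical trtllm-build output, not something reported in this issue), a successful decoder build is expected to leave a serialized engine plus its build config in the output directory:

```bash
# Expected layout after a successful build (illustrative; names can vary by version):
ls ${output_dir}/decoder
# config.json
# rank0.engine
```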
actual behavior
Faced the following error:
additional notes
I used the nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3 image and this convert_checkpoints.py script.
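For reproducibility, here is a minimal sketch of how that container and the checkpoint-conversion step are typically run; the image tag is the one quoted above, while the mount path, example directory, and script flags are assumptions that vary between TensorRT-LLM releases, so verify them against the script's --help.

```bash
# Sketch only: start the Triton + TensorRT-LLM container quoted above.
docker run --rm -it --gpus all \
  -v "$(pwd)":/workspace \
  nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3 bash

# Inside the container, from the TensorRT-LLM whisper example directory
# (the conversion script is named convert_checkpoint.py in recent examples),
# confirm the flags for your version before running the conversion:
python3 convert_checkpoint.py --help
```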