Commit 41ce544
chore: Mass integration of release/0.18 (#3421)
* [Infra][TRTLLM-4063] Branch out for the TRT-LLM v0.18.0 release (Zhanrui Sun; cherry-picked from commit de90312020e51c22ba5e75b3502c7ee90c059265)
* [Infra][TRTLLM-3652] Update dependencies to TRT 10.9 / CUDA 12.8.1 / DLFW 25.03 (internal) (Yiqing Yan; cherry-picked from commit 58db1340ef7db22f1910f878d220a92be5b830d1)
* [None][Doc] Update docs for v0.18.0 (Yanchao Lu; cherry-picked from commit d23e75bc95619ce3b116213d55319272888e0c88)
* [Infra] Fix or work around issues in the package sanity check stages (Yanchao Lu; cherry-picked from commit e874e2b127515c52ba10c8df1cc2631627f74ffe)
* [https://nvbugs/5173454, 5173432, 5175863] Fix the chatglm tokenizer and tmp model path (Yuki Huang; cherry-picked from commit 731811d4e182d70a66193d646152cb71dfafe83a)
* Cherry-pick "test: Update cluster and multi node test lists and trtllm-bench" test to fix a perf drop issue (Ruodi Lu; cherry-picked from commit 5214616283fbc15ae98871a1d84c78d8e1f2e6e8)
* Revert "Merge branch 'user/yukih/fix_5173454_5173432' into 'release/0.18'" (Yanchao Lu; cherry-picked from commit 8d34831cb2b81ee2dfa8021b68e7158b33789a5f)
* [Infra] Restrict the setuptools version to avoid an SBSA pip install issue (Emma Qiao; cherry-picked from commit 1e60ad29e0dafec0e295bedb5d89b716a02a707c)
* [https://nvbugs/5173454, 5173432, 5175863] Fix the chatglm tokenizer and tmp model path (Yuki Huang; cherry-picked from commit 3ed8164e5bfea1d5aa2039b5408439fd6cf59dac)
* WAR for bug 5173448 (Thor Johnsen; cherry-picked from commit b6528b2ba15322b6c6a4c81a8b74c04d4973de4f)
* [Infra][TRTLLM-3652] Update dependencies to CUDA 12.8.1 / DLFW 25.03 (Yiqing Yan; cherry-picked from commit 6560983d132d9d257ee15849664eb055e94adaa9)
* [Docs] Doc changes for v0.18.0 (Yanchao Lu; cherry-picked from commit 26769b61218a947c8f9d070f73b63d576fcc20c4)
* [Doc] Doc change for v0.18.0 (Yanchao Lu; cherry-picked from commit 4b3b5ed6bfbc2300e3775fe75456083faad7b235)
* [Infra] Update the version to 0.18.1 (Zhanrui Sun; cherry-picked from commit 59e8326c75639275837d34de8e140358737a3365)
* Add back the nemotron file (Daniel Campora)
* Fix the recurrentgemma requirements (Daniel Campora)
* Add a WAR for bug 5173448 (Daniel Campora)
* Formatting (Daniel Campora)
* Remove a duplicated file (Daniel Campora)
* Update examples/prompt_lookup/requirements.txt (Daniel Cámpora; co-authored by Zhanrui Sun)
* Remove glm-4-9b from the model dir in the chatglm test (Daniel Campora)
* Remove an indent change (Daniel Campora)
* Apply suggestions from code review, twice (Daniel Cámpora; co-authored by Yanchao Lu)
* Revert changes on l0_test.groovy (Daniel Campora)
* Update the dev images (Yanchao Lu; co-authored by Zhanrui Sun)
* Remove a duplicated import (Daniel Campora)
* Fix a custom op (Yi Zhang)
* Fix the flashinfer and vanilla backends (Yi Zhang)
* Skip a problematic case (Daniel Campora)
* Skip the problematic test_moe_w4a8_1_14336_4096_8_bfloat16_True_False case (Daniel Campora)

Signed-off-by: Daniel Campora, Daniel Cámpora, Yanchao Lu, Yi Zhang.
Co-authored-by: Zhanrui Sun, Yiqing Yan, Yanchao Lu, Yuki Huang, Ruodi Lu, Emma Qiao, Thor Johnsen, Yi Zhang, Tao Li @ NVIDIA.
1 parent da47d5f commit 41ce544

23 files changed: +253 −201 lines changed

.devcontainer/docker-compose.yml (+1 −1)

@@ -1,7 +1,7 @@
 version: "3.9"
 services:
   tensorrt_llm-dev:
-    image: urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.01-py3-x86_64-ubuntu24.04-trt10.8.0.43-skip-devel-202503131720-8877
+    image: urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.03-py3-x86_64-ubuntu24.04-trt10.9.0.34-skip-devel-202504101610-3421

     network_mode: host
     ipc: host

README.md (+2 −2)

@@ -7,8 +7,8 @@ TensorRT-LLM
 [![Documentation](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://nvidia.github.io/TensorRT-LLM/)
 [![python](https://img.shields.io/badge/python-3.12-green)](https://www.python.org/downloads/release/python-3123/)
 [![python](https://img.shields.io/badge/python-3.10-green)](https://www.python.org/downloads/release/python-31012/)
-[![cuda](https://img.shields.io/badge/cuda-12.8.0-green)](https://developer.nvidia.com/cuda-downloads)
-[![trt](https://img.shields.io/badge/TRT-10.8.0-green)](https://developer.nvidia.com/tensorrt)
+[![cuda](https://img.shields.io/badge/cuda-12.8.1-green)](https://developer.nvidia.com/cuda-downloads)
+[![trt](https://img.shields.io/badge/TRT-10.9.0-green)](https://developer.nvidia.com/tensorrt)
 [![version](https://img.shields.io/badge/release-0.19.0rc-green)](./tensorrt_llm/version.py)
 [![license](https://img.shields.io/badge/license-Apache%202-blue)](./LICENSE)

docker/Dockerfile.multi (+1 −1)

@@ -1,6 +1,6 @@
 # Multi-stage Dockerfile
 ARG BASE_IMAGE=nvcr.io/nvidia/pytorch
-ARG BASE_TAG=25.01-py3
+ARG BASE_TAG=25.03-py3
 ARG DEVEL_IMAGE=devel

 FROM ${BASE_IMAGE}:${BASE_TAG} AS base

docker/Makefile (+3 −3)

@@ -152,16 +152,16 @@ jenkins-aarch64_%: STAGE = devel
 jenkins-rockylinux8_%: IMAGE_WITH_TAG = $(shell grep 'LLM_ROCKYLINUX8_PY312_DOCKER_IMAGE = ' ../jenkins/L0_MergeRequest.groovy | grep -o '".*"' | tr -d '"')
 jenkins-rockylinux8_%: STAGE = devel
 jenkins-rockylinux8_%: BASE_IMAGE = nvidia/cuda
-jenkins-rockylinux8_%: BASE_TAG = 12.8.0-devel-rockylinux8
+jenkins-rockylinux8_%: BASE_TAG = 12.8.1-devel-rockylinux8

 rockylinux8_%: STAGE = devel
 rockylinux8_%: BASE_IMAGE = nvidia/cuda
-rockylinux8_%: BASE_TAG = 12.8.0-devel-rockylinux8
+rockylinux8_%: BASE_TAG = 12.8.1-devel-rockylinux8

 # For x86_64 and aarch64
 ubuntu22_%: STAGE = devel
 ubuntu22_%: BASE_IMAGE = nvidia/cuda
-ubuntu22_%: BASE_TAG = 12.8.0-devel-ubuntu22.04
+ubuntu22_%: BASE_TAG = 12.8.1-devel-ubuntu22.04

 trtllm_%: STAGE = release
 trtllm_%: PUSH_TO_STAGING := 0

docker/common/install_cuda_toolkit.sh (+1 −1)

@@ -5,7 +5,7 @@ set -ex
 # This script is used for reinstalling CUDA on Rocky Linux 8 with the run file.
 # CUDA version is usually aligned with the latest NGC CUDA image tag.
 # Only use when public CUDA image is not ready.
-CUDA_VER="12.8.0_570.86.10"
+CUDA_VER="12.8.1_570.124.06"
 CUDA_VER_SHORT="${CUDA_VER%_*}"

 NVCC_VERSION_OUTPUT=$(nvcc --version)

docker/common/install_pytorch.sh (+1 −1)

@@ -4,7 +4,7 @@ set -ex

 # Use latest stable version from https://pypi.org/project/torch/#history
 # and closest to the version specified in
-# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-01.html#rel-25-01
+# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-03.html#rel-25-03
 TORCH_VERSION="2.6.0"
 SYSTEM_ID=$(grep -oP '(?<=^ID=).+' /etc/os-release | tr -d '"')

docker/common/install_tensorrt.sh (+9 −9)

@@ -2,20 +2,20 @@

 set -ex

-TRT_VER="10.8.0.43"
+TRT_VER="10.9.0.34"
 # Align with the pre-installed cuDNN / cuBLAS / NCCL versions from
-# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-01.html#rel-25-01
-CUDA_VER="12.8" # 12.8.0
+# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-03.html#rel-25-03
+CUDA_VER="12.8" # 12.8.1
 # Keep the installation for cuDNN if users want to install PyTorch with source codes.
 # PyTorch 2.x can compile with cuDNN v9.
-CUDNN_VER="9.7.0.66-1"
+CUDNN_VER="9.8.0.87-1"
 NCCL_VER="2.25.1-1+cuda12.8"
-CUBLAS_VER="12.8.3.14-1"
+CUBLAS_VER="12.8.4.1-1"
 # Align with the pre-installed CUDA / NVCC / NVRTC versions from
 # https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
-NVRTC_VER="12.8.61-1"
-CUDA_RUNTIME="12.8.57-1"
-CUDA_DRIVER_VERSION="570.86.10-1.el8"
+NVRTC_VER="12.8.93-1"
+CUDA_RUNTIME="12.8.90-1"
+CUDA_DRIVER_VERSION="570.124.06-1.el8"

 for i in "$@"; do
     case $i in

@@ -116,7 +116,7 @@ install_tensorrt() {
     if [ -z "$ARCH" ];then ARCH=$(uname -m);fi
     if [ "$ARCH" = "arm64" ];then ARCH="aarch64";fi
     if [ "$ARCH" = "amd64" ];then ARCH="x86_64";fi
-        RELEASE_URL_TRT="https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.8.0/tars/TensorRT-${TRT_VER}.Linux.${ARCH}-gnu.cuda-${TRT_CUDA_VERSION}.tar.gz"
+        RELEASE_URL_TRT="https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.9.0/tars/TensorRT-${TRT_VER}.Linux.${ARCH}-gnu.cuda-${TRT_CUDA_VERSION}.tar.gz"
     fi
     wget --no-verbose ${RELEASE_URL_TRT} -O /tmp/TensorRT.tar
     tar -xf /tmp/TensorRT.tar -C /usr/local/
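The script above normalizes the detected CPU architecture (`arm64` → `aarch64`, `amd64` → `x86_64`) before composing the TensorRT tarball URL. As a rough sketch of that logic in Python (a hypothetical helper, not part of the repo; the version strings default to the values in the script above):

```python
# Sketch only: mirrors how install_tensorrt.sh normalizes the machine
# architecture and composes the TensorRT tarball download URL.
import platform
from typing import Optional

def trt_tarball_url(trt_ver: str = "10.9.0.34",
                    trt_cuda_version: str = "12.8",
                    arch: Optional[str] = None) -> str:
    arch = arch or platform.machine()
    # Same aliasing as the shell script: arm64 -> aarch64, amd64 -> x86_64.
    arch = {"arm64": "aarch64", "amd64": "x86_64"}.get(arch, arch)
    # The URL path uses the three-component release series, e.g. "10.9.0".
    series = trt_ver.rsplit(".", 1)[0]
    return ("https://developer.nvidia.com/downloads/compute/machine-learning/"
            f"tensorrt/{series}/tars/TensorRT-{trt_ver}.Linux.{arch}-gnu."
            f"cuda-{trt_cuda_version}.tar.gz")
```

For example, `trt_tarball_url(arch="amd64")` yields the same x86_64 CUDA 12.8 tarball URL that the script downloads.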

docs/source/overview.md (+4)

@@ -33,6 +33,10 @@ TensorRT-LLM consists of pre– and post-processing steps and multi-GPU multi-no
 TensorRT-LLM supports GPUs based on the NVIDIA Hopper, NVIDIA Ada Lovelace, and NVIDIA Ampere architectures.
 Certain limitations might apply. Refer to the {ref}`support-matrix` for more information.

+### Native Windows Support
+
+Windows platform support is deprecated as of v0.18.0. All Windows-related code and functionality will be completely removed in future releases.
+
 ## What Can You Do With TensorRT-LLM?

 Let TensorRT-LLM accelerate inference performance on the latest LLMs on NVIDIA GPUs. Use TensorRT-LLM as an optimization backbone for LLM inference in NVIDIA NeMo, an end-to-end framework to build, customize, and deploy generative AI applications into production. NeMo provides complete containers, including TensorRT-LLM and NVIDIA Triton, for generative AI deployments.

docs/source/reference/support-matrix.md (+2 −2)

@@ -112,9 +112,9 @@ The following table shows the supported software for TensorRT-LLM.
 * -
   - Software Compatibility
 * - Container
-  - [25.01](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)
+  - [25.03](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html)
 * - TensorRT
-  - [10.8](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html)
+  - [10.9](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html)
 * - Precision
   -
   - Hopper (SM90) - FP32, FP16, BF16, FP8, INT8, INT4

docs/source/release-notes.md (+26)

@@ -5,6 +5,32 @@
 All published functionality in the Release Notes has been fully tested and verified with known limitations documented. To share feedback about this release, access our [NVIDIA Developer Forum](https://forums.developer.nvidia.com/).


+## TensorRT-LLM Release 0.18.1
+
+### Key Features and Enhancements
+- **The 0.18.x series of releases builds upon the 0.17.0 release, focusing exclusively on dependency updates without incorporating features from the previous 0.18.0.dev pre-releases. These features will be included in future stable releases.**
+
+### Infrastructure Changes
+- The dependent `transformers` package version is updated to 4.48.3.
+
+
+## TensorRT-LLM Release 0.18.0
+
+### Key Features and Enhancements
+- **Features that were previously available in the 0.18.0.dev pre-releases are not included in this release.**
+- [BREAKING CHANGE] Windows platform support is deprecated as of v0.18.0. All Windows-related code and functionality will be completely removed in future releases.
+
+### Known Issues
+- The PyTorch workflow on SBSA is incompatible with bare metal environments like Ubuntu 24.04. Please use the [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) for optimal support on SBSA platforms.
+
+### Infrastructure Changes
+- The base Docker image for TensorRT-LLM is updated to `nvcr.io/nvidia/pytorch:25.03-py3`.
+- The base Docker image for TensorRT-LLM Backend is updated to `nvcr.io/nvidia/tritonserver:25.03-py3`.
+- The dependent TensorRT version is updated to 10.9.
+- The dependent CUDA version is updated to 12.8.1.
+- The dependent NVIDIA ModelOpt version is updated to 0.25 for the Linux platform.
+
+
 ## TensorRT-LLM Release 0.17.0

 ### Key Features and Enhancements

jenkins/L0_MergeRequest.groovy (+4 −4)

@@ -21,10 +21,10 @@ UPLOAD_PATH = env.uploadPath ? env.uploadPath : "sw-tensorrt-generic/llm-artifac
 // Container configuration
 // available tags can be found in: https://urm.nvidia.com/artifactory/sw-tensorrt-docker/tensorrt-llm/
 // [base_image_name]-[arch]-[os](-[python_version])-[trt_version]-[torch_install_type]-[stage]-[date]-[mr_id]
-LLM_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.01-py3-x86_64-ubuntu24.04-trt10.8.0.43-skip-devel-202503131720-8877"
-LLM_SBSA_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.01-py3-aarch64-ubuntu24.04-trt10.8.0.43-skip-devel-202503131720-8877"
-LLM_ROCKYLINUX8_PY310_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-12.8.0-devel-rocky8-x86_64-rocky8-py310-trt10.8.0.43-skip-devel-202503131720-8877"
-LLM_ROCKYLINUX8_PY312_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-12.8.0-devel-rocky8-x86_64-rocky8-py312-trt10.8.0.43-skip-devel-202503131720-8877"
+LLM_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.03-py3-x86_64-ubuntu24.04-trt10.9.0.34-skip-devel-202504101610-3421"
+LLM_SBSA_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.03-py3-aarch64-ubuntu24.04-trt10.9.0.34-skip-devel-202504101610-3421"
+LLM_ROCKYLINUX8_PY310_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-12.8.1-devel-rocky8-x86_64-rocky8-py310-trt10.9.0.34-skip-devel-202504101610-3421"
+LLM_ROCKYLINUX8_PY312_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-12.8.1-devel-rocky8-x86_64-rocky8-py312-trt10.9.0.34-skip-devel-202504101610-3421"

 LLM_ROCKYLINUX8_DOCKER_IMAGE = LLM_ROCKYLINUX8_PY310_DOCKER_IMAGE
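The comment in this file documents the image tag convention `[base_image_name]-[arch]-[os](-[python_version])-[trt_version]-[torch_install_type]-[stage]-[date]-[mr_id]`. As an illustration only (a hypothetical helper, not part of the repo), the trailing fixed-position fields can be split off a tag like this, keeping everything before the TRT version as one prefix since those fields themselves contain hyphens:

```python
import re

# Hypothetical sketch: parse the trailing fields of a CI image tag that
# follows the convention documented in L0_MergeRequest.groovy.
TAG_RE = re.compile(
    r"^(?P<prefix>.+)-(?P<trt_version>trt[\d.]+)-(?P<torch_install_type>\w+)"
    r"-(?P<stage>\w+)-(?P<date>\d+)-(?P<mr_id>\d+)$"
)

def parse_image_tag(tag: str) -> dict:
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"tag does not follow the documented convention: {tag}")
    return match.groupdict()
```

Applied to the new `LLM_DOCKER_IMAGE` tag above, this yields `trt_version="trt10.9.0.34"`, `stage="devel"`, and `mr_id="3421"`.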

jenkins/L0_Test.groovy (+3 −3)

@@ -29,11 +29,11 @@ linuxPkgName = ( env.targetArch == AARCH64_TRIPLE ? "tensorrt-llm-sbsa-release-s
 // available tags can be found in: https://urm.nvidia.com/artifactory/sw-tensorrt-docker/tensorrt-llm/
 // [base_image_name]-[arch]-[os](-[python_version])-[trt_version]-[torch_install_type]-[stage]-[date]-[mr_id]
 LLM_DOCKER_IMAGE = env.dockerImage
-LLM_ROCKYLINUX8_PY310_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-12.8.0-devel-rocky8-x86_64-rocky8-py310-trt10.8.0.43-skip-devel-202503131720-8877"
-LLM_ROCKYLINUX8_PY312_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-12.8.0-devel-rocky8-x86_64-rocky8-py312-trt10.8.0.43-skip-devel-202503131720-8877"
+LLM_ROCKYLINUX8_PY310_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-12.8.1-devel-rocky8-x86_64-rocky8-py310-trt10.9.0.34-skip-devel-202504101610-3421"
+LLM_ROCKYLINUX8_PY312_DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-12.8.1-devel-rocky8-x86_64-rocky8-py312-trt10.9.0.34-skip-devel-202504101610-3421"

 // DLFW torch image
-DLFW_IMAGE = "nvcr.io/nvidia/pytorch:25.01-py3"
+DLFW_IMAGE = "nvcr.io/nvidia/pytorch:25.03-py3"

 //Ubuntu base image
 UBUNTU_22_04_IMAGE = "urm.nvidia.com/docker/ubuntu:22.04"

jenkins/controlCCache.groovy (+1 −1)

@@ -1,7 +1,7 @@

 import java.lang.InterruptedException

-DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.01-py3-x86_64-ubuntu24.04-trt10.8.0.43-skip-devel-202503131720-8877"
+DOCKER_IMAGE = "urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.03-py3-x86_64-ubuntu24.04-trt10.9.0.34-skip-devel-202504101610-3421"

 def createKubernetesPodConfig(image)
 {

requirements.txt (+3 −3)

@@ -19,9 +19,9 @@ pandas
 h5py==3.12.1
 StrEnum
 sentencepiece>=0.1.99
-tensorrt~=10.8.0
-# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-01.html#rel-25-01 uses 2.6.0a0.
-torch>=2.6.0a0,<=2.6.0
+tensorrt~=10.9.0
+# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-03.html#rel-25-03 uses 2.7.0a0.
+torch>=2.6.0,<=2.7.0a0
 torchvision
 nvidia-modelopt[torch]~=0.27.0
 nvidia-nccl-cu12
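The pin `tensorrt~=10.9.0` uses PEP 440's compatible-release operator, which is equivalent to `>=10.9.0, ==10.9.*` — so `10.9.0.34` satisfies it while `10.8.0` and `10.10.0` do not. A minimal sketch of that rule (simplified: real PEP 440 also normalizes versions and handles pre-, post-, and dev-releases):

```python
# Sketch: what a compatible-release pin like `tensorrt~=10.9.0` accepts.
# `~=X.Y.Z` means: same X.Y series, and at least X.Y.Z.

def satisfies_compatible_release(version: str, pin: str = "10.9.0") -> bool:
    v = tuple(int(part) for part in version.split("."))
    p = tuple(int(part) for part in pin.split("."))
    # Same series: all but the last pin component must match exactly.
    if v[: len(p) - 1] != p[:-1]:
        return False
    # And the version must be at least the pin.
    return v >= p
```

This is why bumping the pin from `~=10.8.0` to `~=10.9.0` is needed here: the old pin would reject the new TensorRT 10.9.0.34.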
