
Commit 147005f

asfiyab-nvidia authored and rajeevsrao committed
TensorRT 10.0 Release
Signed-off-by: Asfiya Baig <[email protected]>
1 parent 3d97932 commit 147005f

File tree

941 files changed: +50512, -37626 lines


CHANGELOG.md

+129 lines
@@ -1,5 +1,134 @@

# TensorRT OSS Release Changelog

## 10.0.0 EA - 2024-04-02

Key Features and Updates:

- Samples changes
  - Added a [sample](samples/python/sample_weight_stripping) showcasing weight-stripped engines (see the first sketch after this section).
  - Added a [sample](samples/python/python_plugin/circ_pad_plugin_multi_tactic.py) demonstrating the use of custom tactics with IPluginV3.
  - Added a [sample](samples/sampleNonZeroPlugin) to showcase plugins with data-dependent output shapes, using IPluginV3.
- Parser changes
  - Added a new class `IParserRefitter` that can be used to refit a TensorRT engine with the weights of an ONNX model (also shown in the first sketch after this section).
  - `kNATIVE_INSTANCENORM` is now set to ON by default.
  - Added support for `IPluginV3` interfaces from TensorRT.
  - Added support for `INT4` quantization.
  - Added support for the `reduction` attribute in `ScatterElements` (see the second sketch after this section).
  - Added support for the `wrap` padding mode in `Pad`.
- Plugin changes
  - A [new plugin](plugin/scatterElementsPlugin) has been added in compliance with [ONNX ScatterElements](https://github.com/onnx/onnx/blob/main/docs/Operators.md#ScatterElements).
  - The TensorRT plugin library no longer has a load-time link dependency on the cuBLAS or cuDNN libraries.
  - All plugins that relied on cuBLAS/cuDNN handles passed through `IPluginV2Ext::attachToContext()` now use cuBLAS/cuDNN resources initialized by the plugin library itself, which dynamically loads the required library. Plugins that independently initialized their cuBLAS/cuDNN resources have likewise moved to dynamic loading. If the respective library is not discoverable through the library path(s), these plugins will not work.
  - bertQKVToContextPlugin: Version 2 of this plugin now supports head sizes less than or equal to 32.
  - reorgPlugin: Added a version 2 which implements IPluginV2DynamicExt.
  - disentangledAttentionPlugin: Fixed a kernel bug.
- Demo changes
  - HuggingFace demos have been removed. All users using TensorRT to accelerate Large Language Model inference should use [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/) instead.
- Updated tooling
  - Polygraphy v0.49.9
  - ONNX-GraphSurgeon v0.5.1
  - TensorRT Engine Explorer v0.1.8
- Build containers
  - RedHat/CentOS 7.x are no longer officially supported starting with TensorRT 10.0. The corresponding container has been removed from TensorRT-OSS.
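The weight-stripping sample and `IParserRefitter` cover two halves of one workflow: build an engine whose serialized plan omits weight values, then refit it from the original ONNX model. Below is a minimal Python sketch of that flow, not taken from the samples; it assumes the TensorRT 10 bindings expose `trt.BuilderFlag.STRIP_PLAN` and `trt.OnnxParserRefitter`, and `model.onnx` is a placeholder path.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Build a weight-stripped engine: the serialized plan omits weight values
# that can later be restored from the original ONNX file.
builder = trt.Builder(logger)
network = builder.create_network(0)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # placeholder model path
    assert parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.STRIP_PLAN)  # assumed TRT 10 flag name
plan = builder.build_serialized_network(network, config)

# Later, possibly on another machine: deserialize, then refit from ONNX.
runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(plan)
refitter = trt.Refitter(engine, logger)
parser_refitter = trt.OnnxParserRefitter(refitter, logger)  # assumed binding name
assert parser_refitter.refit_from_file("model.onnx")
assert refitter.refit_cuda_engine()
```

The refit step rides on the pre-existing `Refitter` API; the new class only automates mapping ONNX initializers onto the engine's refittable weights.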
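For the new `reduction` attribute, here is a minimal ONNX graph built with the `onnx` Python helpers that exercises `ScatterElements` with `reduction="add"`; the shapes and tensor names are illustrative only, chosen for this sketch.

```python
import onnx
from onnx import TensorProto, helper

# Minimal ScatterElements node using the newly supported reduction attribute.
node = helper.make_node(
    "ScatterElements",
    inputs=["data", "indices", "updates"],
    outputs=["out"],
    axis=0,
    reduction="add",  # requires opset >= 16
)
graph = helper.make_graph(
    [node],
    "scatter_add_example",
    inputs=[
        helper.make_tensor_value_info("data", TensorProto.FLOAT, [3, 3]),
        helper.make_tensor_value_info("indices", TensorProto.INT64, [2, 3]),
        helper.make_tensor_value_info("updates", TensorProto.FLOAT, [2, 3]),
    ],
    outputs=[helper.make_tensor_value_info("out", TensorProto.FLOAT, [3, 3])],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
onnx.checker.check_model(model)  # a model like this can now be fed to the parser
```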
## 9.3.0 GA - 2024-02-09

Key Features and Updates:

- Demo changes
  - Faster text-to-image generation in the SDXL demo via INT8 quantization with AMMO.
- Updated tooling
  - Polygraphy v0.49.7
## 9.2.0 GA - 2023-11-27

Key Features and Updates:

- `trtexec` enhancement: Added a `--weightless` flag to mark the engine as weightless.
- Parser changes
  - Added support for the Hardmax operator.
  - Changed a few operator importers to ensure that TensorRT preserves the precision of operations when using strongly typed mode.
- Plugin changes
  - Explicit INT8 support added to `bertQKVToContextPlugin`.
  - Various bug fixes.
- Updated the HuggingFace demo to use transformers v4.31.0 and PyTorch v2.1.0.
## 9.1.0 GA - 2023-10-18

Key Features and Updates:

- Updated the [trt_python_plugin](samples/python/python_plugin) sample.
  - The Python plugin API reference is now part of the official TRT Python API.
- Added samples demonstrating the usage of the progress monitor API (a Python sketch follows this list).
  - See [sampleProgressMonitor](samples/sampleProgressMonitor) for the C++ sample.
  - See [simple_progress_monitor](samples/python/simple_progress_monitor) for the Python sample.
- Removed dependencies related to Python < 3.8 in the Python samples, since Python < 3.8 is no longer supported there.
- Demo changes
  - Added LAMBADA dataset accuracy checks in the [HuggingFace](demo/HuggingFace) demo.
  - Enabled structured sparsity and FP8 quantized batch matrix multiplications (BMMs) in attention in the [NeMo](demo/NeMo) demo.
  - Replaced deprecated APIs in the [BERT](demo/BERT) demo.
- Updated tooling
  - Polygraphy v0.49.1
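A minimal sketch of the progress monitor API in Python, modeled on the pattern the referenced samples use; it assumes the `trt.IProgressMonitor` binding with `phase_start`/`step_complete`/`phase_finish` callbacks and a `progress_monitor` attribute on the builder config.

```python
import tensorrt as trt

class SimpleProgressMonitor(trt.IProgressMonitor):
    """Prints nested build phases; returning False from step_complete cancels the build."""

    def __init__(self):
        trt.IProgressMonitor.__init__(self)
        self._indent = 0

    def phase_start(self, phase_name, parent_phase, num_steps):
        print("  " * self._indent + f"start {phase_name} ({num_steps} steps)")
        self._indent += 1

    def step_complete(self, phase_name, step):
        return True  # keep building; False would request cancellation

    def phase_finish(self, phase_name):
        self._indent -= 1
        print("  " * self._indent + f"finish {phase_name}")

# Attach to a builder config before calling build_serialized_network:
#   config.progress_monitor = SimpleProgressMonitor()
```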
## 9.0.1 GA - 2023-09-07

Key Features and Updates:

- TensorRT plugin authoring in Python is now supported.
  - See the [trt_python_plugin](samples/python/python_plugin) sample for reference.
- Updated the default CUDA version to 12.2.
- Added support for BLIP models, plus Seq2Seq and Vision2Seq abstractions, in the HuggingFace demo.
- Refactored demoDiffusion and enhanced SDXL.
- Added additional validation asserts for NV plugins.
- Updated tooling
  - TensorRT Engine Explorer v0.1.7: graph rendering for TensorRT 9.0 `kgen` kernels
  - ONNX-GraphSurgeon v0.3.29
  - PyTorch quantization toolkit v2.2.0
## 9.0.0 EA - 2023-08-06

Key Features and Updates:

- Added the NeMo demo to demonstrate the performance benefit of using the E4M3 FP8 data type with GPT models trained with the [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) and [TransformerEngine](https://github.com/NVIDIA/TransformerEngine).
- Demo Diffusion updates
  - Added the SDXL 1.0 txt2img pipeline
  - Added the ControlNet pipeline
- HuggingFace demo updates
  - Added Flan-T5, OPT, BLOOM, BLOOMZ, GPT-Neo, GPT-NeoX, and Cerebras-GPT support with accuracy checks
  - Refactored code and extracted common utilities into a Seq2Seq class
  - Optimized shape-changing overhead, achieving a >30% end-to-end performance gain
  - Added stable KV-cache, beam search, and FP16 support for all models
  - Added dynamic-batch-size TRT inference
  - Added uneven-length multi-batch inference with attention_mask support
  - Added a `chat` command for an interactive CLI
  - Upgraded the PyTorch and HuggingFace versions to support Hopper GPUs
  - Updated notebooks with a much simplified demo API
- Added two new TensorRT samples: sampleProgressMonitor (C++) and simple_progress_reporter (Python), which are examples of using the Progress Monitor during engine build.
- The following plugins were deprecated:
  - ``BatchedNMS_TRT``
  - ``BatchedNMSDynamic_TRT``
  - ``BatchTilePlugin_TRT``
  - ``Clip_TRT``
  - ``CoordConvAC``
  - ``CropAndResize``
  - ``EfficientNMS_ONNX_TRT``
  - ``CustomGeluPluginDynamic``
  - ``LReLU_TRT``
  - ``NMSDynamic_TRT``
  - ``NMS_TRT``
  - ``Normalize_TRT``
  - ``Proposal``
  - ``SingleStepLSTMPlugin``
  - ``SpecialSlice_TRT``
  - ``Split``
- Ubuntu 18.04 has reached end of life and is no longer supported by TensorRT starting with 9.0; the corresponding Dockerfile(s) have been removed.
- Support for aarch64 builds is not available in this release, and the corresponding Dockerfiles have been removed.
## [8.6.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-1) - 2023-05-02

TensorRT OSS release corresponding to TensorRT 8.6.1.6 GA release.

CMakeLists.txt

+32, -28 lines
@@ -22,21 +22,41 @@ include(cmake/modules/find_library_create_target.cmake)
 set_ifndef(TRT_LIB_DIR ${CMAKE_BINARY_DIR})
 set_ifndef(TRT_OUT_DIR ${CMAKE_BINARY_DIR})
 
+# Converts Windows paths
+if(CMAKE_VERSION VERSION_LESS 3.20)
+    file(TO_CMAKE_PATH "${TRT_LIB_DIR}" TRT_LIB_DIR)
+    file(TO_CMAKE_PATH "${TRT_OUT_DIR}" TRT_OUT_DIR)
+else()
+    cmake_path(SET TRT_LIB_DIR ${TRT_LIB_DIR})
+    cmake_path(SET TRT_OUT_DIR ${TRT_OUT_DIR})
+endif()
+
+# Required to export symbols to build *.libs
+if(WIN32)
+    add_compile_definitions(TENSORRT_BUILD_LIB 1)
+endif()
+
+# Set output paths
+set(RUNTIME_OUTPUT_DIRECTORY ${TRT_OUT_DIR} CACHE PATH "Output directory for runtime target files")
+set(LIBRARY_OUTPUT_DIRECTORY ${TRT_OUT_DIR} CACHE PATH "Output directory for library target files")
+set(ARCHIVE_OUTPUT_DIRECTORY ${TRT_OUT_DIR} CACHE PATH "Output directory for archive target files")
+
+if(WIN32)
+    set(STATIC_LIB_EXT "lib")
+else()
+    set(STATIC_LIB_EXT "a")
+endif()
+
 file(STRINGS "${CMAKE_CURRENT_SOURCE_DIR}/include/NvInferVersion.h" VERSION_STRINGS REGEX "#define NV_TENSORRT_.*")
 
 foreach(TYPE MAJOR MINOR PATCH BUILD)
-    string(REGEX MATCH "NV_TENSORRT_${TYPE} [0-9]" TRT_TYPE_STRING ${VERSION_STRINGS})
-    string(REGEX MATCH "[0-9]" TRT_${TYPE} ${TRT_TYPE_STRING})
-endforeach(TYPE)
-
-foreach(TYPE MAJOR MINOR PATCH)
-    string(REGEX MATCH "NV_TENSORRT_SONAME_${TYPE} [0-9]" TRT_TYPE_STRING ${VERSION_STRINGS})
-    string(REGEX MATCH "[0-9]" TRT_SO_${TYPE} ${TRT_TYPE_STRING})
+    string(REGEX MATCH "NV_TENSORRT_${TYPE} [0-9]+" TRT_TYPE_STRING ${VERSION_STRINGS})
+    string(REGEX MATCH "[0-9]+" TRT_${TYPE} ${TRT_TYPE_STRING})
 endforeach(TYPE)
 
 set(TRT_VERSION "${TRT_MAJOR}.${TRT_MINOR}.${TRT_PATCH}" CACHE STRING "TensorRT project version")
 set(ONNX2TRT_VERSION "${TRT_MAJOR}.${TRT_MINOR}.${TRT_PATCH}" CACHE STRING "ONNX2TRT project version")
-set(TRT_SOVERSION "${TRT_SO_MAJOR}" CACHE STRING "TensorRT library so version")
+set(TRT_SOVERSION "${TRT_MAJOR}" CACHE STRING "TensorRT library so version")
 message("Building for TensorRT version: ${TRT_VERSION}, library version: ${TRT_SOVERSION}")
 
 if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
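The regex change in this hunk is the load-bearing part: the old `[0-9]` pattern matches exactly one digit, which would truncate TensorRT 10's two-digit major version to `1`. The hunk also drops the separate SONAME parsing loop and derives `TRT_SOVERSION` directly from `TRT_MAJOR`. A quick Python illustration of the regex difference (not part of the commit):

```python
import re

header_line = "#define NV_TENSORRT_MAJOR 10"  # as in NvInferVersion.h

old = re.search(r"NV_TENSORRT_MAJOR [0-9]", header_line).group()
new = re.search(r"NV_TENSORRT_MAJOR [0-9]+", header_line).group()

print(old)  # 'NV_TENSORRT_MAJOR 1'  : single digit, wrong for TRT 10
print(new)  # 'NV_TENSORRT_MAJOR 10' : '+' captures the full number
```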
@@ -88,8 +108,8 @@ endif()
 ############################################################################################
 # Dependencies
 
-set(DEFAULT_CUDA_VERSION 12.0.1)
-set(DEFAULT_CUDNN_VERSION 8.8)
+set(DEFAULT_CUDA_VERSION 12.2.0)
+set(DEFAULT_CUDNN_VERSION 8.9)
 set(DEFAULT_PROTOBUF_VERSION 3.20.1)
 
 # Dependency Version Resolution
@@ -118,20 +138,12 @@ endif()
 
 include_directories(
     ${CUDA_INCLUDE_DIRS}
-    ${CUDNN_ROOT_DIR}/include
 )
-find_library(CUDNN_LIB cudnn HINTS
-    ${CUDA_TOOLKIT_ROOT_DIR} ${CUDNN_ROOT_DIR} PATH_SUFFIXES lib64 lib/x64 lib)
-find_library(CUBLAS_LIB cublas HINTS
-    ${CUDA_TOOLKIT_ROOT_DIR} PATH_SUFFIXES lib64 lib lib/x64 lib/stubs)
-find_library(CUBLASLT_LIB cublasLt HINTS
-    ${CUDA_TOOLKIT_ROOT_DIR} PATH_SUFFIXES lib64 lib lib/x64 lib/stubs)
 if(BUILD_PARSERS)
   configure_protobuf(${PROTOBUF_VERSION})
 endif()
 
 find_library_create_target(nvinfer nvinfer SHARED ${TRT_LIB_DIR})
-find_library_create_target(nvuffparser nvparsers SHARED ${TRT_LIB_DIR})
 
 find_library(CUDART_LIB cudart_static HINTS ${CUDA_TOOLKIT_ROOT_DIR} PATH_SUFFIXES lib lib/x64 lib64)
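Removing these `find_library` calls for cuDNN/cuBLAS is the build-system side of the changelog note that the plugin library now loads those libraries dynamically rather than linking them at load time. As a rough illustration of what "discoverable through the library path(s)" means, here is a hypothetical Python/ctypes analogue; the real loading happens in C++ inside the plugin library, and the library names below are illustrative.

```python
import ctypes

def try_load(names):
    """Return the first library the dynamic loader can find, else None."""
    for name in names:
        try:
            return ctypes.CDLL(name)  # searches LD_LIBRARY_PATH / ldconfig cache
        except OSError:
            continue
    return None

cudnn = try_load(["libcudnn.so.8", "libcudnn.so"])
if cudnn is None:
    print("cuDNN not discoverable; plugins depending on it would fail at runtime")
```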
@@ -149,18 +161,11 @@ if (DEFINED GPU_ARCHS)
   separate_arguments(GPU_ARCHS)
 else()
   list(APPEND GPU_ARCHS
-      53
-      60
-      61
       70
       75
     )
 
   string(REGEX MATCH "aarch64" IS_ARM "${TRT_PLATFORM_ID}")
-  if (IS_ARM)
-    # Xavier (SM72) only supported for aarch64.
-    list(APPEND GPU_ARCHS 72)
-  endif()
 
   if (CUDA_VERSION VERSION_GREATER_EQUAL 11.0)
     # Ampere GPU (SM80) support is only available in CUDA versions > 11.0
@@ -189,10 +194,10 @@ if (${LATEST_SM} GREATER_EQUAL 70)
 endif()
 
 if(NOT MSVC)
-  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler -Wno-deprecated-declarations")
+  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr -Xcompiler -Wno-deprecated-declarations")
 else()
   set(CMAKE_CUDA_SEPARABLE_COMPILATION ON)
-  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler")
+  set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr -Xcompiler")
 endif()
 
 ############################################################################################
@@ -207,7 +212,6 @@ endif()
 if(BUILD_PARSERS)
   add_subdirectory(parsers)
 else()
-  find_library_create_target(nvcaffeparser nvparsers SHARED ${TRT_OUT_DIR} ${TRT_LIB_DIR})
   find_library_create_target(nvonnxparser nvonnxparser SHARED ${TRT_OUT_DIR} ${TRT_LIB_DIR})
 endif()