
chore: cherry pick commits from main into release/2.3 #2769

Merged: 21 commits cherry-picked from main into release/2.3 (merged Apr 29, 2024).

Commits:
- 815751b  fix: FakeTensors appearing in `get_attr` calls (#2669) (gs-olive, Mar 20, 2024)
- 5930e96  feat: support adaptive_avg_pool1d dynamo converter (#2614) (zewenli98, Mar 21, 2024)
- e150913  fix: Add cmake missing source file ref for core_lowering.passes (#2672) (Arktische, Mar 22, 2024)
- 4bb60f3  Add support for `aten.pixel_unshuffle` dynamo converter (#2696) (HolyWu, Apr 5, 2024)
- 4314fbc  feat: support aten.atan2 converter (#2689) (chohk88, Apr 12, 2024)
- a9a6272  feat: support aten.index_select converter (#2710) (chohk88, Apr 12, 2024)
- f5b7b31  feat: support aten.isnan converter (#2711) (chohk88, Apr 12, 2024)
- 264905d  feat: support adaptive avg pool 2d and 3d dynamo converters (#2632) (zewenli98, Apr 12, 2024)
- fc29af4  feat: support aten.expm1 converter (#2714) (chohk88, Apr 12, 2024)
- 4abec39  fix: Add dependencies to Docker container for `apt` versioning TRT (#… (gs-olive, Apr 12, 2024)
- 25a3b28  fix: param bug in `test_binary_ops_aten` (#2733) (zewenli98, Apr 16, 2024)
- 822e63c  aten::empty_like (#2654) (apbose, Apr 16, 2024)
- dee74c4  empty_permute decomposition (#2698) (apbose, Apr 17, 2024)
- 77c4b96  Removing grid lowering (#2686) (apbose, Apr 17, 2024)
- 2f8937f  chore(deps): bump transformers from 4.33.2 to 4.36.0 in /tools/perf (… (dependabot[bot], Apr 19, 2024)
- 6ea06d9  Fix upsample converter not properly registered (#2683) (HolyWu, Apr 19, 2024)
- 79f7f38  feat: TS Add converter support for aten::grid_sampler (#2717) (mfeliz-cruise, Apr 19, 2024)
- 164b352  chore: fix is_nan_test (peri044, Apr 26, 2024)
- bec91fb  Fix minor grammatical corrections (#2779) (aakashapoorv, Apr 26, 2024)
- 08f1636  fix: convert_module_to_trt_engine (#2728) (zewenli98, Apr 25, 2024)
- 4fd5e11  chore: rebase (peri044, Apr 29, 2024)
README.md: 8 changes (4 additions, 4 deletions)

@@ -5,13 +5,13 @@

> Ahead of Time (AOT) compiling for PyTorch JIT and FX

Torch-TensorRT is a compiler for PyTorch/TorchScript/FX, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript or FX program into an module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extention and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.
Torch-TensorRT is a compiler for PyTorch/TorchScript/FX, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript or FX program into an module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extension and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.

Resources:
- [Documentation](https://nvidia.github.io/Torch-TensorRT/)
- [FX path Documentation](https://github.com/pytorch/TensorRT/blob/master/docsrc/tutorials/getting_started_with_fx_path.rst)
- [Torch-TensorRT Explained in 2 minutes!](https://www.youtube.com/watch?v=TU5BMU6iYZ0&ab_channel=NVIDIADeveloper)
- [Comprehensive Discusion (GTC Event)](https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31107/)
- [Comprehensive Discussion (GTC Event)](https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31107/)
- [Pre-built Docker Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). To use this container, make an NGC account and sign in to NVIDIA's registry with an API key. Refer to [this guide](https://docs.nvidia.com/ngc/ngc-catalog-user-guide/index.html#registering-activating-ngc-account) for the same.

## NVIDIA NGC Container
@@ -44,7 +44,7 @@ If you would like to build outside a docker container, please follow the section
#include "torch_tensorrt/torch_tensorrt.h"

...
// Set input datatypes. Allowerd options torch::{kFloat, kHalf, kChar, kInt32, kBool}
// Set input datatypes. Allowed options torch::{kFloat, kHalf, kChar, kInt32, kBool}
// Size of input_dtypes should match number of inputs to the network.
// If input_dtypes is not set, default precision follows traditional PyT / TRT rules
auto input = torch_tensorrt::Input(dims, torch::kHalf);
@@ -305,7 +305,7 @@ Supported Python versions:

### In Torch-TensorRT?

Thanks for wanting to contribute! There are two main ways to handle supporting a new op. Either you can write a converter for the op from scratch and register it in the NodeConverterRegistry or if you can map the op to a set of ops that already have converters you can write a graph rewrite pass which will replace your new op with an equivalent subgraph of supported ops. Its preferred to use graph rewriting because then we do not need to maintain a large library of op converters. Also do look at the various op support trackers in the [issues](https://github.com/pytorch/TensorRT/issues) for information on the support status of various operators.
Thanks for wanting to contribute! There are two main ways to handle supporting a new op. Either you can write a converter for the op from scratch and register it in the NodeConverterRegistry or if you can map the op to a set of ops that already have converters you can write a graph rewrite pass which will replace your new op with an equivalent subgraph of supported ops. It's preferred to use graph rewriting because then we do not need to maintain a large library of op converters. Also do look at the various op support trackers in the [issues](https://github.com/pytorch/TensorRT/issues) for information on the support status of various operators.

### In my application?

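As an illustration of the graph-rewrite approach described in the contributing section above (mapping a new op onto ops that already have converters), here is a hypothetical torch.fx sketch. The pass, the op choice, and the decomposition are assumptions for illustration only; the project's actual TorchScript lowering passes are C++ JIT graph rewrites under core/lowering.

```python
import torch
import torch.fx


def lower_hardswish(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    """Illustrative rewrite: replace hardswish with add/clamp/mul/div,
    which are assumed to already have converters."""
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target == torch.nn.functional.hardswish:
            x = node.args[0]
            with gm.graph.inserting_after(node):
                # hardswish(x) = x * clamp(x + 3, 0, 6) / 6
                shifted = gm.graph.call_function(torch.add, (x, 3.0))
                clamped = gm.graph.call_function(torch.clamp, (shifted, 0.0, 6.0))
                scaled = gm.graph.call_function(torch.mul, (x, clamped))
                out = gm.graph.call_function(torch.div, (scaled, 6.0))
            node.replace_all_uses_with(out)
            gm.graph.erase_node(node)
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm


# Usage sketch (model is a placeholder module that calls F.hardswish):
# gm = torch.fx.symbolic_trace(model)
# gm = lower_hardswish(gm)
```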
core/conversion/converters/impl/interpolate.cpp: 31 changes (31 additions, 0 deletions)

@@ -523,6 +523,37 @@ auto interpolate_registrations TORCHTRT_UNUSED =
resize_layer_size(ctx, n, in, out_shape, {}, nvinfer1::InterpolationMode::kLINEAR, align_corners);
}

return true;
}})
.pattern(
{"aten::grid_sampler(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> Tensor",
[](ConversionCtx* ctx, const torch::jit::Node* n, args& args) -> bool {
auto in = args[0].ITensorOrFreeze(ctx);
auto grid = args[1].ITensorOrFreeze(ctx);
auto interpolation_mode = args[2].unwrapToInt();
auto padding_mode = args[3].unwrapToInt();
auto align_corners = args[4].unwrapToBool();

static const auto sample_map = std::map<int, nvinfer1::SampleMode>{
{0, nvinfer1::SampleMode::kFILL},
{1, nvinfer1::SampleMode::kCLAMP},
{2, nvinfer1::SampleMode::kREFLECT}};

static const auto interpolation_map = std::map<int, nvinfer1::InterpolationMode>{
{0, nvinfer1::InterpolationMode::kLINEAR},
{1, nvinfer1::InterpolationMode::kNEAREST},
{2, nvinfer1::InterpolationMode::kCUBIC}};

auto grid_sample_layer = ctx->net->addGridSample(*in, *grid);
TORCHTRT_CHECK(
grid_sample_layer, "Unable to create grid_sample layer from node: " << util::node_info(n));

grid_sample_layer->setAlignCorners(align_corners);
grid_sample_layer->setSampleMode(sample_map.at(padding_mode));
grid_sample_layer->setInterpolationMode(interpolation_map.at(interpolation_mode));

auto out_tensor = ctx->AssociateValueAndTensor(n->outputs()[0], grid_sample_layer->getOutput(0));
LOG_DEBUG("Output tensor shape: " << out_tensor->getDimensions());
return true;
}});

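For context on the aten::grid_sampler converter added above: the integer interpolation_mode values (0/1/2, i.e. bilinear/nearest/bicubic) map to TensorRT kLINEAR/kNEAREST/kCUBIC, and the padding_mode values (0/1/2, i.e. zeros/border/reflection) map to kFILL/kCLAMP/kREFLECT. A minimal usage sketch through the TorchScript frontend might look as follows; the module, shapes, and compile settings are illustrative assumptions, not part of this PR.

```python
import torch
import torch.nn.functional as F
import torch_tensorrt


class GridSampleModule(torch.nn.Module):
    def forward(self, x, grid):
        # Lowers to aten::grid_sampler, which the new converter handles
        return F.grid_sample(x, grid, mode="bilinear", padding_mode="zeros", align_corners=False)


model = GridSampleModule().eval().cuda()
x = torch.randn(1, 3, 32, 32, device="cuda")
grid = torch.rand(1, 16, 16, 2, device="cuda") * 2 - 1  # normalized sampling grid in [-1, 1]

trt_mod = torch_tensorrt.compile(
    model,
    ir="ts",  # the converter in this PR targets the TorchScript frontend
    inputs=[x, grid],
    enabled_precisions={torch.float},
)
out = trt_mod(x, grid)
```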
core/lowering/passes/CMakeLists.txt: 1 change (1 addition, 0 deletions)

@@ -26,6 +26,7 @@ target_sources(${lib_name}
"${CMAKE_CURRENT_SOURCE_DIR}/unpack_rsqrt.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/unpack_std.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/unpack_var.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/unpack_scaled_dot_product_attention.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/view_to_reshape.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/rewrite_inputs_with_params.cpp"
)
docker/Dockerfile: 2 changes (1 addition, 1 deletion)

@@ -47,7 +47,7 @@ RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/
RUN add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
RUN apt-get update

RUN apt-get install -y libnvinfer8=${TENSORRT_VERSION}.* libnvinfer-plugin8=${TENSORRT_VERSION}.* libnvinfer-dev=${TENSORRT_VERSION}.* libnvinfer-plugin-dev=${TENSORRT_VERSION}.* libnvonnxparsers8=${TENSORRT_VERSION}.* libnvonnxparsers-dev=${TENSORRT_VERSION}.* libnvparsers8=${TENSORRT_VERSION}.* libnvparsers-dev=${TENSORRT_VERSION}.*
RUN apt-get install -y libnvinfer8=${TENSORRT_VERSION}.* libnvinfer-plugin8=${TENSORRT_VERSION}.* libnvinfer-dev=${TENSORRT_VERSION}.* libnvinfer-plugin-dev=${TENSORRT_VERSION}.* libnvonnxparsers8=${TENSORRT_VERSION}.* libnvonnxparsers-dev=${TENSORRT_VERSION}.* libnvparsers8=${TENSORRT_VERSION}.* libnvparsers-dev=${TENSORRT_VERSION}.* libnvinfer-headers-dev=${TENSORRT_VERSION}.* libnvinfer-headers-plugin-dev=${TENSORRT_VERSION}.*

# Setup Bazel via Bazelisk
RUN wget -q https://github.com/bazelbuild/bazelisk/releases/download/v1.17.0/bazelisk-linux-amd64 -O /usr/bin/bazel &&\
docsrc/py_api/dynamo.rst: 2 changes (2 additions, 0 deletions)

@@ -22,6 +22,8 @@ Functions

.. autofunction:: export

.. autofunction:: convert_module_to_trt_engine



Classes
py/torch_tensorrt/_compile.py: 16 changes (12 additions, 4 deletions)

@@ -1,5 +1,6 @@
from __future__ import annotations

import collections.abc
import logging
from enum import Enum
from typing import Any, Callable, List, Optional, Sequence, Set
@@ -237,8 +238,6 @@ def compile(
return compiled_fx_module
elif target_ir == _IRType.dynamo:
# Prepare torch and torchtrt inputs
import collections.abc

from torch_tensorrt.dynamo.utils import prepare_inputs

if not isinstance(input_list, collections.abc.Sequence):
@@ -342,10 +341,19 @@ def convert_method_to_trt_engine(
"convert_method_to_trt_engine call is not supported for ir=fx"
)
elif target_ir == _IRType.dynamo:
# Prepare torch and torchtrt inputs
from torch_tensorrt.dynamo.utils import prepare_inputs

if not isinstance(inputs, collections.abc.Sequence):
inputs = [inputs]

# Export the module
torchtrt_inputs = prepare_inputs(inputs)
exp_program = torch_tensorrt.dynamo.trace(module, torchtrt_inputs, **kwargs)

return dynamo_convert_module_to_trt_engine( # type: ignore[no-any-return]
module,
exp_program,
inputs=inputs,
method_name=method_name,
enabled_precisions=enabled_precisions_set,
**kwargs,
)
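As the hunk above shows, convert_method_to_trt_engine with ir="dynamo" now wraps non-sequence inputs, prepares them with prepare_inputs, traces the module into an ExportedProgram via torch_tensorrt.dynamo.trace, and passes that program to the dynamo engine conversion. A rough usage sketch follows; the model and shapes are placeholder assumptions.

```python
import torch
import torch_tensorrt

# Placeholder model for illustration
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU()
).eval().cuda()

inputs = [torch.randn(1, 3, 224, 224, device="cuda")]

# The module is traced internally (torch_tensorrt.dynamo.trace) before conversion.
engine_bytes = torch_tensorrt.convert_method_to_trt_engine(
    model,
    method_name="forward",
    inputs=inputs,
    ir="dynamo",
    enabled_precisions={torch.half},
)

with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```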
py/torch_tensorrt/dynamo/_compiler.py: 50 changes (21 additions, 29 deletions)

@@ -422,8 +422,7 @@ def contains_metadata(gm: torch.fx.GraphModule) -> bool:


def convert_module_to_trt_engine(
module: torch.fx.GraphModule,
method_name: str = "forward",
exported_program: ExportedProgram,
inputs: Optional[Sequence[Input | torch.Tensor]] = None,
enabled_precisions: (
Set[torch.dtype | dtype] | Tuple[torch.dtype | dtype]
@@ -453,15 +452,15 @@ def convert_module_to_trt_engine(
calibrator: object = None,
allow_shape_tensors: bool = False,
) -> bytes:
"""Convert a GraphModule module method to a serialized TensorRT engine
"""Convert an ExportedProgram to a serialized TensorRT engine

Converts a specified method of a module to a serialized TensorRT engine given a dictionary of conversion settings
Converts an ExportedProgram to a serialized TensorRT engine given a dictionary of conversion settings

Arguments:
module (torch.fx.GraphModule): Source module
exported_program (torch.export.ExportedProgram): Source module

Keyword Args:
inputs (List[Union(torch_tensorrt.Input, torch.Tensor)]): **Required** List of specifications of input shape, dtype and memory layout for inputs to the module. This argument is required. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using
inputs (Optional[Sequence[torch_tensorrt.Input | torch.Tensor]]): **Required** List of specifications of input shape, dtype and memory layout for inputs to the module. This argument is required. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using
torch datatypes or torch_tensorrt datatypes and you can use either torch devices or the torch_tensorrt device type enum
to select device type. ::

@@ -476,30 +475,11 @@
), # Dynamic input shape for input #2
torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

method_name (str): Name of method to convert
input_signature Union(List, Tuple, torch_tensorrt.Input, torch.Tensor): A formatted collection of input specifications for the module. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using
torch datatypes or torch_tensorrt datatypes and you can use either torch devices or the torch_tensorrt device type enum to select device type. **This API should be considered beta-level stable and may change in the future** ::

input_signature=([
torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
torch_tensorrt.Input(
min_shape=(1, 224, 224, 3),
opt_shape=(1, 512, 512, 3),
max_shape=(1, 1024, 1024, 3),
dtype=torch.int32
format=torch.channel_last
), # Dynamic input shape for input #2
], torch.randn((1, 3, 224, 244))) # Use an example tensor and let torch_tensorrt infer settings for input #3

device (Union(torch_tensorrt.Device, torch.device, dict)): Target device for TensorRT engines to run on ::

device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)

enabled_precisions (Optional[Set[torch.dtype | _enums.dtype]]): The set of datatypes that TensorRT can use
debug (bool): Whether to print out verbose debugging information
workspace_size (int): Workspace TRT is allowed to use for the module (0 is default)
min_block_size (int): Minimum number of operators per TRT-Engine Block
torch_executed_ops (Sequence[str]): Sequence of operations to run in Torch, regardless of converter coverage
torch_executed_ops (Set[str]): Set of operations to run in Torch, regardless of converter coverage
pass_through_build_failures (bool): Whether to fail on TRT engine build errors (True) or not (False)
max_aux_streams (Optional[int]): Maximum number of allowed auxiliary TRT streams for each engine
version_compatible (bool): Provide version forward-compatibility for engine plan files
@@ -566,13 +546,25 @@ def convert_module_to_trt_engine(
"dla_global_dram_size": dla_global_dram_size,
}

# Decompose the exported program
exported_program = exported_program.run_decompositions(
get_decompositions(enable_experimental_decompositions)
)
gm = exported_program.module()
logger.debug("Input graph: " + str(gm.graph))

# Apply lowering on the graph module
torch_inputs = get_torch_inputs(input_list, device)
gm = apply_lowering_passes(gm, torch_inputs)
logger.debug("Lowered Input graph: " + str(gm.graph))

settings = CompilationSettings(**compilation_options)
logger.info("Compilation Settings: %s\n", settings)
try:
interpreter_result = interpret_module_to_result(module, input_list, settings)
interpreter_result = interpret_module_to_result(gm, input_list, settings)
except UnsupportedOperatorException:
logger.error(
f"Conversion of module {module} not currently fully supported or convertible!",
f"Conversion of module {gm} not currently fully supported or convertible!",
exc_info=True,
)
except Exception as e:
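After this change, the dynamo-path convert_module_to_trt_engine takes an ExportedProgram (rather than a GraphModule plus method name) and internally runs decompositions and lowering passes before interpreting the graph into a serialized engine. A minimal sketch of the updated call pattern, with a placeholder model and shapes:

```python
import torch
import torch_tensorrt

# Placeholder model for illustration
model = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8)
).eval().cuda()

inputs = [torch.randn(4, 32, device="cuda")]

# Export first; the ExportedProgram (not a GraphModule) is what gets converted.
exp_program = torch.export.export(model, tuple(inputs))

engine_bytes = torch_tensorrt.dynamo.convert_module_to_trt_engine(
    exp_program,
    inputs=inputs,
    enabled_precisions={torch.float},
)

with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```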