
chore: cherry pick commits from main into release/2.3 #2769

Merged: 21 commits cherry-picked from main into release/2.3 (merged Apr 29, 2024).

Commits:
- 815751b  fix: FakeTensors appearing in `get_attr` calls (#2669) (gs-olive, Mar 20, 2024)
- 5930e96  feat: support adaptive_avg_pool1d dynamo converter (#2614) (zewenli98, Mar 21, 2024)
- e150913  fix: Add cmake missing source file ref for core_lowering.passes (#2672) (Arktische, Mar 22, 2024)
- 4bb60f3  Add support for `aten.pixel_unshuffle` dynamo converter (#2696) (HolyWu, Apr 5, 2024)
- 4314fbc  feat: support aten.atan2 converter (#2689) (chohk88, Apr 12, 2024)
- a9a6272  feat: support aten.index_select converter (#2710) (chohk88, Apr 12, 2024)
- f5b7b31  feat: support aten.isnan converter (#2711) (chohk88, Apr 12, 2024)
- 264905d  feat: support adaptive avg pool 2d and 3d dynamo converters (#2632) (zewenli98, Apr 12, 2024)
- fc29af4  feat: support aten.expm1 converter (#2714) (chohk88, Apr 12, 2024)
- 4abec39  fix: Add dependencies to Docker container for `apt` versioning TRT (#… (gs-olive, Apr 12, 2024)
- 25a3b28  fix: param bug in `test_binary_ops_aten` (#2733) (zewenli98, Apr 16, 2024)
- 822e63c  aten::empty_like (#2654) (apbose, Apr 16, 2024)
- dee74c4  empty_permute decomposition (#2698) (apbose, Apr 17, 2024)
- 77c4b96  Removing grid lowering (#2686) (apbose, Apr 17, 2024)
- 2f8937f  chore(deps): bump transformers from 4.33.2 to 4.36.0 in /tools/perf (… (dependabot[bot], Apr 19, 2024)
- 6ea06d9  Fix upsample converter not properly registered (#2683) (HolyWu, Apr 19, 2024)
- 79f7f38  feat: TS Add converter support for aten::grid_sampler (#2717) (mfeliz-cruise, Apr 19, 2024)
- 164b352  chore: fix is_nan_test (peri044, Apr 26, 2024)
- bec91fb  Fix minor grammatical corrections (#2779) (aakashapoorv, Apr 26, 2024)
- 08f1636  fix: convert_module_to_trt_engine (#2728) (zewenli98, Apr 25, 2024)
- 4fd5e11  chore: rebase (peri044, Apr 29, 2024)
README.md: 8 changes (4 additions, 4 deletions)

@@ -5,13 +5,13 @@

> Ahead of Time (AOT) compiling for PyTorch JIT and FX

Torch-TensorRT is a compiler for PyTorch/TorchScript/FX, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript or FX program into an module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extention and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.
Torch-TensorRT is a compiler for PyTorch/TorchScript/FX, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript or FX program into an module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extension and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.

Resources:
- [Documentation](https://nvidia.github.io/Torch-TensorRT/)
- [FX path Documentation](https://github.com/pytorch/TensorRT/blob/master/docsrc/tutorials/getting_started_with_fx_path.rst)
- [Torch-TensorRT Explained in 2 minutes!](https://www.youtube.com/watch?v=TU5BMU6iYZ0&ab_channel=NVIDIADeveloper)
- [Comprehensive Discusion (GTC Event)](https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31107/)
- [Comprehensive Discussion (GTC Event)](https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31107/)
- [Pre-built Docker Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). To use this container, make an NGC account and sign in to NVIDIA's registry with an API key. Refer to [this guide](https://docs.nvidia.com/ngc/ngc-catalog-user-guide/index.html#registering-activating-ngc-account) for the same.

## NVIDIA NGC Container
@@ -44,7 +44,7 @@ If you would like to build outside a docker container, please follow the section
#include "torch_tensorrt/torch_tensorrt.h"

...
// Set input datatypes. Allowerd options torch::{kFloat, kHalf, kChar, kInt32, kBool}
// Set input datatypes. Allowed options torch::{kFloat, kHalf, kChar, kInt32, kBool}
// Size of input_dtypes should match number of inputs to the network.
// If input_dtypes is not set, default precision follows traditional PyT / TRT rules
auto input = torch_tensorrt::Input(dims, torch::kHalf);
@@ -305,7 +305,7 @@ Supported Python versions:

### In Torch-TensorRT?

Thanks for wanting to contribute! There are two main ways to handle supporting a new op. Either you can write a converter for the op from scratch and register it in the NodeConverterRegistry or if you can map the op to a set of ops that already have converters you can write a graph rewrite pass which will replace your new op with an equivalent subgraph of supported ops. Its preferred to use graph rewriting because then we do not need to maintain a large library of op converters. Also do look at the various op support trackers in the [issues](https://github.com/pytorch/TensorRT/issues) for information on the support status of various operators.
Thanks for wanting to contribute! There are two main ways to handle supporting a new op. Either you can write a converter for the op from scratch and register it in the NodeConverterRegistry or if you can map the op to a set of ops that already have converters you can write a graph rewrite pass which will replace your new op with an equivalent subgraph of supported ops. It's preferred to use graph rewriting because then we do not need to maintain a large library of op converters. Also do look at the various op support trackers in the [issues](https://github.com/pytorch/TensorRT/issues) for information on the support status of various operators.

### In my application?

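As an illustration of the graph-rewrite approach described in the contributing section above (mapping a new op onto ops that already have converters), here is a hypothetical torch.fx sketch. The pass, the op choice, and the decomposition are assumptions for illustration only; the project's actual TorchScript lowering passes are C++ JIT graph rewrites under core/lowering.

```python
import torch
import torch.fx


def lower_hardswish(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    """Illustrative rewrite: replace hardswish with add/clamp/mul/div,
    which are assumed to already have converters."""
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target == torch.nn.functional.hardswish:
            x = node.args[0]
            with gm.graph.inserting_after(node):
                # hardswish(x) = x * clamp(x + 3, 0, 6) / 6
                shifted = gm.graph.call_function(torch.add, (x, 3.0))
                clamped = gm.graph.call_function(torch.clamp, (shifted, 0.0, 6.0))
                scaled = gm.graph.call_function(torch.mul, (x, clamped))
                out = gm.graph.call_function(torch.div, (scaled, 6.0))
            node.replace_all_uses_with(out)
            gm.graph.erase_node(node)
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm


# Usage sketch (model is a placeholder module that calls F.hardswish):
# gm = torch.fx.symbolic_trace(model)
# gm = lower_hardswish(gm)
```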
core/conversion/converters/impl/interpolate.cpp: 31 changes (31 additions, 0 deletions)

@@ -523,6 +523,37 @@ auto interpolate_registrations TORCHTRT_UNUSED =
resize_layer_size(ctx, n, in, out_shape, {}, nvinfer1::InterpolationMode::kLINEAR, align_corners);
}

return true;
}})
.pattern(
{"aten::grid_sampler(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> Tensor",
[](ConversionCtx* ctx, const torch::jit::Node* n, args& args) -> bool {
auto in = args[0].ITensorOrFreeze(ctx);
auto grid = args[1].ITensorOrFreeze(ctx);
auto interpolation_mode = args[2].unwrapToInt();
auto padding_mode = args[3].unwrapToInt();
auto align_corners = args[4].unwrapToBool();

static const auto sample_map = std::map<int, nvinfer1::SampleMode>{
{0, nvinfer1::SampleMode::kFILL},
{1, nvinfer1::SampleMode::kCLAMP},
{2, nvinfer1::SampleMode::kREFLECT}};

static const auto interpolation_map = std::map<int, nvinfer1::InterpolationMode>{
{0, nvinfer1::InterpolationMode::kLINEAR},
{1, nvinfer1::InterpolationMode::kNEAREST},
{2, nvinfer1::InterpolationMode::kCUBIC}};

auto grid_sample_layer = ctx->net->addGridSample(*in, *grid);
TORCHTRT_CHECK(
grid_sample_layer, "Unable to create grid_sample layer from node: " << util::node_info(n));

grid_sample_layer->setAlignCorners(align_corners);
grid_sample_layer->setSampleMode(sample_map.at(padding_mode));
grid_sample_layer->setInterpolationMode(interpolation_map.at(interpolation_mode));

auto out_tensor = ctx->AssociateValueAndTensor(n->outputs()[0], grid_sample_layer->getOutput(0));
LOG_DEBUG("Output tensor shape: " << out_tensor->getDimensions());
return true;
}});

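For context on the aten::grid_sampler converter added above: the integer interpolation_mode values (0/1/2, i.e. bilinear/nearest/bicubic) map to TensorRT kLINEAR/kNEAREST/kCUBIC, and the padding_mode values (0/1/2, i.e. zeros/border/reflection) map to kFILL/kCLAMP/kREFLECT. A minimal usage sketch through the TorchScript frontend might look as follows; the module, shapes, and compile settings are illustrative assumptions, not part of this PR.

```python
import torch
import torch.nn.functional as F
import torch_tensorrt


class GridSampleModule(torch.nn.Module):
    def forward(self, x, grid):
        # Lowers to aten::grid_sampler, which the new converter handles
        return F.grid_sample(x, grid, mode="bilinear", padding_mode="zeros", align_corners=False)


model = GridSampleModule().eval().cuda()
x = torch.randn(1, 3, 32, 32, device="cuda")
grid = torch.rand(1, 16, 16, 2, device="cuda") * 2 - 1  # normalized sampling grid in [-1, 1]

trt_mod = torch_tensorrt.compile(
    model,
    ir="ts",  # the converter in this PR targets the TorchScript frontend
    inputs=[x, grid],
    enabled_precisions={torch.float},
)
out = trt_mod(x, grid)
```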
core/lowering/passes/CMakeLists.txt: 1 change (1 addition, 0 deletions)

@@ -26,6 +26,7 @@ target_sources(${lib_name}
"${CMAKE_CURRENT_SOURCE_DIR}/unpack_rsqrt.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/unpack_std.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/unpack_var.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/unpack_scaled_dot_product_attention.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/view_to_reshape.cpp"
"${CMAKE_CURRENT_SOURCE_DIR}/rewrite_inputs_with_params.cpp"
)
docker/Dockerfile: 2 changes (1 addition, 1 deletion)

@@ -47,7 +47,7 @@ RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/
RUN add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
RUN apt-get update

RUN apt-get install -y libnvinfer8=${TENSORRT_VERSION}.* libnvinfer-plugin8=${TENSORRT_VERSION}.* libnvinfer-dev=${TENSORRT_VERSION}.* libnvinfer-plugin-dev=${TENSORRT_VERSION}.* libnvonnxparsers8=${TENSORRT_VERSION}.* libnvonnxparsers-dev=${TENSORRT_VERSION}.* libnvparsers8=${TENSORRT_VERSION}.* libnvparsers-dev=${TENSORRT_VERSION}.*
RUN apt-get install -y libnvinfer8=${TENSORRT_VERSION}.* libnvinfer-plugin8=${TENSORRT_VERSION}.* libnvinfer-dev=${TENSORRT_VERSION}.* libnvinfer-plugin-dev=${TENSORRT_VERSION}.* libnvonnxparsers8=${TENSORRT_VERSION}.* libnvonnxparsers-dev=${TENSORRT_VERSION}.* libnvparsers8=${TENSORRT_VERSION}.* libnvparsers-dev=${TENSORRT_VERSION}.* libnvinfer-headers-dev=${TENSORRT_VERSION}.* libnvinfer-headers-plugin-dev=${TENSORRT_VERSION}.*

# Setup Bazel via Bazelisk
RUN wget -q https://github.com/bazelbuild/bazelisk/releases/download/v1.17.0/bazelisk-linux-amd64 -O /usr/bin/bazel &&\
docsrc/py_api/dynamo.rst: 2 changes (2 additions, 0 deletions)

@@ -22,6 +22,8 @@ Functions

.. autofunction:: export

.. autofunction:: convert_module_to_trt_engine



Classes
py/torch_tensorrt/_compile.py: 16 changes (12 additions, 4 deletions)

@@ -1,5 +1,6 @@
from __future__ import annotations

import collections.abc
import logging
from enum import Enum
from typing import Any, Callable, List, Optional, Sequence, Set
@@ -237,8 +238,6 @@ def compile(
return compiled_fx_module
elif target_ir == _IRType.dynamo:
# Prepare torch and torchtrt inputs
import collections.abc

from torch_tensorrt.dynamo.utils import prepare_inputs

if not isinstance(input_list, collections.abc.Sequence):
@@ -342,10 +341,19 @@ def convert_method_to_trt_engine(
"convert_method_to_trt_engine call is not supported for ir=fx"
)
elif target_ir == _IRType.dynamo:
# Prepare torch and torchtrt inputs
from torch_tensorrt.dynamo.utils import prepare_inputs

if not isinstance(inputs, collections.abc.Sequence):
inputs = [inputs]

# Export the module
torchtrt_inputs = prepare_inputs(inputs)
exp_program = torch_tensorrt.dynamo.trace(module, torchtrt_inputs, **kwargs)

return dynamo_convert_module_to_trt_engine( # type: ignore[no-any-return]
module,
exp_program,
inputs=inputs,
method_name=method_name,
enabled_precisions=enabled_precisions_set,
**kwargs,
)
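As the hunk above shows, convert_method_to_trt_engine with ir="dynamo" now wraps non-sequence inputs, prepares them with prepare_inputs, traces the module into an ExportedProgram via torch_tensorrt.dynamo.trace, and passes that program to the dynamo engine conversion. A rough usage sketch follows; the model and shapes are placeholder assumptions.

```python
import torch
import torch_tensorrt

# Placeholder model for illustration
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU()
).eval().cuda()

inputs = [torch.randn(1, 3, 224, 224, device="cuda")]

# The module is traced internally (torch_tensorrt.dynamo.trace) before conversion.
engine_bytes = torch_tensorrt.convert_method_to_trt_engine(
    model,
    method_name="forward",
    inputs=inputs,
    ir="dynamo",
    enabled_precisions={torch.half},
)

with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```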
py/torch_tensorrt/dynamo/_compiler.py: 50 changes (21 additions, 29 deletions)

@@ -422,8 +422,7 @@ def contains_metadata(gm: torch.fx.GraphModule) -> bool:


def convert_module_to_trt_engine(
module: torch.fx.GraphModule,
method_name: str = "forward",
exported_program: ExportedProgram,
inputs: Optional[Sequence[Input | torch.Tensor]] = None,
enabled_precisions: (
Set[torch.dtype | dtype] | Tuple[torch.dtype | dtype]
@@ -453,15 +452,15 @@ def convert_module_to_trt_engine(
calibrator: object = None,
allow_shape_tensors: bool = False,
) -> bytes:
"""Convert a GraphModule module method to a serialized TensorRT engine
"""Convert an ExportedProgram to a serialized TensorRT engine

Converts a specified method of a module to a serialized TensorRT engine given a dictionary of conversion settings
Converts an ExportedProgram to a serialized TensorRT engine given a dictionary of conversion settings

Arguments:
module (torch.fx.GraphModule): Source module
exported_program (torch.export.ExportedProgram): Source module

Keyword Args:
inputs (List[Union(torch_tensorrt.Input, torch.Tensor)]): **Required** List of specifications of input shape, dtype and memory layout for inputs to the module. This argument is required. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using
inputs (Optional[Sequence[torch_tensorrt.Input | torch.Tensor]]): **Required** List of specifications of input shape, dtype and memory layout for inputs to the module. This argument is required. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using
torch datatypes or torch_tensorrt datatypes and you can use either torch devices or the torch_tensorrt device type enum
to select device type. ::

@@ -476,30 +475,11 @@
), # Dynamic input shape for input #2
torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
]

method_name (str): Name of method to convert
input_signature Union(List, Tuple, torch_tensorrt.Input, torch.Tensor): A formatted collection of input specifications for the module. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using
torch datatypes or torch_tensorrt datatypes and you can use either torch devices or the torch_tensorrt device type enum to select device type. **This API should be considered beta-level stable and may change in the future** ::

input_signature=([
torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
torch_tensorrt.Input(
min_shape=(1, 224, 224, 3),
opt_shape=(1, 512, 512, 3),
max_shape=(1, 1024, 1024, 3),
dtype=torch.int32
format=torch.channel_last
), # Dynamic input shape for input #2
], torch.randn((1, 3, 224, 244))) # Use an example tensor and let torch_tensorrt infer settings for input #3

device (Union(torch_tensorrt.Device, torch.device, dict)): Target device for TensorRT engines to run on ::

device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)

enabled_precisions (Optional[Set[torch.dtype | _enums.dtype]]): The set of datatypes that TensorRT can use
debug (bool): Whether to print out verbose debugging information
workspace_size (int): Workspace TRT is allowed to use for the module (0 is default)
min_block_size (int): Minimum number of operators per TRT-Engine Block
torch_executed_ops (Sequence[str]): Sequence of operations to run in Torch, regardless of converter coverage
torch_executed_ops (Set[str]): Set of operations to run in Torch, regardless of converter coverage
pass_through_build_failures (bool): Whether to fail on TRT engine build errors (True) or not (False)
max_aux_streams (Optional[int]): Maximum number of allowed auxiliary TRT streams for each engine
version_compatible (bool): Provide version forward-compatibility for engine plan files
@@ -566,13 +546,25 @@ def convert_module_to_trt_engine(
"dla_global_dram_size": dla_global_dram_size,
}

# Decompose the exported program
exported_program = exported_program.run_decompositions(
get_decompositions(enable_experimental_decompositions)
)
gm = exported_program.module()
logger.debug("Input graph: " + str(gm.graph))

# Apply lowering on the graph module
torch_inputs = get_torch_inputs(input_list, device)
gm = apply_lowering_passes(gm, torch_inputs)
logger.debug("Lowered Input graph: " + str(gm.graph))

settings = CompilationSettings(**compilation_options)
logger.info("Compilation Settings: %s\n", settings)
try:
interpreter_result = interpret_module_to_result(module, input_list, settings)
interpreter_result = interpret_module_to_result(gm, input_list, settings)
except UnsupportedOperatorException:
logger.error(
f"Conversion of module {module} not currently fully supported or convertible!",
f"Conversion of module {gm} not currently fully supported or convertible!",
exc_info=True,
)
except Exception as e:
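After this change, the dynamo-path convert_module_to_trt_engine takes an ExportedProgram (rather than a GraphModule plus method name) and internally runs decompositions and lowering passes before interpreting the graph into a serialized engine. A minimal sketch of the updated call pattern, with a placeholder model and shapes:

```python
import torch
import torch_tensorrt

# Placeholder model for illustration
model = torch.nn.Sequential(
    torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8)
).eval().cuda()

inputs = [torch.randn(4, 32, device="cuda")]

# Export first; the ExportedProgram (not a GraphModule) is what gets converted.
exp_program = torch.export.export(model, tuple(inputs))

engine_bytes = torch_tensorrt.dynamo.convert_module_to_trt_engine(
    exp_program,
    inputs=inputs,
    enabled_precisions={torch.float},
)

with open("model.engine", "wb") as f:
    f.write(engine_bytes)
```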