Skip to content

[FX] Update getting_started_with_fx_path.rst #1342

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 9, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 37 additions & 2 deletions docsrc/tutorials/getting_started_with_fx_path.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,42 @@ user want to use this tool and we will introduce them here.
Converting a PyTorch Model to TensorRT Engine
---------------------------------------------
In general, users are welcome to use the ``compile()`` to finish the conversion from a model to tensorRT engine. It is a
wrapper API that consists of the major steps needed to finish this converison. Please refer to ``lower_example.py`` file in ``examples/fx``.
wrapper API that consists of the major steps needed to finish this converison. Please refer to an example usage in ``lower_example.py`` file under ``examples/fx``.

.. code-block:: shell

def compile(
module: nn.Module,
input,
max_batch_size: int = 2048,
max_workspace_size=1 << 25,
explicit_batch_dimension=False,
lower_precision=LowerPrecision.FP16,
verbose_log=False,
timing_cache_prefix="",
save_timing_cache=False,
cuda_graph_batch_size=-1,
dynamic_batch=True,
) -> nn.Module:
"""
Takes in original module, input and lowering setting, run lowering workflow to turn module
into lowered module, or so called TRTModule.

Args:
module: Original module for lowering.
input: Input for module.
max_batch_size: Maximum batch size (must be >= 1 to be set, 0 means not set)
max_workspace_size: Maximum size of workspace given to TensorRT.
explicit_batch_dimension: Use explicit batch dimension in TensorRT if set True, otherwise use implicit batch dimension.
lower_precision: lower_precision config given to TRTModule.
verbose_log: Enable verbose log for TensorRT if set True.
timing_cache_prefix: Timing cache file name for timing cache used by fx2trt.
save_timing_cache: Update timing cache with current timing cache data if set to True.
cuda_graph_batch_size: Cuda graph batch size, default to be -1.
dynamic_batch: batch dimension (dim=0) is dynamic.
Returns:
A torch.nn.Module lowered by TensorRT.
"""

In this section, we will go through an example to illustrate the major steps that fx path uses.
Users can refer to ``fx2trt_example.py`` file in ``examples/fx``.
Expand Down Expand Up @@ -56,7 +91,7 @@ Explicit batch is the default mode and it must be set for dynamic shape. For mos

For examples of the last path, if we have a 3D tensor t shaped as (batch, sequence, dimension), operations such as torch.transpose(0, 2). If any of these three are not satisfied, we’ll need to specify InputTensorSpec as inputs with dynamic range.

.. code-block:: shell
c

import deeplearning.trt.fx2trt.converter.converters
from torch.fx.experimental.fx2trt.fx2trt import InputTensorSpec, TRTInterpreter
Expand Down