
Commit b3367d5

committed
add doc
1 parent cbc282a commit b3367d5

File tree

4 files changed: +46 -5 lines

  docs/_sources/py_api/runtime.rst.txt
  docsrc/py_api/runtime.rst
  docsrc/user_guide/runtime.rst
  examples/dynamo/converter_overloading.py

docs/_sources/py_api/runtime.rst.txt

+1 -1

@@ -19,7 +19,7 @@ Functions
 
 .. autofunction:: get_whole_cudagraphs_mode
 
-.. autofunction:: set_cudagraphs_modue
+.. autofunction:: set_cudagraphs_mode
 
 .. autofunction:: enable_pre_allocated_outputs
 

docsrc/py_api/runtime.rst

+3 -1

@@ -19,12 +19,14 @@ Functions
 
 .. autofunction:: get_whole_cudagraphs_mode
 
-.. autofunction:: set_cudagraphs_modue
+.. autofunction:: set_cudagraphs_mode
 
 .. autofunction:: enable_pre_allocated_outputs
 
 .. autofunction:: weight_streaming
 
+.. autofunction:: enable_output_allocator
+
 Classes
 ---------

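For reference, the runtime helper documented above (renamed in this commit from the misspelled ``set_cudagraphs_modue``) toggles CUDA graph mode globally. A minimal usage sketch, assuming the function takes a boolean flag and that ``compiled_module`` and ``inputs`` are placeholders for a module already compiled with Torch-TensorRT and matching example tensors:

.. code-block:: python

    import torch_tensorrt

    # Sketch only: compiled_module / inputs are placeholders, not part of this commit.
    torch_tensorrt.runtime.set_cudagraphs_mode(True)   # capture and replay CUDA graphs
    outputs = compiled_module(*inputs)
    torch_tensorrt.runtime.set_cudagraphs_mode(False)  # restore the default execution mode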
docsrc/user_guide/runtime.rst

+38

@@ -92,3 +92,41 @@ Cudagraphs can accelerate certain models by reducing kernel overheads, as docume
 In the current implementation, use of a new input shape (for instance in dynamic shape
 cases), will cause the cudagraph to be re-recorded. Cudagraph recording is generally
 not latency intensive, and future improvements include caching cudagraphs for multiple input shapes.
+
+Dynamic Output Allocation Mode
+------------------------------
+
+Dynamic output allocation is a feature in Torch-TensorRT which allows the output buffer of TensorRT engines to be
+dynamically allocated. This is useful for models with dynamic output shapes, especially ops with data-dependent shapes.
+Without dynamic output allocation, the output buffer is statically allocated and the size is the maximum possible size
+required by the op. This can lead to inefficient memory usage if the actual output size is smaller than the maximum possible size.
+
+There are two scenarios in which dynamic output allocation is enabled:
+
+1. When the model contains submodules that require a dynamic output allocator at runtime, users don't have to manually enable dynamic output allocation mode.
+
+To specify if a module requires a dynamic output allocator, users can set the ``requires_output_allocator=True`` flag in the ``@dynamo_tensorrt_converter`` decorator of converters. e.g.,
+
+.. code-block:: python
+
+    @dynamo_tensorrt_converter(
+        torch.ops.aten.nonzero.default,
+        supports_dynamic_shapes=True,
+        requires_output_allocator=True,
+    )
+    def aten_ops_nonzero(
+        ctx: ConversionContext,
+        target: Target,
+        args: Tuple[Argument, ...],
+        kwargs: Dict[str, Argument],
+        name: str,
+    ) -> Union[TRTTensor, Sequence[TRTTensor]]:
+        ...
+
+2. When users manually enable dynamic output allocation via the ``torch_tensorrt.runtime.enable_output_allocator`` context manager.
+
+.. code-block:: python
+
+    # Enables Dynamic Output Allocation Mode, then resets the mode to its prior setting
+    with torch_tensorrt.runtime.enable_output_allocator(trt_module):
+        ...

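The context manager added to the user guide above can be sketched end to end roughly as follows; ``trt_module`` is assumed to be a Torch-TensorRT compiled module containing a data-dependent op (such as ``aten.nonzero``) and ``x`` a matching input tensor, neither of which comes from the diff itself:

.. code-block:: python

    import torch_tensorrt

    # Sketch only: trt_module / x are placeholders for a compiled module and its input.
    with torch_tensorrt.runtime.enable_output_allocator(trt_module):
        out = trt_module(x)
    # On exit, the runtime resets output allocation to its prior mode, per the doc above.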
examples/dynamo/converter_overloading.py

+4 -3

@@ -58,12 +58,11 @@ def forward(self, x):
 
 from typing import Dict, Sequence, Tuple, Union
 
+import tensorrt as trt
 from torch.fx.node import Argument, Node, Target
 from torch_tensorrt.dynamo import CompilationSettings
 from torch_tensorrt.dynamo.conversion import ConversionContext
 
-import tensorrt as trt
-
 # %%
 # Converter Metadata
 # ^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -80,6 +79,8 @@ def forward(self, x):
     supports_dynamic_shapes=True,
     # Set the priority of the converter to supersede the default one
     priority=torch_tensorrt.dynamo.conversion.ConverterPriority.HIGH,
+    # Whether the converter requires a dynamic output allocator to run (e.g. data dependent ops)
+    requires_output_allocator=True,
 )
 
 # %%
@@ -98,7 +99,7 @@ def forward(self, x):
 #
 # Finally there is the ``priority`` argument, which is an enum from the ``torch_tensorrt.dynamo.conversion.ConverterPriority`` class that defines the priority of the converter. The two options are ``HIGH`` and ``STANDARD``.
 # Converters registered with ``STANDARD`` will be appended to the converter list for a given operation, while converters registered with ``HIGH`` will be prepended to the list.
-# Candidate converters are evalated for their suitablity in this priority order and the first converter that passes the validator is used.
+# Candidate converters are evalated for their suitability in this priority order and the first converter that passes the validator is used.
 
 
 # %%

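Pieced together, the converter metadata touched by this commit (the existing ``priority`` field plus the new ``requires_output_allocator``) would register a high-priority, data-dependent converter roughly as in the sketch below; the converter name and the import location of ``dynamo_tensorrt_converter`` are assumptions for illustration, not part of the diff:

.. code-block:: python

    from typing import Dict, Tuple

    import torch
    import torch_tensorrt
    from torch.fx.node import Argument, Target
    from torch_tensorrt.dynamo.conversion import ConversionContext

    # Hypothetical override: HIGH priority prepends this converter so it is evaluated
    # before the standard one; requires_output_allocator marks it as data dependent.
    @torch_tensorrt.dynamo.conversion.dynamo_tensorrt_converter(  # assumed import location
        torch.ops.aten.nonzero.default,
        supports_dynamic_shapes=True,
        priority=torch_tensorrt.dynamo.conversion.ConverterPriority.HIGH,
        requires_output_allocator=True,
    )
    def my_nonzero_converter(
        ctx: ConversionContext,
        target: Target,
        args: Tuple[Argument, ...],
        kwargs: Dict[str, Argument],
        name: str,
    ):
        ...  # converter body elided, as in the documentation example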