docsrc/user_guide/runtime.rst (+38)
@@ -92,3 +92,41 @@ Cudagraphs can accelerate certain models by reducing kernel overheads, as documented
In the current implementation, use of a new input shape (for instance in dynamic shape
cases) will cause the cudagraph to be re-recorded. Cudagraph recording is generally
not latency-intensive, and future improvements include caching cudagraphs for multiple input shapes.
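For reference, cudagraph mode is toggled through a runtime context manager. A minimal sketch, assuming an already-compiled ``trt_module`` and sample ``inputs`` (both illustrative names) and the module-wrapping form of ``torch_tensorrt.runtime.enable_cudagraphs``:

.. code-block:: python

    import torch
    import torch_tensorrt

    # Within the context, calls run through cudagraphs; the first call with a
    # new input shape records a graph, and same-shape calls replay it.
    with torch_tensorrt.runtime.enable_cudagraphs(trt_module) as cudagraphs_module:
        outputs = cudagraphs_module(*inputs)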
Dynamic Output Allocation Mode
------------------------------
Dynamic output allocation is a feature in Torch-TensorRT which allows the output buffer of TensorRT engines to be
dynamically allocated. This is useful for models with dynamic output shapes, especially ops with data-dependent shapes.
Without dynamic output allocation, the output buffer is statically allocated at the maximum size the op could possibly
require, which can lead to inefficient memory usage when the actual output is smaller than that maximum.
There are two scenarios in which dynamic output allocation is enabled:
1. When the model contains submodules that require a dynamic output allocator at runtime, users don't have to manually enable dynamic output allocation mode.

   To specify that a converter requires a dynamic output allocator, set the ``requires_output_allocator=True`` flag in its ``@dynamo_tensorrt_converter`` decorator, e.g.,
   .. code-block:: python

       @dynamo_tensorrt_converter(
           torch.ops.aten.nonzero.default,
           supports_dynamic_shapes=True,
           requires_output_allocator=True,
       )
       def aten_ops_nonzero(
           ctx: ConversionContext,
           target: Target,
           args: Tuple[Argument, ...],
           kwargs: Dict[str, Argument],
           name: str,
       ) -> Union[TRTTensor, Sequence[TRTTensor]]:
           ...
2. When users manually enable dynamic output allocation via the ``torch_tensorrt.runtime.enable_output_allocator`` context manager.
   .. code-block:: python

       # Enables Dynamic Output Allocation Mode, then resets the mode to its prior setting
       with torch_tensorrt.runtime.enable_output_allocator(trt_module):
           outputs = trt_module(inputs)
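To see the first scenario end to end, here is a hedged sketch: it compiles a tiny model built around ``torch.nonzero`` (a data-dependent-shape op) and runs it with no manual toggle. The module and variable names are illustrative, and it assumes a build in which ``aten.nonzero`` has a TensorRT converter registered with ``requires_output_allocator=True``:

.. code-block:: python

    import torch
    import torch_tensorrt

    class NonZero(torch.nn.Module):
        def forward(self, x):
            # The output shape depends on the values in x, not just its shape
            return torch.nonzero(x)

    model = NonZero().eval().cuda()
    inputs = [torch.randn(8, 8).cuda()]

    # Dynamic output allocation is enabled automatically at runtime because
    # the nonzero converter requires a dynamic output allocator.
    trt_module = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs, min_block_size=1)
    outputs = trt_module(*inputs)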
# Whether the converter requires a dynamic output allocator to run (e.g. data dependent ops)
requires_output_allocator=True,
)

# %%
@@ -98,7 +99,7 @@ def forward(self, x):
#
# Finally there is the ``priority`` argument, which is an enum from the ``torch_tensorrt.dynamo.conversion.ConverterPriority`` class that defines the priority of the converter. The two options are ``HIGH`` and ``STANDARD``.
# Converters registered with ``STANDARD`` will be appended to the converter list for a given operation, while converters registered with ``HIGH`` will be prepended to the list.
- # Candidate converters are evalated for their suitablity in this priority order and the first converter that passes the validator is used.
+ # Candidate converters are evaluated for their suitability in this priority order and the first converter that passes the validator is used.
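As a sketch of how the ``priority`` argument might be used (hypothetical converter name and body; it assumes ``dynamo_tensorrt_converter`` accepts ``priority`` as the text above describes):

.. code-block:: python

    from torch_tensorrt.dynamo.conversion import ConverterPriority

    # Registered with HIGH priority, so this converter is prepended to the
    # converter list and evaluated before any STANDARD-priority converter
    # for aten.relu.
    @dynamo_tensorrt_converter(
        torch.ops.aten.relu.default,
        priority=ConverterPriority.HIGH,
    )
    def my_relu_converter(ctx, target, args, kwargs, name):
        ...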