strip_engine_weights (bool): Strip engine weights from the serialized engine. This is useful when the engine is to be deployed in an environment where the weights are not required.
immutable_weights (bool): Build non-refittable engines. This is useful for some layers that are not refittable. If this argument is set to true, `strip_engine_weights` and `refit_identical_engine_weights` will be ignored.
tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy. Currently supported values: ["none", "fast", "moderate", "full"].
+ l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default: -1, which means no limit).
**kwargs: Any,
Returns:
torch.fx.GraphModule: Compiled FX Module, when run it will execute via TensorRT
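A minimal sketch of how the two new arguments might be validated and turned into builder-style values. It assumes (this diff does not confirm it) that TensorRT exposes a `TilingOptimizationLevel` enum whose member names are the upper-cased option strings, and that the default level is "none"; only the -1 "no limit" default for `l2_limit_for_tiling` comes from the docstring. The helper `to_builder_options` is hypothetical and returns plain values so the sketch runs without TensorRT installed.

```python
from typing import Dict, Union

# Supported values listed in the docstring, ordered from least to most search effort.
TILING_LEVELS = ("none", "fast", "moderate", "full")


def to_builder_options(
    tiling_optimization_level: str = "none",  # assumed default, not stated in the docstring
    l2_limit_for_tiling: int = -1,  # -1 means "no limit", per the docstring
) -> Dict[str, Union[str, int]]:
    """Hypothetical helper: validate the tiling arguments and return builder-style values."""
    if tiling_optimization_level not in TILING_LEVELS:
        raise ValueError(
            f"tiling_optimization_level must be one of {TILING_LEVELS}, "
            f"got {tiling_optimization_level!r}"
        )
    # Any value below -1 is neither "no limit" nor a valid byte count.
    if l2_limit_for_tiling < -1:
        raise ValueError("l2_limit_for_tiling must be -1 (no limit) or a non-negative byte count")
    return {
        # Assumed mapping to an enum member name such as TilingOptimizationLevel.MODERATE.
        "tiling_optimization_level": tiling_optimization_level.upper(),
        "l2_limit_for_tiling": l2_limit_for_tiling,
    }
```

For example, `to_builder_options("moderate", 4 * 1024 * 1024)` asks for a moderate tiling search with a 4 MiB L2 budget; rejecting unknown strings early keeps a typo from silently falling back to a different level.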
"""Compile an ExportedProgram module for NVIDIA GPUs using TensorRT
@@ -488,6 +496,8 @@ def compile(
strip_engine_weights (bool): Strip engine weights from the serialized engine. This is useful when the engine is to be deployed in an environment where the weights are not required.
immutable_weights (bool): Build non-refittable engines. This is useful for some layers that are not refittable. If this argument is set to true, `strip_engine_weights` and `refit_identical_engine_weights` will be ignored.
tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy. Currently supported values: ["none", "fast", "moderate", "full"].
+ l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default: -1, which means no limit).
**kwargs: Any,
Returns:
torch.fx.GraphModule: Compiled FX Module, when run it will execute via TensorRT
strip_engine_weights (bool): Strip engine weights from the serialized engine. This is useful when the engine is to be deployed in an environment where the weights are not required.
immutable_weights (bool): Build non-refittable engines. This is useful for some layers that are not refittable. If this argument is set to true, `strip_engine_weights` and `refit_identical_engine_weights` will be ignored.
tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy. Currently supported values: ["none", "fast", "moderate", "full"].
+ l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default: -1, which means no limit).
Returns:
bytes: Serialized TensorRT engine, which can either be saved to a file or deserialized via TensorRT APIs.
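Because the return value is plain `bytes`, saving it is an ordinary binary file write. A minimal standard-library sketch: `engine_bytes` is placeholder data standing in for the real return value, and the file name `model.engine` and the helpers `save_engine`/`load_engine` are hypothetical. Turning the bytes back into an engine would go through TensorRT's runtime (for example `tensorrt.Runtime(logger).deserialize_cuda_engine(...)`), which this sketch does not exercise.

```python
import os
import tempfile


def save_engine(engine_bytes: bytes, path: str) -> None:
    # Write the serialized engine verbatim; binary mode avoids any newline translation.
    with open(path, "wb") as f:
        f.write(engine_bytes)


def load_engine(path: str) -> bytes:
    # Read the serialized engine back as raw bytes, ready to hand to a TensorRT runtime.
    with open(path, "rb") as f:
        return f.read()


# Placeholder bytes standing in for a real serialized TensorRT engine.
engine_bytes = b"\x00placeholder-serialized-engine\xff"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.engine")  # hypothetical file name
    save_engine(engine_bytes, path)
    restored = load_engine(path)
```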
py/torch_tensorrt/dynamo/_settings.py +8
@@ -20,6 +20,7 @@
ENGINE_CAPABILITY,
HARDWARE_COMPATIBLE,
IMMUTABLE_WEIGHTS,
+ L2_LIMIT_FOR_TILING,
LAZY_ENGINE_INIT,
MAX_AUX_STREAMS,
MIN_BLOCK_SIZE,
@@ -31,6 +32,7 @@
REUSE_CACHED_ENGINES,
SPARSE_WEIGHTS,
STRIP_ENGINE_WEIGHTS,
+ TILING_OPTIMIZATION_LEVEL,
TIMING_CACHE_PATH,
TRUNCATE_DOUBLE,
USE_AOT_JOINT_EXPORT,
@@ -93,6 +95,8 @@ class CompilationSettings:
enable_cross_compile_for_windows (bool): By default this is False, which means TensorRT engines can only be executed on the same platform where they were built.
True enables cross-platform compatibility, which allows the engine to be built on Linux and run on Windows.
use_aot_joint_export (bool): Use aot_export_joint_simple; otherwise, wrap the backend with AOT_autograd, which is required for distributed tensors.
+ tiling_optimization_level (str): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy. Currently supported values: ["none", "fast", "moderate", "full"].
+ l2_limit_for_tiling (int): The target L2 cache usage limit (in bytes) for tiling optimization (default: -1, which means no limit).
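The import hunks above bring in `TILING_OPTIMIZATION_LEVEL` and `L2_LIMIT_FOR_TILING` from the defaults module, which suggests the new fields take their defaults from those constants. A minimal sketch of that pattern under stated assumptions: `CompilationSettingsSketch` is a hypothetical, trimmed-down stand-in (not the real `CompilationSettings`), the constant value -1 comes from the docstring, and the "none" value for the tiling level is an assumption.

```python
from dataclasses import dataclass, replace

# Hypothetical local stand-ins for the constants imported from the defaults module.
TILING_OPTIMIZATION_LEVEL = "none"  # assumed default value
L2_LIMIT_FOR_TILING = -1  # -1 means no limit, per the docstring


@dataclass(frozen=True)
class CompilationSettingsSketch:
    """Hypothetical mirror of CompilationSettings holding only the two new fields."""

    tiling_optimization_level: str = TILING_OPTIMIZATION_LEVEL
    l2_limit_for_tiling: int = L2_LIMIT_FOR_TILING


# Defaults come from the module-level constants; overrides produce a new settings object.
defaults = CompilationSettingsSketch()
tuned = replace(defaults, tiling_optimization_level="full", l2_limit_for_tiling=8 * 1024 * 1024)
```

Keeping the defaults in one module means every entry point documented above can share the same fallback values instead of repeating literals in each signature.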