feat: support tiling optimization as of TRT 10.8 #3444


Merged: 4 commits into main on Mar 20, 2025

Conversation

@zewenli98 (Collaborator) commented Mar 17, 2025

Description

Support tiling optimization as of TRT 10.8. For more details, see the TensorRT documentation: https://docs.nvidia.com/deeplearning/tensorrt/10.9.0/inference-library/advanced.html#tiling-optimization

Fixes #3443
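
A rough usage sketch of the new settings (the kwarg names come from this PR; the string level value follows the discussion further down and may not match the merged API exactly, and the byte-count interpretation of the L2 limit is an assumption):

```python
import torch
import torch_tensorrt

# A toy model; any eval-mode CUDA module works here.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3)).eval().cuda()
inputs = [torch.randn(1, 3, 224, 224).cuda()]

trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    tiling_optimization_level="moderate",  # assumed string value; see discussion below
    l2_limit_for_tiling=64 * 1024 * 1024,  # assumed byte count; -1 would mean no limit
)
```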

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@zewenli98 zewenli98 self-assigned this Mar 17, 2025
@github-actions github-actions bot added component: conversion Issues re: Conversion stage component: api [Python] Issues re: Python API component: runtime component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Mar 17, 2025
@@ -329,6 +329,41 @@ def _populate_trt_builder_config(
if self.compilation_settings.enable_weight_streaming:
builder_config.set_flag(trt.BuilderFlag.WEIGHT_STREAMING)

if version.parse(trt.__version__) >= version.parse("10.8"):

Collaborator: We can just drop 10.7 instead of having this piecemeal support.


Collaborator: Looks like we do it for some settings but not others, so we need to decide if we want a versioned builder config or not.


@narendasan (Collaborator) commented Mar 17, 2025:

My default stance is no, but if it's not too much work (outside of the 2.7 scope) then we might want to, in which case this can stay.
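
For context, a minimal sketch of the version-guard pattern under discussion (the helper names are hypothetical; the tiling attributes on IBuilderConfig are what TRT 10.8 introduces):

```python
import tensorrt as trt
from packaging import version

def _supports_tiling() -> bool:
    # tiling_optimization_level and l2_limit_for_tiling first appeared on
    # IBuilderConfig in TensorRT 10.8, hence the guard in this diff.
    return version.parse(trt.__version__) >= version.parse("10.8")

def populate_tiling_options(
    builder_config: "trt.IBuilderConfig",
    level: "trt.TilingOptimizationLevel",
    l2_limit: int,
) -> None:
    # Hypothetical helper mirroring the guarded block: on older TRT builds
    # these attributes do not exist, so we skip them entirely.
    if not _supports_tiling():
        return
    builder_config.tiling_optimization_level = level
    builder_config.l2_limit_for_tiling = l2_limit
```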

@zewenli98 zewenli98 force-pushed the support_tiling_opt branch from 3f809a0 to f2203fa on March 17, 2025 at 23:13
@@ -169,6 +171,8 @@ def cross_compile_for_windows(
strip_engine_weights (bool): Strip engine weights from the serialized engine. This is useful when the engine is to be deployed in an environment where the weights are not required.
immutable_weights (bool): Build non-refittable engines. This is useful for some layers that are not refittable. If this argument is set to true, `strip_engine_weights` and `refit_identical_engine_weights` will be ignored.
enable_weight_streaming (bool): Enable weight streaming.
tiling_optimization_level (int): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy. (We currently support [0, 1, 2, 3]; the default is 0)

Collaborator: Yeah, using the names is a good idea.
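
A sketch of the name-based mapping this suggests (the accepted strings are assumptions, as is the validation helper; the enum members match TRT 10.8's TilingOptimizationLevel):

```python
import tensorrt as trt

# Hypothetical user-facing strings mapped onto the TRT enum.
_TILING_LEVELS = {
    "none": trt.TilingOptimizationLevel.NONE,
    "fast": trt.TilingOptimizationLevel.FAST,
    "moderate": trt.TilingOptimizationLevel.MODERATE,
    "full": trt.TilingOptimizationLevel.FULL,
}

def parse_tiling_level(name: str) -> "trt.TilingOptimizationLevel":
    # Fail loudly on a typo such as "intermediate" instead of silently
    # falling back to a default level.
    try:
        return _TILING_LEVELS[name.lower()]
    except KeyError:
        raise ValueError(
            f"tiling_optimization_level must be one of {sorted(_TILING_LEVELS)}, got {name!r}"
        ) from None
```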

Comment on lines +354 to +356
builder_config.l2_limit_for_tiling = (
self.compilation_settings.l2_limit_for_tiling
)

Collaborator: If you want to be really safe (when we remove version guarding), you can check if self.compilation_settings.get("l2_limit_for_tiling", -1) != -1 or something.
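
As a sketch, that defensive check might look like the following (getattr stands in for .get on a settings object; the helper name is hypothetical, and -1 is assumed to be the unset default, as the reviewer's sentinel suggests):

```python
import tensorrt as trt

def maybe_set_l2_limit(builder_config: "trt.IBuilderConfig", settings: object) -> None:
    # Only forward the limit when the user set it explicitly; leaving the
    # attribute untouched is safe regardless of the linked TRT version.
    l2_limit = getattr(settings, "l2_limit_for_tiling", -1)
    if l2_limit != -1:
        builder_config.l2_limit_for_tiling = l2_limit
```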

@zewenli98 (Collaborator, Author):

@peri044 @HolyWu Thanks for your suggestions. I personally prefer using integers instead of strings because 1) it stays consistent with the other arg optimization_level; it's kind of weird that one level accepts integers while the other accepts strings, and 2) I think people are better at memorizing integers than strings. Some people may use moderate while others may use intermediate.

@narendasan (Collaborator):

Don't think it matters that it's consistent with the other optimization API. TensorRT made them different for some reason, but I don't think we need to fix that for them; it should be consistent with TensorRT's API. I think the appropriate choices are strings or an enum.

@zewenli98 (Collaborator, Author):

Got it. trtexec actually uses integers as well. Anyway, I'll change it to strings.

@peri044 (Collaborator) left a comment:

LGTM

@zewenli98 zewenli98 merged commit 656bb9e into main Mar 20, 2025
68 checks passed
Labels
cla signed, component: api [Python], component: conversion, component: dynamo

Successfully merging this pull request may close these issues.

✨[Feature] Expose tiling_optimization_level and l2_limit_for_tiling to compilation settings