feat: support tiling optimization as of TRT 10.8 #3444


Merged: 4 commits into main on Mar 20, 2025

Conversation

@zewenli98 (Collaborator) commented Mar 17, 2025

Description

Support tiling optimization as of TRT 10.8. For more details, see the TensorRT documentation: https://docs.nvidia.com/deeplearning/tensorrt/10.9.0/inference-library/advanced.html#tiling-optimization

Fixes #3443
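
A rough usage sketch of the new settings (the kwarg names come from this PR; the string level value follows the discussion further down and may not match the merged API exactly, and the byte-count interpretation of the L2 limit is an assumption):

```python
import torch
import torch_tensorrt

# A toy model; any eval-mode CUDA module works here.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3)).eval().cuda()
inputs = [torch.randn(1, 3, 224, 224).cuda()]

trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    tiling_optimization_level="moderate",  # assumed string value; see discussion below
    l2_limit_for_tiling=64 * 1024 * 1024,  # assumed byte count; -1 would mean no limit
)
```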

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@zewenli98 zewenli98 self-assigned this Mar 17, 2025
@github-actions github-actions bot added component: conversion Issues re: Conversion stage component: api [Python] Issues re: Python API component: runtime component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Mar 17, 2025
@@ -329,6 +329,41 @@ def _populate_trt_builder_config(
if self.compilation_settings.enable_weight_streaming:
builder_config.set_flag(trt.BuilderFlag.WEIGHT_STREAMING)

if version.parse(trt.__version__) >= version.parse("10.8"):

Collaborator: We can just drop 10.7 instead of having this piecemeal support.


Collaborator: Looks like we do it for some settings but not others, so we need to decide if we want a versioned builder config or not.


@narendasan (Collaborator) commented Mar 17, 2025:

My default stance is no, but if it's not too much work (outside of the 2.7 scope) then we might want to, in which case this can stay.
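
For context, a minimal sketch of the version-guard pattern under discussion (the helper names are hypothetical; the tiling attributes on IBuilderConfig are what TRT 10.8 introduces):

```python
import tensorrt as trt
from packaging import version

def _supports_tiling() -> bool:
    # tiling_optimization_level and l2_limit_for_tiling first appeared on
    # IBuilderConfig in TensorRT 10.8, hence the guard in this diff.
    return version.parse(trt.__version__) >= version.parse("10.8")

def populate_tiling_options(
    builder_config: "trt.IBuilderConfig",
    level: "trt.TilingOptimizationLevel",
    l2_limit: int,
) -> None:
    # Hypothetical helper mirroring the guarded block: on older TRT builds
    # these attributes do not exist, so we skip them entirely.
    if not _supports_tiling():
        return
    builder_config.tiling_optimization_level = level
    builder_config.l2_limit_for_tiling = l2_limit
```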

@zewenli98 zewenli98 force-pushed the support_tiling_opt branch from 3f809a0 to f2203fa on March 17, 2025 at 23:13
@@ -169,6 +171,8 @@ def cross_compile_for_windows(
strip_engine_weights (bool): Strip engine weights from the serialized engine. This is useful when the engine is to be deployed in an environment where the weights are not required.
immutable_weights (bool): Build non-refittable engines. This is useful for some layers that are not refittable. If this argument is set to true, `strip_engine_weights` and `refit_identical_engine_weights` will be ignored.
enable_weight_streaming (bool): Enable weight streaming.
tiling_optimization_level (int): The optimization level of tiling strategies. A higher level allows TensorRT to spend more time searching for a better tiling strategy. (We currently support [0, 1, 2, 3]; the default is 0)

Collaborator: Yeah, using the names is a good idea.
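
A sketch of the name-based mapping this suggests (the accepted strings are assumptions, as is the validation helper; the enum members match TRT 10.8's TilingOptimizationLevel):

```python
import tensorrt as trt

# Hypothetical user-facing strings mapped onto the TRT enum.
_TILING_LEVELS = {
    "none": trt.TilingOptimizationLevel.NONE,
    "fast": trt.TilingOptimizationLevel.FAST,
    "moderate": trt.TilingOptimizationLevel.MODERATE,
    "full": trt.TilingOptimizationLevel.FULL,
}

def parse_tiling_level(name: str) -> "trt.TilingOptimizationLevel":
    # Fail loudly on a typo such as "intermediate" instead of silently
    # falling back to a default level.
    try:
        return _TILING_LEVELS[name.lower()]
    except KeyError:
        raise ValueError(
            f"tiling_optimization_level must be one of {sorted(_TILING_LEVELS)}, got {name!r}"
        ) from None
```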

Comment on lines +354 to +356
builder_config.l2_limit_for_tiling = (
self.compilation_settings.l2_limit_for_tiling
)

Collaborator: If you want to be really safe (when we remove version guarding), you can check if self.compilation_settings.get("l2_limit_for_tiling", -1) != -1 or something.
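
As a sketch, that defensive check might look like the following (getattr stands in for .get on a settings object; the helper name is hypothetical, and -1 is assumed to be the unset default, as the reviewer's sentinel suggests):

```python
import tensorrt as trt

def maybe_set_l2_limit(builder_config: "trt.IBuilderConfig", settings: object) -> None:
    # Only forward the limit when the user set it explicitly; leaving the
    # attribute untouched is safe regardless of the linked TRT version.
    l2_limit = getattr(settings, "l2_limit_for_tiling", -1)
    if l2_limit != -1:
        builder_config.l2_limit_for_tiling = l2_limit
```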

@zewenli98 (Collaborator, Author):

@peri044 @HolyWu Thanks for your suggestions. I personally prefer using integers instead of strings because 1) it stays consistent with the other arg optimization_level; it's kind of weird that one level accepts integers while the other accepts strings, and 2) I think people are better at memorizing integers than strings. Some people may use moderate while others may use intermediate.

@narendasan (Collaborator):

Don't think it matters that it's consistent with the other optimization API. TensorRT made them different for some reason, but I don't think we need to fix that for them; it should be consistent with TensorRT's API. I think the appropriate choices are strings or an enum.

@zewenli98 (Collaborator, Author):

Got it. trtexec actually uses integers as well. Anyway, I'll change it to strings.

@peri044 (Collaborator) left a comment:

LGTM

@zewenli98 zewenli98 merged commit 656bb9e into main Mar 20, 2025
68 checks passed
Labels
cla signed, component: api [Python], component: conversion, component: dynamo

Successfully merging this pull request may close these issues.

✨[Feature] Expose tiling_optimization_level and l2_limit_for_tiling to compilation settings