feat: support tiling optimization as of TRT 10.8 #3444
@@ -329,6 +329,41 @@ def _populate_trt_builder_config(
if self.compilation_settings.enable_weight_streaming:
    builder_config.set_flag(trt.BuilderFlag.WEIGHT_STREAMING)

if version.parse(trt.__version__) >= version.parse("10.8"):
We can just drop 10.7 instead of having this piecemeal support.
Looks like we do it for some settings but not others, so we need to decide if we want versioned builder config or not
My default stance is no, but if it's not too much work (outside of 2.7 scope) then we might want to, in which case this can stay.
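To make the "versioned builder config" option in this thread concrete, here is a minimal sketch of one way the gating could be centralized rather than spread across individual `if` checks. The `MIN_TRT_VERSION` table, `parse_version`, and `supported_settings` are hypothetical names that are not part of Torch-TensorRT. The diff uses `packaging.version.parse`; a plain tuple comparison is swapped in here only to keep the sketch dependency-free.

```python
# Hypothetical table of the minimum TensorRT version each optional setting needs.
# Only the 10.8 requirement comes from this PR; the table itself is illustrative.
MIN_TRT_VERSION = {
    "tiling_optimization_level": "10.8",
    "l2_limit_for_tiling": "10.8",
}


def parse_version(text: str) -> tuple:
    """Turn '10.8.0.43' into (10, 8, 0, 43), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def supported_settings(trt_version: str, requested: dict) -> dict:
    """Keep only the requested settings that the given TensorRT version supports."""
    current = parse_version(trt_version)
    return {
        name: value
        for name, value in requested.items()
        # Settings missing from the table are assumed to have no version requirement.
        if current >= parse_version(MIN_TRT_VERSION.get(name, "0"))
    }
```

With a table like this, dropping support for an older TensorRT release would mean deleting entries rather than hunting for scattered version checks.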
py/torch_tensorrt/dynamo/runtime/_MutableTorchTensorRTModule.py
3f809a0 to f2203fa
@@ -169,6 +171,8 @@ def cross_compile_for_windows(
strip_engine_weights (bool): Strip engine weights from the serialized engine. This is useful when the engine is to be deployed in an environment where the weights are not required.
immutable_weights (bool): Build non-refittable engines. This is useful for some layers that are not refittable. If this argument is set to true, `strip_engine_weights` and `refit_identical_engine_weights` will be ignored.
enable_weight_streaming (bool): Enable weight streaming.
tiling_optimization_level (int): The optimization level of tiling strategies. A Higher level allows TensorRT to spend more time searching for better optimization strategy. (We currently support [0, 1, 2, 3], default is 0)
What's your opinion on exposing this as a string instead of an integer? See https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/BuilderConfig.html#tensorrt.TilingOptimizationLevel
Yeah using the names is a good idea
builder_config.l2_limit_for_tiling = (
    self.compilation_settings.l2_limit_for_tiling
)
If you want to be really safe (for when we remove the version guard), you can check whether `self.compilation_settings.get("l2_limit_for_tiling", -1) != -1`, or something similar.
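The sentinel check suggested here could look roughly like the sketch below. It assumes that `-1` means "not set" and that the settings object exposes attributes rather than a dict-style `.get`, so `getattr` with a default stands in for the reviewer's "or something". `Settings`, `FakeBuilderConfig`, and `apply_l2_limit` are hypothetical stand-ins, not the real Torch-TensorRT or TensorRT classes.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical stand-in for the real compilation settings object."""
    l2_limit_for_tiling: int = -1  # -1 is assumed to mean "not set"


class FakeBuilderConfig:
    """Hypothetical stand-in for the TensorRT builder config."""


def apply_l2_limit(builder_config, settings) -> bool:
    """Set the L2 limit only when the user supplied one; report whether it was set."""
    # getattr with a default also covers settings objects that lack the field.
    limit = getattr(settings, "l2_limit_for_tiling", -1)
    if limit == -1:
        return False
    builder_config.l2_limit_for_tiling = limit
    return True
```

Written this way, the assignment is skipped whenever the value is left at its default, so the code stays safe even if the surrounding version guard is later removed.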
@peri044 @HolyWu Thanks for your suggestions. I personally prefer using integers instead of strings because 1) it keeps consistent with another arg
I don't think it needs to be consistent with the other optimization API. TensorRT made them different for some reason, but I don't think we need to fix that for them. It should be consistent with TensorRT's API. I think the appropriate choices are strings or an enum.
Got it.
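As an illustration of the "strings or an enum" outcome, the sketch below resolves a user-facing string to an enum member. `TilingLevel` is a local stand-in so the example runs without TensorRT installed. Its member names follow the `TilingOptimizationLevel` names in the TensorRT Python API docs linked in the review, while the integer values and the `resolve_tiling_level` helper are assumptions made for illustration.

```python
from enum import Enum


class TilingLevel(Enum):
    """Local stand-in for tensorrt.TilingOptimizationLevel (values are assumed)."""
    NONE = 0
    FAST = 1
    MODERATE = 2
    FULL = 3


def resolve_tiling_level(name: str) -> TilingLevel:
    """Map a case-insensitive level name such as 'moderate' to its enum member."""
    try:
        return TilingLevel[name.strip().upper()]
    except KeyError:
        valid = ", ".join(member.name.lower() for member in TilingLevel)
        raise ValueError(
            f"Unknown tiling_optimization_level {name!r}; expected one of: {valid}"
        ) from None
```

Validating the name up front gives users a readable error message instead of a failure deep inside engine building.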
LGTM
Description
Support tiling optimization as of TRT 10.8. For more details, see the TRT docs: https://docs.nvidia.com/deeplearning/tensorrt/10.9.0/inference-library/advanced.html#tiling-optimization
Fixes #3443