Commit 2b4b9e3

feat(trtorchc): Embedding engines in modules from the CLI

Signed-off-by: Naren Dasan <[email protected]>
Signed-off-by: Naren Dasan <[email protected]>

1 parent 5befd29

File tree

3 files changed, +106 −71 lines changed

Diff for: cpp/trtorchc/README.md (+4)

@@ -57,6 +57,10 @@ trtorchc [input_file_path] [output_file_path]
     --calibration-cache-file=[file_path]
                                       Path to calibration cache file to use
                                       for post training quantization
+    --embed-engine                    Whether to treat input file as a
+                                      serialized TensorRT engine and embed it
+                                      into a TorchScript module (device spec
+                                      must be provided)
     --num-min-timing-iter=[num_iters] Number of minimization timing iterations
                                       used to select kernels
     --num-avg-timing-iters=[num_iters]
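A sketch of how the new flag might be invoked, based on the usage and help text above (file names are hypothetical, and the exact invocation is an assumption inferred from the documented `trtorchc [input_file_path] [output_file_path]` form; note the help says a device spec must be provided):

```shell
# Hypothetical example: wrap an already-serialized TensorRT engine
# (resnet50.engine, built elsewhere) in a loadable TorchScript module.
# --embed-engine requires a device spec, supplied here via -d / --device-type.
trtorchc --embed-engine -d gpu resnet50.engine resnet50_trt.ts
```

The resulting `.ts` file should then be loadable like any other TorchScript module (e.g. with `torch.jit.load()`), as described at the top of the tutorial.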

Diff for: cpp/trtorchc/main.cpp (+27)

@@ -135,6 +135,18 @@ std::vector<std::vector<int64_t>> parseDynamicDim(std::string shape_str) {
   return shape;
 }

+std::string read_buf(std::string const& path) {
+  std::string buf;
+  std::ifstream stream(path.c_str(), std::ios::binary);
+
+  if (stream) {
+    stream >> std::noskipws;
+    std::copy(std::istream_iterator<char>(stream), std::istream_iterator<char>(), std::back_inserter(buf));
+  }
+
+  return buf;
+}
+
 std::string get_cwd() {
   char buff[FILENAME_MAX]; // create string buffer to hold path
   if (getcwd(buff, FILENAME_MAX)) {

@@ -224,6 +236,13 @@ int main(int argc, char** argv) {
       "file_path",
       "Path to calibration cache file to use for post training quantization",
       {"calibration-cache-file"});
+
+  args::Flag embed_engine(
+      parser,
+      "embed-engine",
+      "Whether to treat input file as a serialized TensorRT engine and embed it into a TorchScript module (device spec must be provided)",
+      {"embed-engine"});
+
   args::ValueFlag<int> num_min_timing_iters(
       parser, "num_iters", "Number of minimization timing iterations used to select kernels", {"num-min-timing-iter"});
   args::ValueFlag<int> num_avg_timing_iters(

@@ -484,6 +503,14 @@ int main(int argc, char** argv) {
   auto real_input_path = resolve_path(args::get(input_path));
   auto real_output_path = resolve_path(args::get(output_path));

+  // Instead of compiling, just embed engine in a PyTorch module
+  if (embed_engine) {
+    std::string serialized_engine = read_buf(real_input_path);
+    auto trt_mod = trtorch::EmbedEngineInNewModule(serialized_engine, compile_settings.device);
+    trt_mod.save(real_output_path);
+    return 0;
+  }
+
   torch::jit::Module mod;
   try {
     // Deserialize the ScriptModule from a file using torch::jit::load().

Diff for: docsrc/tutorials/trtorchc.rst (+75 −71)

@@ -19,79 +19,83 @@ to standard TorchScript. Load with ``torch.jit.load()`` and run like you would r
     trtorchc [input_file_path] [output_file_path]
         [input_specs...] {OPTIONS}

(Whitespace-only re-indentation of the help-text block; the one content change
is the new --embed-engine entry, marked with + below.)

     TRTorch is a compiler for TorchScript, it will compile and optimize
     TorchScript programs to run on NVIDIA GPUs using TensorRT

     OPTIONS:

       -h, --help                        Display this help menu
                                         Verbiosity of the compiler
       -v, --verbose                     Dumps debugging information about the
                                         compilation process onto the console
       -w, --warnings                    Disables warnings generated during
                                         compilation onto the console (warnings
                                         are on by default)
       --i, --info                       Dumps info messages generated during
                                         compilation onto the console
       --build-debuggable-engine         Creates a debuggable engine
       --use-strict-types                Restrict operating type to only use set
                                         operation precision
       --allow-gpu-fallback              (Only used when targeting DLA
                                         (device-type)) Lets engine run layers on
                                         GPU if they are not supported on DLA
       --disable-tf32                    Prevent Float32 layers from using the
                                         TF32 data format
       -p[precision...],
       --enabled-precison=[precision...] (Repeatable) Enabling an operating
                                         precision for kernels to use when
                                         building the engine (Int8 requires a
                                         calibration-cache argument) [ float |
                                         float32 | f32 | half | float16 | f16 |
                                         int8 | i8 ] (default: float)
       -d[type], --device-type=[type]    The type of device the engine should be
                                         built for [ gpu | dla ] (default: gpu)
       --gpu-id=[gpu_id]                 GPU id if running on multi-GPU platform
                                         (defaults to 0)
       --dla-core=[dla_core]             DLACore id if running on available DLA
                                         (defaults to 0)
       --engine-capability=[capability]  The type of device the engine should be
                                         built for [ default | safe_gpu |
                                         safe_dla ]
       --calibration-cache-file=[file_path]
                                         Path to calibration cache file to use
                                         for post training quantization
+      --embed-engine                    Whether to treat input file as a
+                                        serialized TensorRT engine and embed it
+                                        into a TorchScript module (device spec
+                                        must be provided)
       --num-min-timing-iter=[num_iters] Number of minimization timing iterations
                                         used to select kernels
       --num-avg-timing-iters=[num_iters]
                                         Number of averaging timing iterations
                                         used to select kernels
       --workspace-size=[workspace_size] Maximum size of workspace given to
                                         TensorRT
       --max-batch-size=[max_batch_size] Maximum batch size (must be >= 1 to be
                                         set, 0 means not set)
       -t[threshold],
       --threshold=[threshold]           Maximum acceptable numerical deviation
                                         from standard torchscript output
                                         (default 2e-5)
       --save-engine                     Instead of compiling a full a
                                         TorchScript program, save the created
                                         engine to the path specified as the
                                         output path
       input_file_path                   Path to input TorchScript file
       output_file_path                  Path for compiled TorchScript (or
                                         TensorRT engine) file
       input_specs...                    Specs for inputs to engine, can either
                                         be a single size or a range defined by
                                         Min, Optimal, Max sizes, e.g.
                                         "(N,..,C,H,W)"
                                         "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]".
                                         Data Type and format can be specified by
                                         adding an "@" followed by dtype and "%"
                                         followed by format to the end of the
                                         shape spec. e.g. "(3, 3, 32,
                                         32)@f16%NHWC"
       "--" can be used to terminate flag options and force all following
       arguments to be treated as positional options

 e.g.
