
Commit 09f5b13

docs: Updated execution phase docs
Signed-off-by: Naren Dasan <[email protected]>
1 parent 6d60246 commit 09f5b13

File tree

8 files changed: +165 −70 lines changed

docs/_sources/contributors/execution.rst.txt

+43-18
@@ -3,11 +3,13 @@
 Execution Phase
 ================

-The execution phase is responsible for managing TensorRT engines, constructing a new module for the TensorRT engines,
-and acting as a runtime for JIT modules calling TensorRT engines. The main interface accepts a serialized
-TensorRT engine. It stands up the engine within the Engine Manager which maintains an execution context for each engine
-and some metadata about its inputs and outputs. Each engine is assigned an ID which can be used to reference the engine
-when running a module with the JIT interpreter.
+The execution phase is responsible for constructing self-standing TorchScript graphs with embedded TensorRT engines and serving as the runtime
+when these engines are called. The main interface accepts a serialized TensorRT engine. The execution phase
+will deserialize and wrap this engine in a class which maintains an execution context for each engine
+and some metadata about its inputs and outputs, and which is compatible with the TorchScript interpreter so that
+it can be moved around and used like other TorchScript IValues. The engine is run by providing it and its inputs
+to the ``trt::execute_engine`` operator, which will take the engine and its inputs and return the results of engine execution.

 Background
 ------------
@@ -19,27 +21,50 @@ torch::jit::Value type).
 TensorRT Engine Executor Op
 ----------------------------

-When TRTorch is loaded, it registers an operator in the PyTorch JIT operator library called ``trt::execute_engine(int id, ...) -> ...``
-which takes an engine ID and inputs. It will then use the ID to look up the corresponding execution context, then
-pop off the inputs from the runtime stack. These inputs are passed into a generic engine execution function which
+When TRTorch is loaded, it registers an operator in the PyTorch JIT operator library called
+``trt::execute_engine(Tensor[] inputs, __torch__.torch.classes.tensorrt.Engine engine) -> Tensor[]`` which takes an
+instantiated engine and a list of inputs. Compiled graphs store this engine in an attribute so that it is portable and serializable.
+When the op is called, an instantiated engine and input tensors are popped off the runtime stack. These inputs are passed into a generic engine execution function which
 will run the tensors through the TensorRT engine and return new tensors as results. These tensors are pushed on to the
 stack so that the next op, whatever it is, can use them.

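The pop-inputs, run-engine, push-results contract described above can be sketched in plain Python. This is a schematic only: ``FakeEngine`` and this ``execute_engine`` function are illustrative stand-ins, not the TRTorch implementation, and plain lists stand in for tensors.

```python
from typing import Callable, List

class FakeEngine:
    """Schematic stand-in for a deserialized TensorRT engine: it records
    the computation it would run on each input tensor (a plain list here)."""

    def __init__(self, fn: Callable[[List[float]], List[float]]):
        self.fn = fn

    def run(self, inputs: List[List[float]]) -> List[List[float]]:
        # A real engine would enqueue the inputs on its execution context;
        # here we just apply the recorded function to each input tensor.
        return [self.fn(t) for t in inputs]

def execute_engine(stack: list) -> None:
    """Mimics the operator's stack discipline: pop the engine and the
    input list off the interpreter stack, run, push the output list."""
    engine = stack.pop()   # second argument (engine) is on top
    inputs = stack.pop()   # first argument (Tensor[] inputs) below it
    stack.append(engine.run(inputs))

# Usage: push the inputs, then the engine, then invoke the op.
stack = []
stack.append([[1.0, 2.0], [3.0, 4.0]])                  # Tensor[] inputs
stack.append(FakeEngine(lambda t: [x * 2 for x in t]))  # the "engine"
execute_engine(stack)
print(stack[-1])  # [[2.0, 4.0], [6.0, 8.0]]
```

The results land back on the stack, so whatever op follows can consume them, mirroring the description above.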
 Constructing the Resulting Graph
 -----------------------------------

-Once the engine is registered, the compiler will construct a graph that will execute the engine when the module is called.
+Once the engine is deserialized and instantiated, the compiler will construct a graph that will execute the engine when the module is called.
 Here is an example:

 .. code-block::

-    graph(%self.1 : __torch__.___torch_mangle_10.LeNet_trt,
-          %2 : Tensor):
-      %1 : int = prim::Constant[value=94106001690080]()
-      %3 : Tensor = trt::execute_engine(%1, %2)
-      return (%3)
-    (AddEngineToGraph)
+    graph(%self_1 : __torch__.torchvision.models.resnet.___torch_mangle_4847.ResNet_trt,
+          %input_0 : Tensor):
+      %1 : __torch__.torch.classes.tensorrt.Engine = prim::GetAttr[name="__torch___torchvision_models_resnet____torch_mangle_4847_ResNet_trt_engine"](%self_1)
+      %3 : Tensor[] = prim::ListConstruct(%input_0)
+      %4 : Tensor[] = trt::execute_engine(%3, %1)
+      %5 : Tensor = prim::ListUnpack(%4)
+      return (%5)
+
+You can see the engine attribute in the graph and the ``trt::execute_engine`` op taking a list of input tensors and an engine
+and producing a list of output tensors which is returned. When ``forward`` is called on the module this graph is executed, thereby
+running the TensorRT engine.
+
+In the case of multiple outputs, the compiled graph may repack the output tensors into a Tuple to return to the user.
+
+.. code-block::
+
+    graph(%self_1 : __torch__.PyTorch.Detection.SSD.src.model.SSD300_trt,
+          %input_0 : Tensor):
+      %1 : __torch__.torch.classes.tensorrt.Engine = prim::GetAttr[name="__torch___PyTorch_Detection_SSD_src_model_SSD300_trt_engine"](%self_1)
+      %3 : Tensor[] = prim::ListConstruct(%input_0)
+      %4 : Tensor[] = trt::execute_engine(%3, %1)
+      %5 : Tensor, %6 : Tensor = prim::ListUnpack(%4)
+      %7 : (Tensor, Tensor) = prim::TupleConstruct(%5, %6)
+      return (%7)
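The repacking that the multi-output graph performs with ``prim::ListUnpack`` and ``prim::TupleConstruct`` amounts to the following sketch. The ``repack`` helper and its two-output arity are hypothetical, chosen to mirror the SSD example; a real compiled graph bakes the arity in at compile time.

```python
def repack(outputs: list) -> tuple:
    """Mirror the graph's output handling: trt::execute_engine returns a
    Tensor[]; the graph unpacks it (prim::ListUnpack) and rebuilds the
    (Tensor, Tensor) tuple the original module returned (prim::TupleConstruct)."""
    first, second = outputs  # ListUnpack: fails loudly if the arity is wrong
    return (first, second)   # TupleConstruct: the shape the user expects

# Usage with stand-in "tensors" (plain lists):
boxes, scores = repack([[0.1, 0.2], [0.9, 0.8]])
print(boxes, scores)  # [0.1, 0.2] [0.9, 0.8]
```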
+Serialization and Deserialization
+----------------------------------

-You can see the ID as a constant in the graph and the ``trt::execute_engine`` op taking the constant and an input tensor
-and producing an output tensor which is returned. When ``forward`` is called on the module this graph is executed, thereby
-running the TensorRT engine.
+Serialization and deserialization of TensorRT engines embedded in TorchScript graphs are handled by the holder class for the engine and TorchBind.
+When a TorchScript module is saved, the pickler will run serialization on the CUDA engine and store the serialized engine in the zip file created.
+When deserializing, the depickler will call a constructor for the engine holder class with the serialized engine so that it can be set up again for
+execution.
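The holder-class save/load pattern described above can be illustrated with plain Python pickling. This is a schematic, not TRTorch's actual TorchBind bindings: ``EngineHolder`` and its byte-string "engine" are invented for illustration.

```python
import pickle

class EngineHolder:
    """Schematic holder: wraps a 'deserialized engine' and reduces itself
    to the serialized form for pickling, mirroring how the TorchBind class
    stores the serialized CUDA engine inside the saved module archive."""

    def __init__(self, serialized_engine: bytes):
        # A real holder would hand these bytes to the TensorRT runtime to
        # deserialize the engine and build an execution context here.
        self.serialized_engine = serialized_engine

    def __getstate__(self):
        # On save: the pickler keeps only the serialized engine bytes.
        return {"serialized_engine": self.serialized_engine}

    def __setstate__(self, state):
        # On load: the depickler reconstructs ("deserializes") the engine
        # from the stored bytes, as the constructor would.
        self.__init__(state["serialized_engine"])

holder = EngineHolder(b"\x00fake-trt-plan\x00")
restored = pickle.loads(pickle.dumps(holder))
print(restored.serialized_engine == holder.serialized_engine)  # True
```

The design point this mirrors is that the holder, not the graph, owns the engine's serialized form, so a saved module remains a single self-contained archive.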

docs/_sources/contributors/lowering.rst.txt

+1-1
@@ -188,4 +188,4 @@ Unroll Loops

 `torch/csrc/jit/passes/loop_unrolling.h <https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/loop_unrolling.h>`_

-Unrolls the operations of compatable loops (e.g. sufficently short) so that you only have to go through the loop once.
+Unrolls the operations of compatible loops (e.g. sufficiently short) so that you only have to go through the loop once.
