
✨[Feature] Perform constant folding of operations in the graph if nodes can be evaluated at compile time #1266


Closed
peri044 opened this issue Aug 15, 2022 · 11 comments

Comments

@peri044
Collaborator

peri044 commented Aug 15, 2022

Is your feature request related to a problem? Please describe.

%28 : int = aten::size(%22, %27)
...
%31 : Tensor[] = tensorrt::execute_engine(%30, %trt_engine_0x55cc735470f0)
%32 : Tensor, %33 : Tensor = prim::ListUnpack(%31)

%34 : int = prim::Constant[value=15]()
%35 : int = prim::Constant[value=7]()
%36 : int = prim::Constant[value=0]()
%37 : int[] = prim::ListConstruct(%28, %34, %34, %35)

...
%41 : Tensor[] = tensorrt::execute_engine(%40, %trt_engine_0x55cc73547290)
%bar : Tensor = aten::reshape(%foo, %37)

In the snippet above we have a reshape node %bar which is forced to fall back by min_block_size. Because %bar is forced to fall back, its input %37 = prim::ListConstruct is also forced to fall back along with %28 : int = aten::size. In both cases these ops are supported by evaluators and (at least with static shapes) should be resolved to constants during conversion. Currently these ops create unnecessary breaks between TRT regions, which will impact performance.

Describe the solution you'd like
Could evaluatable nodes be resolved to constants in the torchscript graph before partitioning to avoid this impact on the partition?

Describe alternatives you've considered
Could fallback nodes with no dependencies on active TRT nodes be consolidated to avoid breaking TRT regions? (In this case %37 is not used anywhere other than in the reshape node %bar and could be moved to just before its use)

@peri044 peri044 added the feature request New feature or request label Aug 15, 2022
@narendasan
Collaborator

Shouldn’t OpSupported report back that the node can be evaluated?

bool OpSupported(const torch::jit::Node* n) {
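For context, a rough paraphrase of what that check does (written from memory, so the exact helper names may differ from the current source): a node is reported as supported if either the evaluator registry or the converter registry can handle it.

bool OpSupported(const torch::jit::Node* n) {
  // Supported == "we can evaluate it at conversion time" OR "we can convert it to TRT layers".
  return evaluators::shouldEvalAtConversionTime(n) || converters::node_is_convertable(n);
}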

@narendasan
Collaborator

#1263: this PR aims to log every change in a node's target executor; it might help clear up what is happening.

@peri044
Collaborator Author

peri044 commented Aug 15, 2022

Shouldn’t OpSupported report back that the node can be evaluated?

bool OpSupported(const torch::jit::Node* n) {

Yes. But the request here is to remove these nodes altogether from the torchscript graph before the partitioning phase (i.e. make the list a constant which is passed to the reshape node directly) so that the ListConstruct won't obstruct partitioning and create additional segments.

%28 : int = aten::size(%22, %27)
%37 : int[] = prim::ListConstruct(%28, %34, %34, %35)

@narendasan
Collaborator

Is this because of the input signature changes?

@ncomly-nvidia ncomly-nvidia added the component: core Issues re: The core compiler label Aug 15, 2022
@mfeliz-cruise
Contributor

mfeliz-cruise commented Aug 15, 2022

My thinking here is that when an op is supported with an Evaluator, it is resolvable to a constant in TRT. If these ops are resolvable to constants in TRT they should also be resolvable to constants in TorchScript. If we implemented an evaluator pass which ran before partitioning and evaluated any evaluatable node (ex. aten::size with static shaped input) we could replace those nodes with constants in the graph. We could then rerun jit's constant propagation potentially simplifying the graph further.

I think this has a few benefits:

  1. Simplifies the graph before partitioning, avoids cases like the example in the original post where an evaluatable op is forced to fallback and breaks up a TRT segment
  2. Allows testing of evaluators independently of partitioning and conversion
  3. Improved debug, makes it clear and visible which ops are assumed to be constant
  4. May simplify torch fallback regions
  5. Allows support for the same op with either a converter or an evaluator (ex. aten::size). If the evaluator is contextually able to resolve a node to a constant (ex. static-shape aten::size), it will not be present in the graph at partitioning/conversion time. If it is still present, it will need to be handled in fallback or through a converter which can handle dynamic shapes.
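As a concrete illustration of the proposed pass, here is a minimal sketch. EvaluateConstantNodes and TryEvaluateNode are hypothetical names, not existing Torch-TensorRT APIs; TryEvaluateNode stands in for dispatching into the evaluator registry and only handles one schema here to keep the sketch self-contained.

#include <torch/csrc/jit/ir/constants.h>
#include <torch/csrc/jit/ir/ir.h>
#include <torch/csrc/jit/passes/constant_propagation.h>

// Hypothetical stand-in for the evaluator registry: returns a value if the
// node can be evaluated from constant inputs, nullopt otherwise.
c10::optional<c10::IValue> TryEvaluateNode(
    const torch::jit::Node* n,
    const std::vector<c10::IValue>& inputs) {
  if (n->kind() == c10::Symbol::fromQualString("aten::mul") &&
      inputs.size() == 2 && inputs[0].isInt() && inputs[1].isInt()) {
    return c10::IValue(inputs[0].toInt() * inputs[1].toInt());
  }
  return c10::nullopt;
}

// Sketch of the pre-partitioning pass: evaluate any node whose inputs are all
// constants and which the registry can handle, replace it with a
// prim::Constant, then let JIT constant propagation simplify further.
void EvaluateConstantNodes(std::shared_ptr<torch::jit::Graph>& g) {
  std::vector<torch::jit::Node*> folded;
  for (auto* n : g->nodes()) {  // top-level block only, for brevity
    if (n->outputs().size() != 1) {
      continue;
    }
    std::vector<c10::IValue> inputs;
    bool all_const = true;
    for (auto* in : n->inputs()) {
      auto ival = torch::jit::toIValue(in);
      if (!ival) {
        all_const = false;
        break;
      }
      inputs.push_back(*ival);
    }
    if (!all_const) {
      continue;
    }
    if (auto result = TryEvaluateNode(n, inputs)) {
      torch::jit::WithInsertPoint guard(n);
      n->output()->replaceAllUsesWith(g->insertConstant(*result));
      folded.push_back(n);
    }
  }
  for (auto* n : folded) {
    n->destroy();
  }
  torch::jit::ConstantPropagation(g);
}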

@narendasan
Collaborator

narendasan commented Aug 15, 2022

My thinking here is that when an op is supported with an Evaluator, it is resolvable to a constant in TRT. If these ops are resolvable to constants in TRT they should also be resolvable to constants in TorchScript. If we implemented an evaluator pass which ran before partitioning and evaluated any evaluatable node (ex. aten::size with static shaped input) we could replace those nodes with constants in the graph. We could then rerun jit's constant propagation potentially simplifying the graph further.

I think this has a few benefits:

1. Simplifies the graph before partitioning, avoids cases like the example in the original post where an evaluatable op is forced to fallback and breaks up a TRT segment

2. Allows testing of evaluators independently of partitioning and conversion

3. Improved debug, makes it clear and visible which ops are assumed to be constant

4. May simplify torch fallback regions

5. Allows support for the same op with either a converter or an evaluator (ex. aten::size). If the evaluator is contextually able to resolve a node to a constant (ex. static-shape aten::size), it will not be present in the graph at partitioning/conversion time. If it is still present, it will need to be handled in fallback or through a converter which can handle dynamic shapes.

This is true in some cases but not all. For example a prim::ListConstruct could take a number of tensors which are the result of preceding TRT converted nodes. In this case the nvinfer1::ITensors must exist to create the list.

Therefore, such a preprocessing system would need to be able to distinguish constant nodes using some characteristic other than node kind. All that the torchscript IR guarantees is that once a value is produced it will never change, but producing that value may depend on one or many preceding ops being executed, and not all of those dependencies are guaranteed to be evaluatable. You could potentially evaluate to constants any node that traces back solely to constants, but I think that's what constant-prop/pool already does. There is the slight detail that we have shape information, which could fold a few more ops, but then I would think we should fork the constant-prop/pool passes to take that into account rather than take other approaches.

2. We do have an evaluator test rig, but not tests for all evaluators

3. I think I need some more detail here

5. We currently support per-schema specialization, e.g. aten::add has many schemas, some of which are evaluators and some converters. But yes, we cannot currently handle having the same schema registered in both places and picking between them. #1130 could help enable this.

@mfeliz-cruise
Contributor

mfeliz-cruise commented Aug 15, 2022

Thanks @narendasan. Are there more examples beyond prim::ListConstruct and prim::ListUnpack? I agree these do not fit the resolve-to-constant pattern I'm trying to address here. Could these be considered as converters?

Therefore, such a preprocessing system would need to be able to distinguish constant nodes using some characteristic other than node kind. All that the torchscript IR guarantees is that once a value is produced it will never change, but producing that value may depend on one or many preceding ops being executed, and not all of those dependencies are guaranteed to be evaluatable. You could potentially evaluate to constants any node that traces back solely to constants, but I think that's what constant-prop/pool already does. There is the slight detail that we have shape information, which could fold a few more ops, but then I would think we should fork the constant-prop/pool passes to take that into account rather than take other approaches.

This addresses one of my concerns. I'm worried about cases where evaluators may resolve a value incorrectly to a constant or error out because the input is not constant. Ex. aten::mul.int of a value from an aten::size with dynamic input shapes should not be resolved to a constant. I think the case where we can resolve an aten::mul.int to a constant that is still in the Torch-TensorRT input graph after constant propagation is actually fairly rare, and would only come up because of constants that we can identify that torchscript cannot (ex. size with static input shapes). If we removed these kinds of evaluators and instead relied on constant propagation after resolving known-to-torch-trt constants (size with static input shapes, dtype), then I think they would be automatically removed by constant propagation in all the cases where it would be valid for torch-trt to do the same. In the remaining cases the aten::mul would still be in the graph and could fall back correctly or be handled by a future converter that could correctly capture the non-constant input.

Based on your input, I think this is what I'm proposing:

  1. Add a jit pass which explicitly resolves aten::size and similar ops that torch-tensorrt can prove are constants based on the input specification (ideally this would also be able to propagate cases where individual values in a shape are constant, ex. dynamic batch with other dimensions static; see the sketch at the end of this comment). Rerun jit constant propagation to eliminate any ops which now have constant inputs.
  2. Remove most evaluators that today resolve to constant values (if they are not resolvable by constant propagation, assume their inputs are non-constant and we must fall back or support them with converters). ListConstruct, ListUnpack, Constant and a small number of special cases would remain.

This would allow us to explicitly call out what torch-trt is assuming is constant in the initial jit pass, and give us a clear torchscript dump after constant propagation of what is still in the graph that will need to be converted or otherwise handled in partitioning. It removes the risk of evaluators incorrectly resolving a value to a constant, creates a path to preserve these ops in fallback regions without erroring out, and allows for the possibility of adding new converter support for nodes that remain in the graph with non-constant inputs (ex. aten::size outputting a shape tensor which could be consumed by a dynamic reshape).
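A hedged sketch of the check behind point 1 above, assuming the shapes from the input specification have already been written onto the graph's TensorTypes. tryGetStaticDim is a hypothetical helper, not existing Torch-TensorRT code; the idea is that it can prove a single dimension constant even when other dimensions (e.g. batch) are dynamic.

#include <torch/csrc/jit/ir/constants.h>
#include <torch/csrc/jit/ir/ir.h>

// Hypothetical helper: for an aten::size(Tensor self, int dim) node, return
// the size of the requested dimension if it can be proven static from the
// value's TensorType, even when other dimensions (e.g. batch) are dynamic.
c10::optional<int64_t> tryGetStaticDim(const torch::jit::Node* n) {
  if (n->kind() != c10::Symbol::fromQualString("aten::size") ||
      n->inputs().size() != 2) {
    return c10::nullopt;
  }
  auto tensor_type = n->input(0)->type()->cast<c10::TensorType>();
  auto dim_ival = torch::jit::toIValue(n->input(1));
  if (!tensor_type || !dim_ival || !dim_ival->isInt()) {
    return c10::nullopt;
  }
  auto sizes = tensor_type->sizes();  // per-dimension, possibly partial, shape info
  if (!sizes.size().has_value()) {
    return c10::nullopt;  // rank unknown
  }
  auto rank = static_cast<int64_t>(*sizes.size());
  auto dim = dim_ival->toInt();
  if (dim < 0) {
    dim += rank;  // normalize negative dims
  }
  if (dim < 0 || dim >= rank) {
    return c10::nullopt;
  }
  return sizes[dim];  // set only if this dimension is statically known
}

Folding the result of such a check into a prim::Constant and then rerunning jit constant propagation would remove the dependent ops exactly in the cases where torch-trt can actually prove them constant.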

@narendasan
Collaborator

narendasan commented Aug 16, 2022

Thanks @narendasan. Are there more examples beyond prim::ListConstruct and prim::ListUnpack? I agree these do not fit the resolve-to-constant pattern I'm trying to address here. Could these be considered as converters?

prim::TupleConstruct/Unpack, aten::append, aten::size, aten::clone, aten::copy are all evaluators which are designed to handle ITensors, i.e. potentially non-constant data. They are not converters since converters are designed to purely emit ITensors. Evaluators produce IValues, so types such as int, double, Tensor, Tuple, List (or wrapped ITensors). Some of these could be converters, but they handle more than just ITensors. We have in the past discussed unifying Evaluators and Converters to all emit IValues, but it did not seem like it added too much other than reducing the number of registries we have. Maybe we will do it some day but I don't think it helps too much here.

Re. 1. I think this sounds reasonable, although I'm not sure how many cases it will hit; as you point out, as soon as there is dynamic shape we are on equal footing with TorchScript lowering information-wise (give or take a few dimensions which are static across the input range). Regardless, for the effort required I don't see any harm in attempting to implement such a pass. The MVP I have in my head (roughly sketched below) is really just in-place replacing directly known (i.e. what we get from the input specs) aten::size, aten::shape with prim::Constant tuples or ints, then seeing what constant pool/prop does. But as soon as we need to go through intermediate ops and start executing subgraphs ourselves, I think that opens a can of worms I am not convinced is worth it yet.
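A minimal sketch of that MVP, assuming the graph's input TensorTypes already carry static shapes from the input specs (or from tracing). FoldStaticSizes is a hypothetical name, not an existing pass.

#include <torch/csrc/jit/ir/ir.h>
#include <torch/csrc/jit/passes/constant_pooling.h>
#include <torch/csrc/jit/passes/constant_propagation.h>

// Hypothetical pass: replace aten::size(Tensor self) -> int[] with a
// prim::Constant when the input's TensorType records a fully static shape,
// then let the stock JIT passes propagate and deduplicate the constants.
void FoldStaticSizes(std::shared_ptr<torch::jit::Graph>& g) {
  const auto size_sym = c10::Symbol::fromQualString("aten::size");
  std::vector<torch::jit::Node*> to_destroy;

  for (auto* n : g->nodes()) {  // top-level block only, for brevity
    if (n->kind() != size_sym || n->inputs().size() != 1) {
      continue;
    }
    auto tensor_type = n->input(0)->type()->cast<c10::TensorType>();
    if (!tensor_type) {
      continue;
    }
    auto concrete = tensor_type->sizes().concrete_sizes();
    if (!concrete) {
      continue;  // shape not fully static; leave the node for fallback/converters
    }
    torch::jit::WithInsertPoint guard(n);
    auto* folded = g->insertConstant(*concrete);  // prim::Constant int[]
    n->output()->replaceAllUsesWith(folded);
    to_destroy.push_back(n);
  }
  for (auto* n : to_destroy) {
    n->destroy();
  }

  // "See what constant pool/prop does": downstream ListConstructs, int math,
  // etc. with now-constant inputs get folded by TorchScript itself.
  torch::jit::ConstantPropagation(g);
  torch::jit::ConstantPooling(g);
}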

Re. 2. I don't really see why we need to remove evaluators; the combo of injecting more shape information into the graph and then running constant pooling won't necessarily resolve all ops that would be handled by evaluators. My understanding of the fundamental issue here, after talking to @peri044, was that data like shapes which Torch-TensorRT knows but TorchScript lowering doesn't is not taken into account. Seems like 1. solves that on its own. If evaluators just don't get called anymore, I guess that's a plus, but I don't really expect that to be the case.

Also, a while back when we were trying to start implementing shape analysis we were chatting with the TS folks, and they said they were looking at baking shape info into the graph themselves. I don't know what came of that or if it was moved to a new project like functorch or fx, but it would be worth taking a look at what they have these days that we can leverage.

@narendasan
Collaborator

Seems like tracing is now including shape information:

graph(%self : __torch__.GatherNDListModel,
      %x : Float(3, 4, 5, strides=[20, 5, 1], requires_grad=0, device=cuda:0)):
  %4 : Long(2, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value= 1  0 [ CPULongType{2} ]]() # 1274.py:19:0
  %5 : Device = prim::Constant[value="cuda:0"]() # 1274.py:19:0
  %6 : int = prim::Constant[value=4]() # 1274.py:19:0
  %7 : bool = prim::Constant[value=0]() # 1274.py:19:0
  %8 : bool = prim::Constant[value=0]() # 1274.py:19:0
  %9 : NoneType = prim::Constant()
  %10 : Long(2, strides=[1], requires_grad=0, device=cuda:0) = aten::to(%4, %5, %6, %7, %8, %9) # 1274.py:19:0
  %11 : int = prim::Constant[value=1]() # 1274.py:19:0
  %12 : int = prim::Constant[value=0]() # 1274.py:19:0
  %13 : int = prim::Constant[value=9223372036854775807]() # 1274.py:19:0
  %14 : int = prim::Constant[value=1]() # 1274.py:19:0
  %15 : Float(3, 4, 5, strides=[20, 5, 1], requires_grad=0, device=cuda:0) = aten::slice(%x, %11, %12, %13, %14) # 1274.py:19:0
  %16 : Long(2, strides=[1], requires_grad=0, device=cpu) = prim::Constant[value= 1  0 [ CPULongType{2} ]]() # 1274.py:19:0
  %17 : Device = prim::Constant[value="cuda:0"]() # 1274.py:19:0
  %18 : int = prim::Constant[value=4]() # 1274.py:19:0
  %19 : bool = prim::Constant[value=0]() # 1274.py:19:0
  %20 : bool = prim::Constant[value=0]() # 1274.py:19:0
  %21 : NoneType = prim::Constant()
  %22 : Long(2, strides=[1], requires_grad=0, device=cuda:0) = aten::to(%16, %17, %18, %19, %20, %21) # 1274.py:19:0
  %23 : NoneType = prim::Constant()
  %24 : Tensor?[] = prim::ListConstruct(%10, %23, %22)
  %25 : Float(2, 4, strides=[4, 1], requires_grad=0, device=cuda:0) = aten::index(%15, %24) # 1274.py:19:0
  return (%25)

@mfeliz-cruise
Contributor

Would we hit issues with dynamic shapes if we rely on the traced shape values?

@narendasan
Collaborator

Potentially. Also, I think this would force people to use trace. We need to investigate this further.

@peri044 peri044 closed this as completed Sep 22, 2022