
Iteratively run SimplificationPipeline until code optimization converges #1261


Closed
wants to merge 1 commit

Conversation

Vremold
Collaborator

@Vremold Vremold commented Aug 22, 2022

This PR aims to solve the problem shown below.

func.func @torch_module.forward(%arg0: !torch.nn.Module<"torch_module">, %arg1: !torch.tensor) -> !torch.tensor {
  ...
  %str = torch.constant.str "trunc"
  %548 = torch.tensor.literal(dense<5> : tensor<si64>) : !torch.tensor<[],si64>
  %549 = torch.aten.size.int %arg1, %int0 : !torch.tensor, !torch.int -> !torch.int
  %550 = torch.prim.NumToTensor.Scalar %549 : !torch.int -> !torch.tensor
  %551 = torch.aten.div.Tensor_mode %550, %548, %str : !torch.tensor, !torch.tensor<[],si64>, !torch.str -> !torch.tensor
  %552 = torch.aten.Int.Tensor %551 : !torch.tensor -> !torch.int
  %553 = torch.prim.ListConstruct %552, %int5, %int-1 : (!torch.int, !torch.int, !torch.int) -> !torch.list<int>
  %554 = torch.aten.view %arg1, %553 : !torch.tensor, !torch.list<int> -> !torch.tensor
  ...
}

In the above case, %arg1 has a constant size (5) along the 0-th dimension. It can therefore be deduced that the shape of %554 should be completely static, but the actual inferred shape is [?,5,?]. The cause is that %551 is not known to be a constant before the shape refinement pipeline runs. That, in turn, is because the canonicalization of AtenDivTensorModeOp fails: it relies on the shape refinement pipeline itself to deduce that one of its operands, %550, is a constant.
So the circular dependency looks like this:

  1. Shape refinement pipeline reasons out some constants;
  2. Based on the constants, we can further canonicalize some ops;
  3. The canonicalization produces more constants, which are related to the shape of some tensors;
  4. ...

By executing the canonicalizer pass and the shape refinement pipeline multiple times, this PR helps alleviate this problem.
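
For illustration, the intended control flow is roughly the following. This is only a sketch, not the actual patch; the header paths, the mlir::torch::Torch namespace, and the fixed iteration count of 3 are assumptions.

// Sketch: run canonicalization and shape refinement back to back a few
// times so that constants discovered by one feed the other.
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Transforms/Passes.h"
#include "torch-mlir/Dialect/Torch/Transforms/Passes.h"

using namespace mlir;

LogicalResult runIterativeSimplification(ModuleOp module) {
  PassManager pm(module.getContext());
  for (int i = 0; i < 3; ++i) {
    // Canonicalization folds ops whose operands became constants in the
    // previous shape refinement round (e.g. AtenDivTensorModeOp).
    pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
    // Shape refinement then propagates the newly available constants
    // into tensor shapes (e.g. the result of torch.aten.view).
    torch::Torch::createTorchShapeRefinementPipeline(pm);
  }
  return pm.run(module);
}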

@Vremold Vremold requested review from silvasean and ramiro050 August 22, 2022 07:33
@silvasean
Contributor

createTorchSimplificationPipeline is already running iteratively inside LowerToBackendContractPass after #1165. Is this still an issue?

// inference), because Torch type promotion rules actually depend on the shape
// of the operand.
createTorchShapeRefinementPipeline(pm);
for (int i = 0; i < options.maxIterations; ++i) {
Contributor

maxIterations controls the iterations in LowerToBackendContractPass and should not be used this way.

Collaborator Author

@Vremold Vremold Aug 23, 2022

createTorchSimplificationPipeline is already running iteratively inside LowerToBackendContractPass after #1165. Is this still an issue?

Yes. In my local development environment, ShapeRefinementPipeline only runs once because satisfiesBackendContract returns true after one iteration. The output IR of the pipeline looks something like this:

// -----// IR Dump After DropShapeCalculations (torch-drop-shape-calculations) //----- //
...
%5 = torch.vtensor.literal(dense<5> : tensor<si64>) : !torch.vtensor<[],si64>
%210 = torch.prim.NumToTensor.Scalar %int5 : !torch.int -> !torch.vtensor<[],unk>
%211 = torch.aten.div.Tensor_mode %210, %5, %str_4 : !torch.vtensor<[],unk>, !torch.vtensor<[],si64>, !torch.str -> !torch.vtensor<[],unk>
%212 = torch.aten.Int.Tensor %211 : !torch.vtensor<[],unk> -> !torch.int
%213 = torch.prim.ListConstruct %212, %int5, %int-1 : (!torch.int, !torch.int, !torch.int) -> !torch.list<int>
%214 = torch.aten.view %209, %213 : !torch.vtensor<[5,2048],unk>, !torch.list<int> -> !torch.vtensor<[?,5,?],unk>
...

Collaborator

I think the issue is related to RefineTypes. Shall we add PrimNumToTensorScalar to

if (isa<CopyToValueTensorOp, CopyToNonValueTensorOp, AtenBatchNormOp,
?

Collaborator Author

@Vremold Vremold Aug 23, 2022

I think this issue is only about constant shapes. Could you explain more about RefineTypes' influence on this problem? In my experiments, the type refinement for PrimNumToTensorScalarOp fails if we do so:

// -----// IR Dump After RefineTypes (torch-refine-types) //----- //
%210 = torch.prim.NumToTensor.Scalar %int5 : !torch.int -> !torch.vtensor<[],unk>

@Vremold Vremold force-pushed the loop-shape-refinement branch from 1adbd37 to edbe3ac on August 23, 2022 06:26
@ZihengJiang
Collaborator

@silvasean The problem might be that satisfiesBackendContract returns true after one iteration, but that doesn't mean the code has been optimized to convergence; we can still use ShapeRefinement and Canonicalization to infer more information afterwards.

@silvasean
Contributor

@silvasean The problem might be that satisfiesBackendContract returns true after one iteration, but that doesn't mean the code has been optimized to convergence; we can still use ShapeRefinement and Canonicalization to infer more information afterwards.

Good point. I would recommend that we run until satisfiesBackendContract is true, and then keep iterating until no changes are made to the program (and of course not iterating more than maxIterations times). This can be done by hashing the module. I think OperationEquivalence can help with this: https://mlir.llvm.org/doxygen/structmlir_1_1OperationEquivalence.html
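
For concreteness, a minimal sketch of such a module hash; the helper name hashModule is hypothetical, and operand and result values are simply ignored here (the limitation discussed further down in the review):

// Sketch: hash every operation in the module with OperationEquivalence and
// compare hashes across iterations to detect that nothing changed.
#include "llvm/ADT/Hashing.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/OperationSupport.h"

using namespace mlir;

static llvm::hash_code hashModule(ModuleOp module) {
  llvm::hash_code hash(0);
  module.walk([&](Operation *op) {
    hash = llvm::hash_combine(
        hash, OperationEquivalence::computeHash(
                  op,
                  /*hashOperands=*/OperationEquivalence::ignoreHashValue,
                  /*hashResults=*/OperationEquivalence::ignoreHashValue,
                  OperationEquivalence::IgnoreLocations));
  });
  return hash;
}

Comparing the hash before and after one run of the pipeline then tells the driver whether another iteration is worthwhile.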

@Vremold Vremold force-pushed the loop-shape-refinement branch from edbe3ac to 0d06a56 on August 24, 2022 07:12
@Vremold Vremold changed the title from "Execute ShapeRefinementPipeline and canonicalizer multiple times to reason out more static shapes" to "Iteratively run SimplificationPipeline until code optimization converges" on Aug 24, 2022
@Vremold
Collaborator Author

Vremold commented Aug 24, 2022

@silvasean The problem might be that satisfiesBackendContract returns true after one iteration, but that doesn't mean the code has been optimized to convergence; we can still use ShapeRefinement and Canonicalization to infer more information afterwards.

Good point. I would recommend that we run until satisfiesBackendContract is true, and then keep iterating until no changes are made to the program (and of course not iterating more than maxIterations times). This can be done by hashing the module. I think OperationEquivalence can help with this: https://mlir.llvm.org/doxygen/structmlir_1_1OperationEquivalence.html

Thanks for the suggestions. I have reimplemented this PR and added a new condition to control the iterative execution of SimplificationPipeline, ensuring that code optimization converges, i.e., that no further code changes happen.

@Vremold Vremold force-pushed the loop-shape-refinement branch from 0d06a56 to 4f30ece on August 24, 2022 07:41
@Vremold
Collaborator Author

Vremold commented Aug 24, 2022

This approach also introduces problems. We need at least one additional SimplificationPipeline execution to make sure that no code change happened. This results in at least one refine-types pass running after the decompose-complex-ops pass of the previous iteration (e.g. decompose-complex-ops pass of iteration 1 => ... => refine-types pass of iteration 2). But the decompose-complex-ops pass may introduce ops like torch.aten.empty.memory_format; these ops are not supported by the refine-types pass and eventually cause a compilation failure.

torch-mlir-opt: /root/share/up/torch-mlir/lib/Dialect/Torch/Transforms/RefineTypes.cpp:1118: void (anonymous namespace)::TypeAnalysis::incorporateKnowledge(mlir::Value, const (anonymous namespace)::ValueKnowledge &): Assertion `updatedKnowledge.has_value() && "IR has contradictory type!"' failed.
...
 #9 0x00000000005bb8ee (anonymous namespace)::TypeAnalysis::visitOperation(mlir::Operation*, llvm::ArrayRef<mlir::dataflow::Lattice<(anonymous namespace)::ValueKnowledge> const*>, llvm::ArrayRef<mlir::dataflow::Lattice<(anonymous namespace)::ValueKnowledge>*>) RefineTypes.cpp:0:0
#10 0x0000000001a64afc mlir::dataflow::AbstractSparseDataFlowAnalysis::visitOperation(mlir::Operation*) (/root/share/up/torch-mlir/build/bin/torch-mlir-opt+0x1a64afc)
#11 0x0000000001a64569 mlir::dataflow::AbstractSparseDataFlowAnalysis::initializeRecursively(mlir::Operation*) (/root/share/up/torch-mlir/build/bin/torch-mlir-opt+0x1a64569)
#12 0x0000000001a64675 mlir::dataflow::AbstractSparseDataFlowAnalysis::initializeRecursively(mlir::Operation*) (/root/share/up/torch-mlir/build/bin/torch-mlir-opt+0x1a64675)
#13 0x0000000001a644c8 mlir::dataflow::AbstractSparseDataFlowAnalysis::initialize(mlir::Operation*) (/root/share/up/torch-mlir/build/bin/torch-mlir-opt+0x1a644c8)
#14 0x0000000001a54f52 mlir::DataFlowSolver::initializeAndRun(mlir::Operation*) (/root/share/up/torch-mlir/build/bin/torch-mlir-opt+0x1a54f52)
#15 0x00000000005baa26 (anonymous namespace)::RefineTypesPass::runOnOperation() RefineTypes.cpp:0:0
...
[1]    1836385 abort (core dumped)  torch-mlir-opt -pass-pipeline='torchscript-module-to-torch-backend-pipeline'

This problem already shows up as a CI failure. I'm not sure whether there are other cases where it fails.
Considering that IR optimization mainly happens in ShapeRefinementPipeline and the canonicalizer, perhaps a better solution is to iteratively run these two passes instead of the entire SimplificationPipeline when we want more optimization. This would have minimal impact on the order in which the passes are run.

@Vremold Vremold requested a review from silvasean August 24, 2022 09:30
static llvm::hash_code hashOperation(Operation *op) {
  llvm::hash_code hash(0);
  llvm::hash_code opHash = OperationEquivalence::computeHash(
      op, OperationEquivalence::ignoreHashValue,
Contributor

@silvasean silvasean Aug 24, 2022

can you add /*arg=*/ comments here so we know the arg names?

Also, I think we should hash the types of the results so that we keep iterating if the types get refined.
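
One possible way to do that (a sketch, not the final patch) is to pass a result callback that hashes each result's type rather than ignoring it:

// Sketch: like hashOperation above, but the result callback hashes each
// result's *type*, so a refinement such as !torch.vtensor<[],unk> becoming
// a tensor with a known dtype/shape changes the hash and keeps iterating.
#include "mlir/IR/OperationSupport.h"
#include "mlir/IR/Types.h"

using namespace mlir;

static llvm::hash_code hashOperationWithResultTypes(Operation *op) {
  return OperationEquivalence::computeHash(
      op,
      /*hashOperands=*/OperationEquivalence::ignoreHashValue,
      /*hashResults=*/[](Value v) { return hash_value(v.getType()); },
      OperationEquivalence::IgnoreLocations);
}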

static llvm::hash_code hashBlock(Block &block) {
  llvm::hash_code hash(0);
  for (Operation &op : block.getOperations()) {
    llvm::hash_code opHash = hashOperation(&op);
Contributor

we should hash the block arguments too (at least their types)

@@ -234,7 +271,14 @@ class LowerToBackendContractPass

    if (failed(runPipeline(pm, module)))
      return signalPassFailure();
  } while (!satisfiesBackendContract(module));

  llvm::hash_code newModuleHash = hashOperation(module);
Contributor

can you add a unit test of the hashing logic?

Contributor

@silvasean silvasean left a comment

Thanks for the quick turnaround!

RefineTypes should support torch.aten.empty.memory_format -- that bug you are running into appears to be a legitimate bug -- can you file an isolated test case for RefineTypes?

I agree that after the backend contract is satisfied, only ShapeRefinementPipeline and canonicalizer should be run. That will be more efficient too. I would recommend adding a new pipeline createTorchBackendContractCleanupPipeline with ShapeRefinementPipeline and canonicalizer. The documentation for the new pipeline would be "Clean up and optimize IR that satisfies the backend contract".
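
A minimal sketch of what that pipeline could look like, using only the two components discussed in this thread; the function name is taken from the comment above, while the header paths and exact pass nesting are assumptions:

#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Transforms/Passes.h"
#include "torch-mlir/Dialect/Torch/Transforms/Passes.h"

using namespace mlir;

// "Clean up and optimize IR that satisfies the backend contract."
// Sketch only: just the canonicalizer plus shape refinement, as discussed.
void createTorchBackendContractCleanupPipeline(OpPassManager &pm) {
  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
  torch::Torch::createTorchShapeRefinementPipeline(pm);
}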

@Vremold
Collaborator Author

Vremold commented Aug 26, 2022

Recently, I have been thinking about the execution order of the passes in TorchSimplificationPipeline. There are two main requirements:

  1. To satisfy the backend contract, we need to run the whole TorchSimplificationPipeline iteratively. But this will probably cause problems like the one in PR #1280 (Fix a bug about torch-refine-types pass when the dtype of output tensor is known). Would it be better to execute the decompose-complex-ops pass before the shape and type refinement passes? It also sounds more natural to decompose complex ops before performing the simplifications.
  2. To perform more code optimization, we need to run the shape-refinement-pipeline and the canonicalizer multiple times. If we want minimal effects on the execution order of the passes (such as avoiding running refine-types after decompose-complex-ops), I think a better solution is to create another pass that iteratively runs the shape-refinement-pipeline and the canonicalizer until no code change happens or maxIterations is reached; a sketch follows below. The new pass would be inserted into the current TorchSimplificationPipeline and no further code changes would be needed.

I'm not sure if my idea is correct and meets the requirements. Thanks in advance for any comments.
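
As a rough sketch of that idea (the driver name is hypothetical, and it reuses the hashModule helper and the cleanup pipeline sketched earlier in this thread; only the control flow is illustrated):

// Sketch: re-run canonicalization + shape refinement until the module stops
// changing or maxIterations is reached. hashModule and
// createTorchBackendContractCleanupPipeline refer to the sketches above.
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Support/LogicalResult.h"

using namespace mlir;

LogicalResult optimizeUntilFixedPoint(ModuleOp module, int maxIterations) {
  llvm::hash_code prevHash = hashModule(module);
  for (int i = 0; i < maxIterations; ++i) {
    PassManager pm(module.getContext());
    createTorchBackendContractCleanupPipeline(pm);
    if (failed(pm.run(module)))
      return failure();
    llvm::hash_code newHash = hashModule(module);
    if (newHash == prevHash)
      break; // Converged: this iteration made no code change.
    prevHash = newHash;
  }
  return success();
}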

@silvasean
Contributor

silvasean commented Aug 26, 2022

  1. To satisfy the backend contract, we need to run the whole TorchSimplificationPipeline iteratively. But this will probably cause problems like the one in PR #1280 (Fix a bug about torch-refine-types pass when the dtype of output tensor is known). Would it be better to execute the decompose-complex-ops pass before the shape and type refinement passes? It also sounds more natural to decompose complex ops before performing the simplifications.

The order of the SimplificationPipeline is chosen so that it almost always finishes in one iteration. Changing the order like this will break that.

2. To perform more code optimization, we need to run the shape-refinement-pipeline and the canonicalizer multiple times. If we want minimal effects on the execution order of the passes (such as avoiding running refine-types after decompose-complex-ops), I think a better solution is to create another pass that iteratively runs the shape-refinement-pipeline and the canonicalizer until no code change happens or maxIterations is reached. The new pass would be inserted into the current TorchSimplificationPipeline and no further code changes would be needed.

I support adding a pass that iteratively runs shape refinement and the canonicalizer. But that pass should not run inside TorchSimplificationPipeline. We should clearly separate (1) lowering to the backend contract from (2) optimizing code once it satisfies the backend contract. The TorchSimplificationPipeline is part of (1) but should not be used as part of (2). (Maybe we need a better name than TorchSimplificationPipeline.)

@Vremold Vremold marked this pull request as draft September 6, 2022 03:01
@Vremold Vremold closed this Sep 20, 2022
qedawkins pushed a commit to nod-ai/torch-mlir that referenced this pull request Oct 3, 2022
…ums (llvm#1261)

* [maccel]: Change --maccel option from a string option to a list of enums

Signed-off-by: Ettore Tiotto <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>

* - add queryEntryPoints Java API (llvm#1255)

- use GENERATE_NATIVE_HEADERS option of add_jar (require
  cmake 3.11+) to generate JNI header since javah was
  deprecated since Java 8

Signed-off-by: Gong Su <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>

* Do not set ownership for an output OMTensor that is also a block argument (llvm#1256)

* Do not set ownership for an output that is also a block argument

Signed-off-by: Tung D. Le <[email protected]>

* Edit lit tests

Signed-off-by: Tung D. Le <[email protected]>

* More name changes

Signed-off-by: Tung D. Le <[email protected]>

* Edit comments

Signed-off-by: Tung D. Le <[email protected]>

* typos

Signed-off-by: Tung D. Le <[email protected]>

* Make the llvm.ident lit test more meaningful (llvm#1260)

* Make the llvm.ident lit test more meaningful

Update the test to specifically look for a commit hash instead of any characters

Signed-off-by: Stella Stamenova <[email protected]>

* Account for .git suffix

Signed-off-by: Stella Stamenova <[email protected]>

Co-authored-by: Tung D. Le <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>

* [backend_cpp]: Use ModelLib to create CategoryMapper cpp tests.

Signed-off-by: Ettore Tiotto <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>

* Revert "[backend_cpp]: Use ModelLib to create CategoryMapper cpp tests."

This reverts commit 00e8a6bdd6d90c6125326173340fd3e00f9c838c.

Signed-off-by: Tung D. Le <[email protected]>

* [Accelerator] Do not use NNPA preprocessor to avoid exposing accelerator code (llvm#1263)

* Do not use NNPA preprocessor to avoid exposing accelerator code

Signed-off-by: Tung D. Le <[email protected]>

* clang-format

Signed-off-by: Tung D. Le <[email protected]>

* Move OptimizationLevel to the common place

Signed-off-by: Tung D. Le <[email protected]>

* Rename functions

Signed-off-by: Tung D. Le <[email protected]>

* format

Signed-off-by: Tung D. Le <[email protected]>

* Address comments

Signed-off-by: Tung D. Le <[email protected]>

* generate Accelerator option enum from CMake

Signed-off-by: Kevin O'Brien <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>

* Edit CMakeLists.txt

Signed-off-by: Tung D. Le <[email protected]>

* clang-format

Signed-off-by: Tung D. Le <[email protected]>

Co-authored-by: gongsu832 <[email protected]>
Co-authored-by: Tung D. Le <[email protected]>
Co-authored-by: Stella Stamenova <[email protected]>
Co-authored-by: Kevin O'Brien <[email protected]>