
Iteratively run SimplificationPipeline until code optimization converges #1261


Closed
wants to merge 1 commit

Conversation

Vremold
Collaborator

@Vremold Vremold commented Aug 22, 2022

This PR aims to solve the problem shown below.

func.func @torch_module.forward(%arg0: !torch.nn.Module<"torch_module">, %arg1: !torch.tensor) -> !torch.tensor {
  ...
  %str = torch.constant.str "trunc"
  %548 = torch.tensor.literal(dense<5> : tensor<si64>) : !torch.tensor<[],si64>
  %549 = torch.aten.size.int %arg1, %int0 : !torch.tensor, !torch.int -> !torch.int
  %550 = torch.prim.NumToTensor.Scalar %549 : !torch.int -> !torch.tensor
  %551 = torch.aten.div.Tensor_mode %550, %548, %str : !torch.tensor, !torch.tensor<[],si64>, !torch.str -> !torch.tensor
  %552 = torch.aten.Int.Tensor %551 : !torch.tensor -> !torch.int
  %553 = torch.prim.ListConstruct %552, %int5, %int-1 : (!torch.int, !torch.int, !torch.int) -> !torch.list<int>
  %554 = torch.aten.view %arg1, %553 : !torch.tensor, !torch.list<int> -> !torch.tensor
  ...
}

In the above case, %arg1 has a constant size (5) along the 0-th dimension. It can therefore be deduced that the shape of %554 should be completely static, but the actual inferred shape is [?,5,?]. The cause is that %551 is not known to be a constant before the shape refinement pipeline runs. That, in turn, is because the canonicalization of AtenDivTensorModeOp fails: it relies on the shape refinement pipeline itself to deduce that one of its operands, %550, is a constant.
So the circular dependency looks like this:

  1. Shape refinement pipeline reasons out some constants;
  2. Based on the constants, we can further canonicalize some ops;
  3. The canonicalization produces more constants, which are related to the shape of some tensors;
  4. ...

By executing the canonicalizer pass and the shape refinement pipeline multiple times, this PR helps alleviate this problem.
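
For illustration, the intended control flow is roughly the following. This is only a sketch, not the actual patch; the header paths, the mlir::torch::Torch namespace, and the fixed iteration count of 3 are assumptions.

// Sketch: run canonicalization and shape refinement back to back a few
// times so that constants discovered by one feed the other.
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Transforms/Passes.h"
#include "torch-mlir/Dialect/Torch/Transforms/Passes.h"

using namespace mlir;

LogicalResult runIterativeSimplification(ModuleOp module) {
  PassManager pm(module.getContext());
  for (int i = 0; i < 3; ++i) {
    // Canonicalization folds ops whose operands became constants in the
    // previous shape refinement round (e.g. AtenDivTensorModeOp).
    pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
    // Shape refinement then propagates the newly available constants
    // into tensor shapes (e.g. the result of torch.aten.view).
    torch::Torch::createTorchShapeRefinementPipeline(pm);
  }
  return pm.run(module);
}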

@Vremold Vremold requested review from silvasean and ramiro050 August 22, 2022 07:33
@silvasean
Contributor

createTorchSimplificationPipeline is already running iteratively inside LowerToBackendContractPass after #1165. Is this still an issue?

// inference), because Torch type promotion rules actually depend on the shape
// of the operand.
createTorchShapeRefinementPipeline(pm);
for (int i = 0; i < options.maxIterations; ++i) {
Contributor

maxIterations controls the iterations in LowerToBackendContractPass and should not be used this way.

Collaborator Author

@Vremold Vremold Aug 23, 2022

createTorchSimplificationPipeline is already running iteratively inside LowerToBackendContractPass after #1165. Is this still an issue?

Yes. In my local development environment, ShapeRefinementPipeline only runs once because satisfiesBackendContract returns true after one iteration. The output IR of the pipeline looks something like this:

// -----// IR Dump After DropShapeCalculations (torch-drop-shape-calculations) //----- //
...
%5 = torch.vtensor.literal(dense<5> : tensor<si64>) : !torch.vtensor<[],si64>
%210 = torch.prim.NumToTensor.Scalar %int5 : !torch.int -> !torch.vtensor<[],unk>
%211 = torch.aten.div.Tensor_mode %210, %5, %str_4 : !torch.vtensor<[],unk>, !torch.vtensor<[],si64>, !torch.str -> !torch.vtensor<[],unk>
%212 = torch.aten.Int.Tensor %211 : !torch.vtensor<[],unk> -> !torch.int
%213 = torch.prim.ListConstruct %212, %int5, %int-1 : (!torch.int, !torch.int, !torch.int) -> !torch.list<int>
%214 = torch.aten.view %209, %213 : !torch.vtensor<[5,2048],unk>, !torch.list<int> -> !torch.vtensor<[?,5,?],unk>
...

Collaborator

I think the issue is related to RefineTypes. Shall we add PrimNumToTensorScalar to

if (isa<CopyToValueTensorOp, CopyToNonValueTensorOp, AtenBatchNormOp,
?

Collaborator Author

@Vremold Vremold Aug 23, 2022

I think this issue is only about constant shapes. Could you explain more about RefineTypes' influence on this problem? In my experiments, the type refinement for PrimNumToTensorScalarOp fails if we do so:

// -----// IR Dump After RefineTypes (torch-refine-types) //----- //
%210 = torch.prim.NumToTensor.Scalar %int5 : !torch.int -> !torch.vtensor<[],unk>

@Vremold Vremold force-pushed the loop-shape-refinement branch from 1adbd37 to edbe3ac on August 23, 2022 06:26
@ZihengJiang
Collaborator

@silvasean The problem might be that satisfiesBackendContract returns true after one iteration, but that doesn't mean the code has been optimized to convergence; we can still use ShapeRefinement and Canonicalization to infer more information afterwards.

@silvasean
Contributor

@silvasean The problem might be that satisfiesBackendContract returns true after one iteration, but that doesn't mean the code has been optimized to convergence; we can still use ShapeRefinement and Canonicalization to infer more information afterwards.

Good point. I would recommend that we run until satisfiesBackendContract is true, and then keep iterating until no changes are made to the program (and of course not iterating more than maxIterations times). This can be done by hashing the module. I think OperationEquivalence can help with this: https://mlir.llvm.org/doxygen/structmlir_1_1OperationEquivalence.html
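
For concreteness, a minimal sketch of such a module hash; the helper name hashModule is hypothetical, and operand and result values are simply ignored here (the limitation discussed further down in the review):

// Sketch: hash every operation in the module with OperationEquivalence and
// compare hashes across iterations to detect that nothing changed.
#include "llvm/ADT/Hashing.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/OperationSupport.h"

using namespace mlir;

static llvm::hash_code hashModule(ModuleOp module) {
  llvm::hash_code hash(0);
  module.walk([&](Operation *op) {
    hash = llvm::hash_combine(
        hash, OperationEquivalence::computeHash(
                  op,
                  /*hashOperands=*/OperationEquivalence::ignoreHashValue,
                  /*hashResults=*/OperationEquivalence::ignoreHashValue,
                  OperationEquivalence::IgnoreLocations));
  });
  return hash;
}

Comparing the hash before and after one run of the pipeline then tells the driver whether another iteration is worthwhile.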

@Vremold Vremold force-pushed the loop-shape-refinement branch from edbe3ac to 0d06a56 on August 24, 2022 07:12
@Vremold Vremold changed the title from "Execute ShapeRefinementPipeline and canonicalizer multiple times to reason out more static shapes" to "Iteratively run SimplificationPipeline until code optimization converges" on Aug 24, 2022
@Vremold
Collaborator Author

Vremold commented Aug 24, 2022

@silvasean The problem might be that satisfiesBackendContract returns true after one iteration, but that doesn't mean the code has been optimized to convergence; we can still use ShapeRefinement and Canonicalization to infer more information afterwards.

Good point. I would recommend that we run until satisfiesBackendContract is true, and then keep iterating until no changes are made to the program (and of course not iterating more than maxIterations times). This can be done by hashing the module. I think OperationEquivalence can help with this: https://mlir.llvm.org/doxygen/structmlir_1_1OperationEquivalence.html

Thanks for the suggestions. I have reimplemented this PR and added a new condition to control the iterative execution of SimplificationPipeline, ensuring that code optimization converges, i.e., that no further code changes happen.

@Vremold Vremold force-pushed the loop-shape-refinement branch from 0d06a56 to 4f30ece on August 24, 2022 07:41
@Vremold
Collaborator Author

Vremold commented Aug 24, 2022

This approach also introduces problems. We need at least one additional SimplificationPipeline execution to make sure that no code change happened. This results in at least one refine-types pass running after the decompose-complex-ops pass of the previous iteration (e.g. decompose-complex-ops pass of iteration 1 => ... => refine-types pass of iteration 2). But the decompose-complex-ops pass may introduce ops like torch.aten.empty.memory_format; these ops are not supported by the refine-types pass and eventually cause a compilation failure.

torch-mlir-opt: /root/share/up/torch-mlir/lib/Dialect/Torch/Transforms/RefineTypes.cpp:1118: void (anonymous namespace)::TypeAnalysis::incorporateKnowledge(mlir::Value, const (anonymous namespace)::ValueKnowledge &): Assertion `updatedKnowledge.has_value() && "IR has contradictory type!"' failed.
...
 #9 0x00000000005bb8ee (anonymous namespace)::TypeAnalysis::visitOperation(mlir::Operation*, llvm::ArrayRef<mlir::dataflow::Lattice<(anonymous namespace)::ValueKnowledge> const*>, llvm::ArrayRef<mlir::dataflow::Lattice<(anonymous namespace)::ValueKnowledge>*>) RefineTypes.cpp:0:0
#10 0x0000000001a64afc mlir::dataflow::AbstractSparseDataFlowAnalysis::visitOperation(mlir::Operation*) (/root/share/up/torch-mlir/build/bin/torch-mlir-opt+0x1a64afc)
#11 0x0000000001a64569 mlir::dataflow::AbstractSparseDataFlowAnalysis::initializeRecursively(mlir::Operation*) (/root/share/up/torch-mlir/build/bin/torch-mlir-opt+0x1a64569)
#12 0x0000000001a64675 mlir::dataflow::AbstractSparseDataFlowAnalysis::initializeRecursively(mlir::Operation*) (/root/share/up/torch-mlir/build/bin/torch-mlir-opt+0x1a64675)
#13 0x0000000001a644c8 mlir::dataflow::AbstractSparseDataFlowAnalysis::initialize(mlir::Operation*) (/root/share/up/torch-mlir/build/bin/torch-mlir-opt+0x1a644c8)
#14 0x0000000001a54f52 mlir::DataFlowSolver::initializeAndRun(mlir::Operation*) (/root/share/up/torch-mlir/build/bin/torch-mlir-opt+0x1a54f52)
#15 0x00000000005baa26 (anonymous namespace)::RefineTypesPass::runOnOperation() RefineTypes.cpp:0:0
...
[1]    1836385 abort (core dumped)  torch-mlir-opt -pass-pipeline='torchscript-module-to-torch-backend-pipeline'

This problem already shows up as a CI failure. I'm not sure whether there are other cases where it fails.
Considering that IR optimization mainly happens in ShapeRefinementPipeline and the canonicalizer, perhaps a better solution is to iteratively run these two passes instead of the entire SimplificationPipeline when we want more optimization. This would have minimal impact on the order in which the passes are run.

@Vremold Vremold requested a review from silvasean August 24, 2022 09:30
static llvm::hash_code hashOperation(Operation *op) {
  llvm::hash_code hash(0);
  llvm::hash_code opHash = OperationEquivalence::computeHash(
      op, OperationEquivalence::ignoreHashValue,
Contributor

@silvasean silvasean Aug 24, 2022

can you add /*arg=*/ comments here so we know the arg names?

Also, I think we should hash the types of the results so that we keep iterating if the types get refined.
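
One possible way to do that (a sketch, not the final patch) is to pass a result callback that hashes each result's type rather than ignoring it:

// Sketch: like hashOperation above, but the result callback hashes each
// result's *type*, so a refinement such as !torch.vtensor<[],unk> becoming
// a tensor with a known dtype/shape changes the hash and keeps iterating.
#include "mlir/IR/OperationSupport.h"
#include "mlir/IR/Types.h"

using namespace mlir;

static llvm::hash_code hashOperationWithResultTypes(Operation *op) {
  return OperationEquivalence::computeHash(
      op,
      /*hashOperands=*/OperationEquivalence::ignoreHashValue,
      /*hashResults=*/[](Value v) { return hash_value(v.getType()); },
      OperationEquivalence::IgnoreLocations);
}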

static llvm::hash_code hashBlock(Block &block) {
  llvm::hash_code hash(0);
  for (Operation &op : block.getOperations()) {
    llvm::hash_code opHash = hashOperation(&op);
Contributor

we should hash the block arguments too (at least their types)

@@ -234,7 +271,14 @@ class LowerToBackendContractPass

    if (failed(runPipeline(pm, module)))
      return signalPassFailure();
  } while (!satisfiesBackendContract(module));

  llvm::hash_code newModuleHash = hashOperation(module);
Contributor

can you add a unit test of the hashing logic?

Contributor

@silvasean silvasean left a comment

Thanks for the quick turnaround!

RefineTypes should support torch.aten.empty.memory_format -- that bug you are running into appears to be a legitimate bug -- can you file an isolated test case for RefineTypes?

I agree that after the backend contract is satisfied, only ShapeRefinementPipeline and canonicalizer should be run. That will be more efficient too. I would recommend adding a new pipeline createTorchBackendContractCleanupPipeline with ShapeRefinementPipeline and canonicalizer. The documentation for the new pipeline would be "Clean up and optimize IR that satisfies the backend contract".
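
A minimal sketch of what that pipeline could look like, using only the two components discussed in this thread; the function name is taken from the comment above, while the header paths and exact pass nesting are assumptions:

#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Transforms/Passes.h"
#include "torch-mlir/Dialect/Torch/Transforms/Passes.h"

using namespace mlir;

// "Clean up and optimize IR that satisfies the backend contract."
// Sketch only: just the canonicalizer plus shape refinement, as discussed.
void createTorchBackendContractCleanupPipeline(OpPassManager &pm) {
  pm.addNestedPass<func::FuncOp>(createCanonicalizerPass());
  torch::Torch::createTorchShapeRefinementPipeline(pm);
}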

@Vremold
Collaborator Author

Vremold commented Aug 26, 2022

Recently, I have been thinking about the execution order of the passes in TorchSimplificationPipeline. There are two main requirements:

  1. To satisfy the backend contract, we need to run the whole TorchSimplificationPipeline iteratively. But this will probably cause problems like the one in PR #1280 (Fix a bug about torch-refine-types pass when the dtype of output tensor is known). Would it be better to execute the decompose-complex-ops pass before the shape and type refinement passes? It also sounds more natural to decompose complex ops before performing the simplifications.
  2. To perform more code optimization, we need to run the shape-refinement-pipeline and the canonicalizer multiple times. If we want minimal effects on the execution order of the passes (such as avoiding running refine-types after decompose-complex-ops), I think a better solution is to create another pass that iteratively runs the shape-refinement-pipeline and the canonicalizer until no code change happens or maxIterations is reached; a sketch follows below. The new pass would be inserted into the current TorchSimplificationPipeline and no further code changes would be needed.

I'm not sure if my idea is correct and meets the requirements. Thanks in advance for any comments.
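
As a rough sketch of that idea (the driver name is hypothetical, and it reuses the hashModule helper and the cleanup pipeline sketched earlier in this thread; only the control flow is illustrated):

// Sketch: re-run canonicalization + shape refinement until the module stops
// changing or maxIterations is reached. hashModule and
// createTorchBackendContractCleanupPipeline refer to the sketches above.
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Support/LogicalResult.h"

using namespace mlir;

LogicalResult optimizeUntilFixedPoint(ModuleOp module, int maxIterations) {
  llvm::hash_code prevHash = hashModule(module);
  for (int i = 0; i < maxIterations; ++i) {
    PassManager pm(module.getContext());
    createTorchBackendContractCleanupPipeline(pm);
    if (failed(pm.run(module)))
      return failure();
    llvm::hash_code newHash = hashModule(module);
    if (newHash == prevHash)
      break; // Converged: this iteration made no code change.
    prevHash = newHash;
  }
  return success();
}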

@silvasean
Contributor

silvasean commented Aug 26, 2022

  1. To satisfy the backend contract, we need to run the whole TorchSimplificationPipeline iteratively. But this will probably cause problems like the one in PR #1280 (Fix a bug about torch-refine-types pass when the dtype of output tensor is known). Would it be better to execute the decompose-complex-ops pass before the shape and type refinement passes? It also sounds more natural to decompose complex ops before performing the simplifications.

The order of the SimplificationPipeline is chosen so that it almost always finishes in one iteration. Changing the order like this will break that.

2. To perform more code optimization, we need to run the shape-refinement-pipeline and the canonicalizer multiple times. If we want minimal effects on the execution order of the passes (such as avoiding running refine-types after decompose-complex-ops), I think a better solution is to create another pass that iteratively runs the shape-refinement-pipeline and the canonicalizer until no code change happens or maxIterations is reached. The new pass would be inserted into the current TorchSimplificationPipeline and no further code changes would be needed.

I support adding a pass that iteratively runs shape refinement and the canonicalizer. But that pass should not run inside TorchSimplificationPipeline. We should clearly separate (1) lowering to the backend contract from (2) optimizing code once it satisfies the backend contract. The TorchSimplificationPipeline is part of (1) but should not be used as part of (2). (Maybe we need a better name than TorchSimplificationPipeline.)

@Vremold Vremold marked this pull request as draft September 6, 2022 03:01
@Vremold Vremold closed this Sep 20, 2022
qedawkins pushed a commit to nod-ai/torch-mlir that referenced this pull request Oct 3, 2022
…ums (llvm#1261)

* [maccel]: Change --maccel option from a string option to a list of enums

Signed-off-by: Ettore Tiotto <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>

* - add queryEntryPoints Java API (llvm#1255)

- use GENERATE_NATIVE_HEADERS option of add_jar (require
  cmake 3.11+) to generate JNI header since javah was
  deprecated since Java 8

Signed-off-by: Gong Su <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>

* Do not set ownership for an output OMTensor that is also a block argument (llvm#1256)

* Do not set ownership for an output that is also a block argument

Signed-off-by: Tung D. Le <[email protected]>

* Edit lit tests

Signed-off-by: Tung D. Le <[email protected]>

* More name changes

Signed-off-by: Tung D. Le <[email protected]>

* Edit comments

Signed-off-by: Tung D. Le <[email protected]>

* typos

Signed-off-by: Tung D. Le <[email protected]>

* Make the llvm.ident lit test more meaningful (llvm#1260)

* Make the llvm.ident lit test more meaningful

Update the test to specifically look for a commit hash instead of any characters

Signed-off-by: Stella Stamenova <[email protected]>

* Account for .git suffix

Signed-off-by: Stella Stamenova <[email protected]>

Co-authored-by: Tung D. Le <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>

* [backend_cpp]: Use ModelLib to create CategoryMapper cpp tests.

Signed-off-by: Ettore Tiotto <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>

* Revert "[backend_cpp]: Use ModelLib to create CategoryMapper cpp tests."

This reverts commit 00e8a6bdd6d90c6125326173340fd3e00f9c838c.

Signed-off-by: Tung D. Le <[email protected]>

* [Accelerator] Do not use NNPA preprocessor to avoid exposing accelerator code (llvm#1263)

* Do not use NNPA preprocessor to avoid exposing accelerator code

Signed-off-by: Tung D. Le <[email protected]>

* clang-format

Signed-off-by: Tung D. Le <[email protected]>

* Move OptimizationLevel to the common place

Signed-off-by: Tung D. Le <[email protected]>

* Rename functions

Signed-off-by: Tung D. Le <[email protected]>

* format

Signed-off-by: Tung D. Le <[email protected]>

* Address comments

Signed-off-by: Tung D. Le <[email protected]>

* generate Accelerator option enum from CMake

Signed-off-by: Kevin O'Brien <[email protected]>
Signed-off-by: Tung D. Le <[email protected]>

* Edit CMakeLists.txt

Signed-off-by: Tung D. Le <[email protected]>

* clang-format

Signed-off-by: Tung D. Le <[email protected]>

Co-authored-by: gongsu832 <[email protected]>
Co-authored-by: Tung D. Le <[email protected]>
Co-authored-by: Stella Stamenova <[email protected]>
Co-authored-by: Kevin O'Brien <[email protected]>