Summary of problems when iteratively executing TorchSimplificationPipeline #1324


Closed
Vremold opened this issue Aug 31, 2022 · 4 comments

@Vremold
Collaborator

Vremold commented Aug 31, 2022

Recently, I have been working on iteratively executing TorchSimplificationPipeline in the LowerBackendContract pass. The motivation for doing so comes mainly from two aspects:

  1. SatisfyBackendContract may need multiple executions of TorchSimplificationPipeline; this is how it currently looks. But in my experiments, TorchSimplificationPipeline usually runs only once, because the backend contract is already satisfied after the first iteration.
  2. We need to run the pipeline multiple times to make sure the code is optimized into a converged state (a rough sketch of such a fixed-point loop is shown below). An example of missed optimization is in the PR Iteratively run SimplificationPipeline until code optimization converges #1261. These optimizations are usually related to shape inference, and the main passes involved are canonicalizer, the shape-simplification-pipeline, and refine-types.
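
For reference, here is a minimal sketch of the fixed-point loop I have in mind. It is not the actual torch-mlir implementation: the header path, the createTorchSimplificationPipeline signature, and the TorchLoweringPipelineOptions name are my reading of the torch-mlir source and may not match the tree exactly, and convergence is detected simply by comparing the printed IR before and after each run.

#include <string>

#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Support/LogicalResult.h"
#include "llvm/Support/raw_ostream.h"
// Assumed header; declares createTorchSimplificationPipeline and
// TorchLoweringPipelineOptions.
#include "torch-mlir/Dialect/Torch/Transforms/Passes.h"

// Sketch only: re-run the simplification pipeline until the module stops
// changing, up to a fixed iteration budget.
static mlir::LogicalResult
runSimplificationToFixedPoint(mlir::ModuleOp module, int maxIterations = 10) {
  for (int i = 0; i < maxIterations; ++i) {
    // Snapshot the IR before this iteration.
    std::string before;
    llvm::raw_string_ostream beforeOs(before);
    module.print(beforeOs);
    beforeOs.flush();

    mlir::PassManager pm(module.getContext());
    mlir::torch::Torch::TorchLoweringPipelineOptions options;
    mlir::torch::Torch::createTorchSimplificationPipeline(pm, options);
    if (mlir::failed(pm.run(module)))
      return mlir::failure();

    // Snapshot the IR after this iteration and compare.
    std::string after;
    llvm::raw_string_ostream afterOs(after);
    module.print(afterOs);
    afterOs.flush();

    if (before == after)
      return mlir::success(); // Reached a fixed point.
  }
  // Did not converge within the iteration budget.
  return mlir::failure();
}

Printing the whole module each iteration is wasteful for large modules; an operation fingerprint or a "changed" flag threaded out of the passes would be cheaper, but the string comparison keeps the sketch simple.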

However, if TorchSimplificationPipeline is executed more than once, the following problems arise.

  1. Dtype mismatch problems in memory-allocation ops. In the first iteration, we infer the output tensor's type. If it does not match the op's dtype parameter (for example, when dtype is torch.none), the refine-types pass of the next iteration will fail. An example can be seen in PR Fix a bug about torch-refine-types pass when the dtype of output tensor is known #1280. Thanks a lot for the suggestions from @ramiro050; they indeed solve part of the problem. However, the following e2e tests still fail: RandLikeModule_basic, FullLikeModuleInt2DStatic_basic, FullLikeModuleInt2D_basic, and FullModuleInt3D_basic. That's because the dtype of the initially captured aten ops is torch.none. Taking RandLikeModule_basic as an example, the initial IR is:
%1 = torch.aten.rand_like %arg1, %none_0, %none_0, %none_0, %none_0, %none_0 : !torch.tensor, !torch.none, !torch.none, !torch.none, !torch.none, !torch.none -> !torch.tensor
  2. In the decompose-complex-ops pass, AtenRollOp is split into several AtenSliceTensorOps and an AtenCatOp. Here is an example of the resulting IR:
%3 = torch.aten.slice.Tensor %arg0, %int0, %2, %none, %int1 : !torch.vtensor<[3,1,2],f32>, !torch.int, !torch.int, !torch.none, !torch.int -> !torch.vtensor<[?,1,2],f32>
%4 = torch.aten.slice.Tensor %arg0, %int0, %int0_0, %2, %int1 : !torch.vtensor<[3,1,2],f32>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[?,1,2],f32>
%5 = torch.prim.ListConstruct %3, %4 : (!torch.vtensor<[?,1,2],f32>, !torch.vtensor<[?,1,2],f32>) -> !torch.list<vtensor<[?,1,2],f32>>
%6 = torch.aten.cat %5, %int0 : !torch.list<vtensor<[?,1,2],f32>>, !torch.int -> !torch.vtensor<[3,1,2],f32>

If TorchSimplificationPipeline is run a second time, %3 and %4 will be inferred to have static shapes, and the verification of ListConstructOp will fail because its two operands end up with different shapes:

%6 = "torch.aten.slice.Tensor"(%arg0, %4, %2, %3, %1) : (!torch.vtensor<[3,1,2],f32>, !torch.int, !torch.int, !torch.none, !torch.int) -> !torch.vtensor<[2,1,2],f32>
...
%7 = "torch.aten.slice.Tensor"(%arg0, %4, %4, %2, %1) : (!torch.vtensor<[3,1,2],f32>, !torch.int, !torch.int, !torch.int, !torch.int) -> !torch.vtensor<[1,1,2],f32>

All of these problems were found in the current e2e tests. Any ideas on how to solve them? From my point of view, the most straightforward solution is to reorder and reorganize the passes in TorchSimplificationPipeline.

@ramiro050
Collaborator

ramiro050 commented Aug 31, 2022

Thanks @Vremold for the nice summary of the issues! Iteratively running TorchSimplificationPipeline is uncovering some interesting bugs.

For 1., the issue with aten.rand_like is that one of the ops it gets decomposed into is aten.empty_like, which has a decomposition that is not entirely correct. Namely, the decomposition for aten.empty_like should check if op.dtype() is None, and if so, it should pass the dtype of the input tensor, rather than the op.dtype() argument, to the AtenMemoryFormatOp:

op, op.getType(), sizeList, op.dtype(), op.layout(), op.device(),

The fix should be similar to the one you made for the createInitTensor helper function.
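
Roughly, the check inside the decomposition pattern could look like the sketch below. This is not the exact torch-mlir code: the op and accessor names are approximate, sizeList is assumed to be computed from op.self() as in the existing pattern, and getDtypeIntValueForType is assumed to be the existing utility that materializes the torch.constant.int dtype value for an element type.

// Fragment of the aten.empty_like decomposition's matchAndRewrite (sketch).
// If the dtype argument is None, fall back to the dtype of the input tensor
// instead of forwarding the None value.
Value dtype = op.dtype();
if (dtype.getType().isa<Torch::NoneType>()) {
  auto inputType = op.self().getType().cast<BaseTensorType>();
  if (!inputType.hasDtype())
    return rewriter.notifyMatchFailure(
        op, "expected input tensor to have a known dtype");
  // Assumed helper: converts the element type into the corresponding
  // torch dtype integer constant.
  dtype = getDtypeIntValueForType(rewriter, op.getLoc(), inputType.getDtype());
}
rewriter.replaceOpWithNewOp<AtenEmptyMemoryFormatOp>(
    op, op.getType(), sizeList, dtype, op.layout(), op.device(),
    op.pin_memory(), op.memory_format());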

@silvasean
Contributor

silvasean commented Aug 31, 2022

For 2., shape inference should never create invalid IR. If it does, then that is a bug. Can you make a reproducer? (a .mlir file and a torch-mlir-opt command line with the shape inference pass that creates the illegal op). I'm happy to take a look at this.

Reorganization of TorchSimplificationPipeline is not a good solution. It can only hide the problems. It does not fix them. It will always be possible to write a new program that hits these issues. These are all legitimate bugs in the relevant passes and should be individually fixed.

@Vremold
Collaborator Author

Vremold commented Sep 1, 2022

For 2., shape inference should never create invalid IR. If it does, then that is a bug. Can you make a reproducer? (a .mlir file and a torch-mlir-opt command line with the shape inference pass that creates the illegal op). I'm happy to take a look at this.

Reorganization of TorchSimplificationPipeline is not a good solution. It can only hide the problems. It does not fix them. It will always be possible to write a new program that hits these issues. These are all legitimate bugs in the relevant passes and should be individually fixed.

The original Python script comes from the RollModule_basic e2e test case. It looks like this:

import torch
import torch_mlir

class RollModule(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x.roll([2, -1], [0, 2])

x = torch.rand(3, 1, 2)
module = torch_mlir.compile(RollModule(), x, output_type=torch_mlir.OutputType.TORCH)

I recorded the IR at each stage of the process.
IR before the first decompose-complex-ops:

// To reproduce this error, run the following commands sequentially:
// torch-mlir-opt -torch-decompose-complex-ops this.mlir
// torch-mlir-opt -pass-pipeline='torch-shape-refinement-pipeline' this.mlir
func.func @forward(%arg0: !torch.vtensor<[3,1,2],f32>) -> !torch.vtensor<[3,1,2],f32> {
  %int0 = torch.constant.int 0
  %int2 = torch.constant.int 2
  %int-1 = torch.constant.int -1
  %0 = torch.prim.ListConstruct %int2, %int-1 : (!torch.int, !torch.int) -> !torch.list<int>
  %1 = torch.prim.ListConstruct %int0, %int2 : (!torch.int, !torch.int) -> !torch.list<int>
  %2 = torch.aten.roll %arg0, %0, %1 : !torch.vtensor<[3,1,2],f32>, !torch.list<int>, !torch.list<int> -> !torch.vtensor<[3,1,2],f32>
  return %2 : !torch.vtensor<[3,1,2],f32>
}

IR after the first decompose-complex-ops:

...
%2 = torch.aten.neg.int %int2 : !torch.int -> !torch.int
%3 = torch.aten.slice.Tensor %arg0, %int0, %2, %none, %int1 : !torch.vtensor<[3,1,2],f32>, !torch.int, !torch.int, !torch.none, !torch.int -> !torch.vtensor<[?,1,2],f32>
%4 = torch.aten.slice.Tensor %arg0, %int0, %int0_0, %2, %int1 : !torch.vtensor<[3,1,2],f32>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[?,1,2],f32>
%5 = torch.prim.ListConstruct %3, %4 : (!torch.vtensor<[?,1,2],f32>, !torch.vtensor<[?,1,2],f32>) -> !torch.list<vtensor<[?,1,2],f32>>
%6 = torch.aten.cat %5, %int0 : !torch.list<vtensor<[?,1,2],f32>>, !torch.int -> !torch.vtensor<[3,1,2],f32>
%7 = torch.aten.neg.int %int-1 : !torch.int -> !torch.int
%8 = torch.aten.slice.Tensor %6, %int2, %7, %none, %int1 : !torch.vtensor<[3,1,2],f32>, !torch.int, !torch.int, !torch.none, !torch.int -> !torch.vtensor<[3,1,?],f32>
%9 = torch.aten.slice.Tensor %6, %int2, %int0_0, %7, %int1 : !torch.vtensor<[3,1,2],f32>, !torch.int, !torch.int, !torch.int, !torch.int -> !torch.vtensor<[3,1,?],f32>
%10 = torch.prim.ListConstruct %8, %9 : (!torch.vtensor<[3,1,?],f32>, !torch.vtensor<[3,1,?],f32>) -> !torch.list<vtensor<[3,1,?],f32>>
%11 = torch.aten.cat %10, %int2 : !torch.list<vtensor<[3,1,?],f32>>, !torch.int -> !torch.vtensor<[3,1,2],f32>
...

IR after the second torch-simplify-shape-calculations; this is where it fails:

%6 = "torch.shape.calculate"() ({
  %14 = "torch.aten.slice.Tensor"(%arg0, %4, %2, %3, %1) : (!torch.vtensor<[3,1,2],f32>, !torch.int, !torch.int, !torch.none, !torch.int) -> !torch.vtensor<[2,1,2],f32>
  "torch.shape.calculate.yield"(%14) : (!torch.vtensor<[2,1,2],f32>) -> ()
}, {
  %14 = "torch.prim.ListConstruct"(%5, %1, %5) : (!torch.int, !torch.int, !torch.int) -> !torch.list<int>
  "torch.shape.calculate.yield.shapes"(%14) : (!torch.list<int>) -> ()
}) : () -> !torch.vtensor<[2,1,2],f32>
%7 = "torch.shape.calculate"() ({
  %14 = "torch.aten.slice.Tensor"(%arg0, %4, %4, %2, %1) : (!torch.vtensor<[3,1,2],f32>, !torch.int, !torch.int, !torch.int, !torch.int) -> !torch.vtensor<[1,1,2],f32>
  "torch.shape.calculate.yield"(%14) : (!torch.vtensor<[1,1,2],f32>) -> ()
}, {
  %14 = "torch.prim.ListConstruct"(%1, %1, %5) : (!torch.int, !torch.int, !torch.int) -> !torch.list<int>
  "torch.shape.calculate.yield.shapes"(%14) : (!torch.list<int>) -> ()
}) : () -> !torch.vtensor<[1,1,2],f32>
%8 = "torch.prim.ListConstruct"(%6, %7) : (!torch.vtensor<[2,1,2],f32>, !torch.vtensor<[1,1,2],f32>) -> !torch.list<vtensor<[?,1,2],f32>>

The error message is:

raw.mlir:68:10: error: operand types should have the same type as the list contained type
    %3 = torch.aten.roll %arg1, %1, %2 : !torch.tensor, !torch.list<int>, !torch.list<int> -> !torch.tensor
         ^
raw.mlir:68:10: note: see current operation: %8 = "torch.prim.ListConstruct"(%6, %7) : (!torch.vtensor<[2,1,2],f32>, !torch.vtensor<[1,1,2],f32>) -> !torch.list<vtensor<[?,1,2],f32>>

I understand your concerns and I will follow the same line of thinking to deal with these issues later.

@silvasean
Contributor

Closing this because the reduced test case passes now:

import torch
import torch_mlir

class RollModule(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x.roll([2, -1], [0, 2])

x = torch.rand(3, 1, 2)
module = torch_mlir.compile(RollModule(), x, output_type=torch_mlir.OutputType.TORCH)
