[TRACKER] Bloom-PyTorch Model #1340

Closed
AmosLewis opened this issue Sep 2, 2022 · 3 comments
Labels
model support: Hub issue for progress on adding support for a specific model

Comments

@AmosLewis
Collaborator

AmosLewis commented Sep 2, 2022

Hi,

I am working on adding support for the Bloom-PyTorch model via torch-mlir. So far I have used two unmerged patches to handle the ops that are not yet supported. Each patch passes the checks independently, but when I merge them together, compilation fails with the following segmentation fault:

Current thread 0x00007f8b57683740 (most recent call first):
  File "/home/chi/src/ubuntu20/shark/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/compiler_utils.py", line 47 in run_pipeline_with_repro_report
  File "/home/chi/src/ubuntu20/shark/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/__init__.py", line 247 in compile
  File "/home/chi/src/ubuntu20/shark/SHARK/shark/torch_mlir_utils.py", line 69 in get_torch_mlir_module
  File "/home/chi/src/ubuntu20/shark/SHARK/shark/shark_importer.py", line 74 in _torch_mlir
  File "/home/chi/src/ubuntu20/shark/SHARK/shark/shark_importer.py", line 109 in import_mlir
  File "/home/chi/src/ubuntu20/shark/SHARK/shark/shark_importer.py", line 163 in import_debug
  File "/home/chi/src/ubuntu20/shark/SHARK/generate_sharktank.py", line 85 in save_torch_model
  File "/home/chi/src/ubuntu20/shark/SHARK/generate_sharktank.py", line 233 in <module>
[3]    181801 segmentation fault (core dumped)  python3.10 generate_sharktank.py

Here is the patch to reproduce it:
https://github.com/AmosLewis/SHARK/tree/bloom
To test it:

 cd SHARK
 python generate_sharktank.py 

The torch-mlir branch used is https://github.com/AmosLewis/torch-mlir/tree/bloom3.
It combines george-cumsum-op-support and prashant-max-other-op-support.
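
For reference, here is a minimal sketch of the same import path outside SHARK's wrappers (the checkpoint name, input shape, and wrapper module are assumptions for illustration; the actual script drives this through shark_importer):

import torch
import torch_mlir
from transformers import AutoModelForCausalLM

# Hypothetical standalone repro; SHARK wraps the HuggingFace model in a
# similar way before handing it to torch_mlir.compile.
class BloomWrapper(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Assumed variant; any BLOOM checkpoint should exercise the same ops.
        self.model = AutoModelForCausalLM.from_pretrained(
            "bigscience/bloom-560m")
        self.model.eval()

    def forward(self, input_ids):
        # Return only the logits tensor so torch.jit.trace sees tensor outputs.
        return self.model(input_ids)[0]

input_ids = torch.randint(0, 1000, (1, 128))  # assumed sequence length
# This call reaches run_pipeline_with_repro_report, where the segfault occurs.
module = torch_mlir.compile(
    BloomWrapper(), input_ids,
    output_type=torch_mlir.OutputType.LINALG_ON_TENSORS,
    use_tracing=True)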

I also tried running the Python tests:

cmake --build build --target check-torch-mlir-python

The george-cumsum-op-support and prashant-max-other-op-support patches pass the check independently, but combining them leads to the following error:

[53/54] Running the torch-mlir Python regression tests
FAIL: TORCH_MLIR_PYTHON :: compile_api/basic.py (1 of 13)
******************** TEST 'TORCH_MLIR_PYTHON :: compile_api/basic.py' FAILED ********************
Script:
--
: 'RUN: at line 6';   /home/chi/src/ubuntu20/shark/torch-mlir/mlir_venv/bin/python3.10 /home/chi/src/ubuntu20/shark/torch-mlir/python/test/compile_api/basic.py | /home/chi/src/ubuntu20/shark/torch-mlir/build/bin/FileCheck /home/chi/src/ubuntu20/shark/torch-mlir/python/test/compile_api/basic.py
--
Exit Code: 2

Command Output (stderr):
--
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/chi/src/ubuntu20/shark/torch-mlir/build/bin/FileCheck /home/chi/src/ubuntu20/shark/torch-mlir/python/test/compile_api/basic.py

--

********************
FAIL: TORCH_MLIR_PYTHON :: compile_api/output_type_spec.py (3 of 13)
******************** TEST 'TORCH_MLIR_PYTHON :: compile_api/output_type_spec.py' FAILED ********************
Script:
--
: 'RUN: at line 6';   /home/chi/src/ubuntu20/shark/torch-mlir/mlir_venv/bin/python3.10 /home/chi/src/ubuntu20/shark/torch-mlir/python/test/compile_api/output_type_spec.py | /home/chi/src/ubuntu20/shark/torch-mlir/build/bin/FileCheck /home/chi/src/ubuntu20/shark/torch-mlir/python/test/compile_api/output_type_spec.py
--
Exit Code: 2

Command Output (stderr):
--
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/chi/src/ubuntu20/shark/torch-mlir/build/bin/FileCheck /home/chi/src/ubuntu20/shark/torch-mlir/python/test/compile_api/output_type_spec.py

--

********************
FAIL: TORCH_MLIR_PYTHON :: compile_api/tracing.py (4 of 13)
******************** TEST 'TORCH_MLIR_PYTHON :: compile_api/tracing.py' FAILED ********************
Script:
--
: 'RUN: at line 6';   /home/chi/src/ubuntu20/shark/torch-mlir/mlir_venv/bin/python3.10 /home/chi/src/ubuntu20/shark/torch-mlir/python/test/compile_api/tracing.py | /home/chi/src/ubuntu20/shark/torch-mlir/build/bin/FileCheck /home/chi/src/ubuntu20/shark/torch-mlir/python/test/compile_api/tracing.py
--
Exit Code: 2

Command Output (stderr):
--
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/chi/src/ubuntu20/shark/torch-mlir/build/bin/FileCheck /home/chi/src/ubuntu20/shark/torch-mlir/python/test/compile_api/tracing.py

--

********************
********************
Failed Tests (3):
  TORCH_MLIR_PYTHON :: compile_api/basic.py
  TORCH_MLIR_PYTHON :: compile_api/output_type_spec.py
  TORCH_MLIR_PYTHON :: compile_api/tracing.py


Testing Time: 2.01s
  Passed: 10
  Failed:  3
FAILED: tools/torch-mlir/python/test/CMakeFiles/check-torch-mlir-python /home/chi/src/ubuntu20/shark/torch-mlir/build/tools/torch-mlir/python/test/CMakeFiles/check-torch-mlir-python 
cd /home/chi/src/ubuntu20/shark/torch-mlir/build/tools/torch-mlir/python/test && /home/chi/src/ubuntu20/shark/torch-mlir/mlir_venv/bin/python3.10 /home/chi/src/ubuntu20/shark/torch-mlir/build/./bin/llvm-lit -sv /home/chi/src/ubuntu20/shark/torch-mlir/build/tools/torch-mlir/python/test
ninja: build stopped: subcommand failed.
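
FileCheck reporting "'<stdin>' is empty" means the test script exited without printing anything to stdout, i.e. it crashed before producing IR. Running the body of one failing test directly in the venv surfaces the underlying Python exception. A minimal equivalent of compile_api/basic.py (the exact module is an assumption based on the test name):

import torch
import torch_mlir

class TanhModule(torch.nn.Module):
    def forward(self, x):
        return torch.tanh(x)

# If the combined branch is broken, this raises the real error instead of
# the empty-stdin failure that FileCheck reports.
print(torch_mlir.compile(TanhModule(), torch.ones(3),
                         output_type=torch_mlir.OutputType.TORCH))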

@powderluv
Collaborator

This seems like a local build issue.

@AmosLewis
Collaborator Author

> This seems like a local build issue.

It might be because george-cumsum-op-support and prashant-max-other-op-support both made changes to the cumsum op, which could lead to a conflict.
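
If the conflict is in the cumsum lowering, a single-op compile should reproduce it without the full model. A sketch (shape and dim are arbitrary):

import torch
import torch_mlir

class CumsumModule(torch.nn.Module):
    def forward(self, x):
        return torch.cumsum(x, dim=1)

# On the combined bloom3 branch, if both patches register competing handling
# for aten.cumsum, this one-op compile should fail the same way the model does.
print(torch_mlir.compile(CumsumModule(), torch.ones(2, 4),
                         output_type=torch_mlir.OutputType.LINALG_ON_TENSORS))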

@vivekkhandelwal1 added the "model support" label Sep 5, 2022
@AmosLewis
Collaborator Author

AmosLewis commented Sep 8, 2022

#1348 (the aten::_reshape_alias op) needs to be fixed first.
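
A minimal way to hit that op in isolation (a sketch; whether .view() is recorded as aten::_reshape_alias depends on the PyTorch version used for tracing):

import torch
import torch_mlir

class ReshapeModule(torch.nn.Module):
    def forward(self, x):
        # Under torch.jit.trace, view/reshape can be recorded as
        # aten::_reshape_alias, the op tracked by #1348.
        return x.view(2, 6)

print(torch_mlir.compile(ReshapeModule(), torch.ones(3, 4),
                         use_tracing=True,
                         output_type=torch_mlir.OutputType.TORCH))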

qedawkins pushed a commit to nod-ai/torch-mlir that referenced this issue Oct 3, 2022
Use shapeHelperInferShapes template to reduce boilerplate code. (llvm#1340)

Leverage the template function 'shapeHelperInferShapes' to reduce code duplication in ShapeInference.cpp for several ONNX operators. As an example, the following implementation of the Transpose operator's inferShapes member function:

LogicalResult ONNXTransposeOp::inferShapes(
    std::function<void(mlir::Region &)> doShapeInference) {
  // Cannot infer shape if no shape exists.
  if (!data().getType().isa<RankedTensorType>())
    return success();

  auto elementType = data().getType().cast<ShapedType>().getElementType();
  ONNXTransposeOpAdaptor operandAdaptor(*this);
  ONNXTransposeOpShapeHelper shapeHelper(this);
  if (failed(shapeHelper.computeShape(operandAdaptor)))
    return emitError("Failed to scan Transpose parameters successfully");
  SmallVector<int64_t, 4> outputDims;
  IndexExpr::getShape(shapeHelper.dimsForOutput(), outputDims);
  getResult().setType(RankedTensorType::get(outputDims, elementType));

  return success();
}

Becomes:

LogicalResult ONNXTransposeOp::inferShapes(
    std::function<void(mlir::Region &)> doShapeInference) {
  auto elementType = data().getType().cast<ShapedType>().getElementType();
  return shapeHelperInferShapes<ONNXTransposeOpShapeHelper, ONNXTransposeOp,
      ONNXTransposeOpAdaptor>(this, elementType);
}

Signed-off-by: Ettore Tiotto [email protected]
qedawkins pushed a commit to nod-ai/torch-mlir that referenced this issue Oct 3, 2022
* Use shapeHelperInferShapes template to reduce boilerplate code. (llvm#1340)

Signed-off-by: Ettore Tiotto [email protected]
Signed-off-by: Philip Lassen <[email protected]>

* Allow lowering ONNXGemm with dynamic dims to ZHigh and fix zDNN Conv condition (llvm#1332)

* Allow lowering ONNXGemm with dynamic dims to ZHigh

Signed-off-by: Tung D. Le <[email protected]>

* Update lit tests

Signed-off-by: Tung D. Le <[email protected]>

* Fix zDNN Conv condition

Signed-off-by: Tung D. Le <[email protected]>
Signed-off-by: Philip Lassen <[email protected]>

* Fix builders with boolean output types

Signed-off-by: Philip Lassen <[email protected]>

* Fix format issue

Signed-off-by: Philip Lassen <[email protected]>

* Fix legality check of ONNXToZHigh for MaxPool. (llvm#1343)

* Fix legality check of NNPA for 1d maxpool

Signed-off-by: Haruki Imai <[email protected]>

* Apply the same fix to conv

Signed-off-by: Haruki Imai <[email protected]>

* Add lit test for 1d maxpool and averagepool

Signed-off-by: Haruki Imai <[email protected]>

* Insert dilation check after checking shape

Signed-off-by: Haruki Imai <[email protected]>

* Simplify lit test for pooling and update func name

Signed-off-by: Haruki Imai <[email protected]>

* Change func name to test_pool_not_lowered_pool1d and test_pool_not_lowered_pool3d

Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Philip Lassen <[email protected]>

* embed libzdnn in model.so (llvm#1324)

* - Build libzdnn.a with -fPIC and embed in model.so when
  -maccel=NNPA specified
- Add CompilerConfigMap to store states associated with
  certain options
- Move options in main() to CompilerOptions.cpp
- Fix compiler warning in Stickify.cpp

Signed-off-by: Gong Su <[email protected]>

* - fix NNPA_ENABLED for lit test
- install zdnn.h so we no longer need third_party/zdnn-lib
- move options back to onnx-mlir.cpp::main (consolidation
  requires much more effort, deferred)
- use DEPENDS in add_onnx_mlir_library for libzdnn dependency

Signed-off-by: Gong Su <[email protected]>

* Remove zdnn-lib from .gitmodules

Signed-off-by: Gong Su <[email protected]>

* Make libzdnn ALL target so it gets built before other NNPA components

Signed-off-by: Gong Su <[email protected]>

* Fix libzdnn dependency for NNPA components

Signed-off-by: Gong Su <[email protected]>

* Build NNPA in dev image as well

Signed-off-by: Gong Su <[email protected]>

* Comment out BYPRODUCTS in target libzdnn since generator support
requires cmake 3.20+ which is not yet available on official
Ubuntu Focal

Signed-off-by: Gong Su <[email protected]>

* - Force setting cached MLIR_DIR to honor command line argument
- unset cached LLVM_DIR so it changes along with MLIR_DIR
- surround third_party with set(CMAKE_MESSAGE_LOG_LEVEL NOTICE) and
  set(CMAKE_MESSAGE_LOG_LEVEL STATUS) to mask out their cmake
  output so we can see more clearly output by onnx-mlir only.
  third_party cmake output can still be turned on by the --log-level
  command line option

Signed-off-by: Gong Su <[email protected]>

* revert set(CMAKE_MESSAGE_LOG_LEVEL NOTICE) and set(CMAKE_MESSAGE_LOG_LEVEL STATUS)
around third_party to make another PR

Signed-off-by: Gong Su <[email protected]>

* revert force setting cached MLIR_DIR and unsetting cached LLVM_DIR
to make another PR

Signed-off-by: Gong Su <[email protected]>

Co-authored-by: Charles Volzka <[email protected]>
Co-authored-by: Tung D. Le <[email protected]>
Signed-off-by: Philip Lassen <[email protected]>

* ScatterElements operator code gen. (llvm#1352)

Implement support for the ONNX ScatterElements operator:

 - verification (verify diagnostic completeness)
 - shape inference (should be trivial, but verify)
 - initial codegen support
 - codegen for negative indices
 - add lit test to check code generation
 - enable end-to-end tests (backend tests)

Signed-off-by: Ettore Tiotto [email protected]
Signed-off-by: Philip Lassen <[email protected]>

* Improve variable naming of builder lists

Signed-off-by: Philip Lassen <[email protected]>

Co-authored-by: Ettore Tiotto <[email protected]>
Co-authored-by: Tung D. Le <[email protected]>
Co-authored-by: Haruki Imai <[email protected]>
Co-authored-by: gongsu832 <[email protected]>
Co-authored-by: Charles Volzka <[email protected]>