IPAD

IPAD shrinks model size through iterative pruning and distillation.
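The exact pruning criterion and distillation objective IPAD uses are described in the cited paper. The sketch below is not this repository's implementation. It is a generic toy showing the prune-then-distill loop, under these assumptions: the "model" is a single linear layer, weights are pruned by magnitude, and distillation minimizes the mean squared error between student and teacher outputs. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def mse_to_teacher(student: Sequence[float], teacher: Sequence[float],
                   data: Sequence[Sequence[float]]) -> float:
    # Hypothetical distillation loss: mean squared gap between student and teacher outputs.
    return sum((dot(student, x) - dot(teacher, x)) ** 2 for x in data) / len(data)


def magnitude_prune(weights: Sequence[float], mask: List[int], fraction: float) -> List[int]:
    # Assumed criterion: drop `fraction` of the still-alive weights with the smallest |w|.
    alive = sorted((i for i, m in enumerate(mask) if m), key=lambda i: abs(weights[i]))
    new_mask = list(mask)
    for i in alive[: int(len(alive) * fraction)]:
        new_mask[i] = 0
    return new_mask


def distill(student: Sequence[float], mask: List[int], teacher: Sequence[float],
            data: Sequence[Sequence[float]], lr: float = 0.05, steps: int = 200) -> Vector:
    # Full-batch gradient descent on the MSE loss, updating only unpruned weights.
    w = [wi * m for wi, m in zip(student, mask)]
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x in data:
            err = dot(w, x) - dot(teacher, x)
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / n
        # Multiplying by the mask keeps pruned weights at zero.
        w = [(wi - lr * g) * m for wi, g, m in zip(w, grad, mask)]
    return w


def iterative_prune_and_distill(teacher: Sequence[float], data: Sequence[Sequence[float]],
                                rounds: int = 3, fraction: float = 0.5
                                ) -> Tuple[Vector, List[int], List[Tuple[int, float, float]]]:
    student = list(teacher)
    mask = [1] * len(teacher)
    history = []  # (weights kept, loss right after pruning, loss after distillation)
    for _ in range(rounds):
        mask = magnitude_prune(student, mask, fraction)
        pruned = [w * m for w, m in zip(student, mask)]
        student = distill(pruned, mask, teacher, data)
        history.append((sum(mask),
                        mse_to_teacher(pruned, teacher, data),
                        mse_to_teacher(student, teacher, data)))
    return student, mask, history


if __name__ == "__main__":
    rng = random.Random(0)
    teacher = [rng.uniform(-1, 1) for _ in range(8)]
    data = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(32)]
    _, _, history = iterative_prune_and_distill(teacher, data)
    for kept, before, after in history:
        print(f"kept={kept} mse_pruned={before:.4f} mse_distilled={after:.4f}")
```

Each round halves the surviving weights and then distills the smaller student back toward the teacher. Pruning gradually, instead of all at once, lets the student recover between cuts. The learning rate of 0.05 is chosen so that gradient descent decreases the loss for inputs in [-1, 1]^8.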

News and Updates 🔥

  • [2024/05] We released the code for IPAD.

Models we support

  • LLAMA
  • GLM
  • OPT

Introduction

Installation

  1. Clone this repository and navigate to the ipad directory
git clone https://github.com/alipay/PainlessInferenceAcceleration.git
cd PainlessInferenceAcceleration/ipad
  2. Install the package
python setup.py install

Quick Start

Examples can be found in the examples directory.

Citations

@inproceedings{10.1145/3589335.3648321,
  author    = {Wang, Maolin and Zhao, Yao and Liu, Jiajia and Chen, Jingdong and Zhuang, Chenyi and Gu, Jinjie and Guo, Ruocheng and Zhao, Xiangyu},
  title     = {Large Multimodal Model Compression via Iterative Efficient Pruning and Distillation},
  year      = {2024},
  isbn      = {9798400701726},
  publisher = {Association for Computing Machinery},
  doi       = {10.1145/3589335.3648321},
  booktitle = {Companion Proceedings of the ACM Web Conference 2024},
  pages     = {235--244},
  series    = {WWW '24}
}