IPAD: iterative pruning and distillation to shrink model size.
- [2024/05] We released our code for IPAD.
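To make the idea concrete, the following is a minimal sketch of an iterative prune-then-distill loop on a toy linear model. It is not this repository's API or the exact method from the paper: the magnitude-based pruning criterion, the squared-error distillation loss, and all function names (`prune`, `distill`, `iterative_prune_and_distill`) are assumptions chosen for illustration.

```python
import random


def predict(weights, x):
    """Linear model: dot product of the weights and one input vector."""
    return sum(w * xi for w, xi in zip(weights, x))


def prune(weights, mask, fraction):
    """Zero out the smallest-magnitude `fraction` of the still-active weights.

    Magnitude pruning is an assumption made for this sketch.
    """
    active = sorted(
        (i for i, m in enumerate(mask) if m),
        key=lambda i: abs(weights[i]),
    )
    n_drop = int(len(active) * fraction)
    for i in active[:n_drop]:
        mask[i] = 0
        weights[i] = 0.0


def distill(student, mask, teacher, inputs, lr=0.01, epochs=50):
    """Fit the masked student to the teacher's outputs with plain SGD."""
    for _ in range(epochs):
        for x in inputs:
            err = predict(student, x) - predict(teacher, x)
            for i, xi in enumerate(x):
                if mask[i]:  # pruned weights stay at zero
                    student[i] -= lr * err * xi


def iterative_prune_and_distill(teacher, inputs, rounds=3, fraction=0.25):
    """Alternate pruning and distillation for a fixed number of rounds."""
    student = list(teacher)  # the student starts as a copy of the teacher
    mask = [1] * len(teacher)
    for _ in range(rounds):
        prune(student, mask, fraction)
        distill(student, mask, teacher, inputs)
    return student, mask


random.seed(0)
teacher = [random.uniform(-1, 1) for _ in range(16)]
inputs = [[random.gauss(0, 1) for _ in range(16)] for _ in range(64)]
student, mask = iterative_prune_and_distill(teacher, inputs)
print(sum(mask), "of", len(mask), "weights kept")
```

Pruning a little at a time and recovering with distillation after each step lets the remaining weights compensate gradually, instead of removing everything in one shot. In this toy setup, three rounds at 25% each leave 7 of the 16 weights active.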
- LLAMA
- GLM
- OPT
- Clone this repository and navigate to PainlessInferenceAcceleration/ipad
git clone https://github.com/alipay/PainlessInferenceAcceleration.git
cd PainlessInferenceAcceleration/ipad
- Install the package
python setup.py install
Examples can be found in examples.
@inproceedings{10.1145/3589335.3648321,
  author = {Wang, Maolin and Zhao, Yao and Liu, Jiajia and Chen, Jingdong and Zhuang, Chenyi and Gu, Jinjie and Guo, Ruocheng and Zhao, Xiangyu},
  title = {Large Multimodal Model Compression via Iterative Efficient Pruning and Distillation},
  year = {2024},
  isbn = {9798400701726},
  publisher = {Association for Computing Machinery},
  doi = {10.1145/3589335.3648321},
  booktitle = {Companion Proceedings of the ACM Web Conference 2024},
  pages = {235--244},
  series = {WWW '24}
}