Skip to content

Support gemma2 #1247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 28, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ SWIFT has rich documentations for users, please check [here](https://github.com/
SWIFT web-ui is available both on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary), please feel free to try!

## 🎉 News
- 🔥2024.06.28: Support for Gemma2 series models: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct.
- 🔥2024.06.18: Supports **DeepSeek-Coder-v2** series model! Use model_type `deepseek-coder-v2-instruct` and `deepseek-coder-v2-lite-instruct` to begin.
- 🔥2024.06.16: Supports **KTO** and **CPO** training! See [document](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Human-Preference-Alignment-Training-Documentation.md) to start training!
- 2024.06.11: Support for tool-calling agent deployment that conform to the OpenAI interface.You can refer to [Agent deployment best practice](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Agent-deployment-best-practice.md)
Expand Down Expand Up @@ -512,7 +513,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
| InternLM<br>InternLM2<br>InternLM2-Math | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2<br>DeepSeek-Coder-V2 | [DeepSeek series models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-236B | base model<br>chat model<br>MoE model<br>code model<br>math model |
| MAMBA | [MAMBA temporal convolution model](https://github.com/state-spaces/mamba) | English | 130M-2.8B | base model |
| Gemma | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-7B | base model<br>instruct model |
| Gemma<br>Gemma2 | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-27B | base model<br>instruct model |
| MiniCPM | [OpenBmB MiniCPM series models](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 2B-3B | chat model<br>MoE model |
| OpenBuddy | [OpenBuddy series models](https://github.com/OpenBuddy/OpenBuddy) | Chinese<br>English | 7B-70B | base model<br>chat model |
| Orion | [OrionStar AI series models](https://github.com/OrionStarAI) | Chinese<br>English | 14B | base model<br>chat model |
Expand Down
3 changes: 2 additions & 1 deletion README_CN.md
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ SWIFT具有丰富的文档体系,如有使用问题请请查看[这里](https:
可以在[Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) 和 [ModelScope创空间](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary) 中体验SWIFT web-ui功能了。

## 🎉 新闻
- 🔥2024.06.28: 支持**Gemma2**系列模型: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct.
- 🔥2024.06.18: 支持**DeepSeek-Coder-v2**系列模型! 使用model_type`deepseek-coder-v2-instruct`和`deepseek-coder-v2-lite-instruct`来开启训练和推理.
- 🔥2024.06.16: 支持**KTO**和**CPO**训练,使用`swift rlhf --rlhf_type kto`和`swift rlhf --rlhf_type cpo`来开始训练,可以参考[文档](./docs/source/LLM/人类偏好对齐训练文档.md).
- 2024.06.11: 支持符合OpenAI接口的工具调用Agent部署, 可以查看[Agent部署最佳实践](docs/source/LLM/Agent部署最佳实践.md).
Expand Down Expand Up @@ -508,7 +509,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
| InternLM<br>InternLM2<br>InternLM2-Math | [浦江实验室书生浦语系列模型](https://github.com/InternLM/InternLM) | 中文<br>英文 | 1.8B-20B | base模型<br>chat模型<br>数学模型 |
| DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2<br>DeepSeek-Coder-V2 | [幻方系列模型](https://github.com/deepseek-ai) | 中文<br>英文 | 1.3B-236B | base模型<br>chat模型<br>MoE模型<br>代码模型<br>数学模型 |
| MAMBA | [MAMBA时序卷积模型](https://github.com/state-spaces/mamba) | 英文 | 130M-2.8B | base模型 |
| Gemma | [Google Gemma系列模型](https://github.com/google/gemma_pytorch) | 英文 | 2B-7B | base模型<br>instruct模型 |
| Gemma<br>Gemma2 | [Google Gemma系列模型](https://github.com/google/gemma_pytorch) | 英文 | 2B-27B | base模型<br>instruct模型 |
| MiniCPM | [OpenBmB MiniCPM系列模型](https://github.com/OpenBMB/MiniCPM) | 中文<br>英文 | 2B-3B | chat模型<br>MoE模型 |
| OpenBuddy | [OpenBuddy系列模型](https://github.com/OpenBuddy/OpenBuddy) | 中文<br>英文 | 7B-70B | base模型<br>chat模型 |
| Orion | [猎户星空系列模型](https://github.com/OrionStarAI) | 中文<br>英文 | 14B | base模型<br>chat模型 |
Expand Down
5 changes: 5 additions & 0 deletions docs/source/LLM/支持的模型和数据集.md
Original file line number Diff line number Diff line change
Expand Up @@ -216,6 +216,10 @@
|gemma-7b|[AI-ModelScope/gemma-7b](https://modelscope.cn/models/AI-ModelScope/gemma-7b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-7b](https://huggingface.co/google/gemma-7b)|
|gemma-2b-instruct|[AI-ModelScope/gemma-2b-it](https://modelscope.cn/models/AI-ModelScope/gemma-2b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it)|
|gemma-7b-instruct|[AI-ModelScope/gemma-7b-it](https://modelscope.cn/models/AI-ModelScope/gemma-7b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it)|
|gemma2-9b|[LLM-Research/gemma-2-9b](https://modelscope.cn/models/LLM-Research/gemma-2-9b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.42|-|[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)|
|gemma2-27b|[LLM-Research/gemma-2-27b](https://modelscope.cn/models/LLM-Research/gemma-2-27b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.42|-|[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b)|
|gemma2-9b-instruct|[LLM-Research/gemma-2-9b-it](https://modelscope.cn/models/LLM-Research/gemma-2-9b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.42|-|[google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)|
|gemma2-27b-instruct|[LLM-Research/gemma-2-27b-it](https://modelscope.cn/models/LLM-Research/gemma-2-27b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.42|-|[google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)|
|minicpm-1b-sft-chat|[OpenBMB/MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;|transformers>=4.36.0|-|[openbmb/MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)|
|minicpm-2b-sft-chat|[OpenBMB/MiniCPM-2B-sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;||-|[openbmb/MiniCPM-2B-sft-fp32](https://huggingface.co/openbmb/MiniCPM-2B-sft-fp32)|
|minicpm-2b-chat|[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;||-|[openbmb/MiniCPM-2B-dpo-fp32](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp32)|
Expand Down Expand Up @@ -416,6 +420,7 @@
|webnovel-zh|[AI-ModelScope/webnovel_cn](https://modelscope.cn/datasets/AI-ModelScope/webnovel_cn/summary)||50000|1478.9±11526.1, min=100, max=490484|chat, novel|[zxbsmk/webnovel_cn](https://huggingface.co/datasets/zxbsmk/webnovel_cn)|
|generated-chat-zh|[AI-ModelScope/generated_chat_0.4M](https://modelscope.cn/datasets/AI-ModelScope/generated_chat_0.4M/summary)||396004|273.3±52.0, min=32, max=873|chat, character-dialogue|[BelleGroup/generated_chat_0.4M](https://huggingface.co/datasets/BelleGroup/generated_chat_0.4M)|
|🔥self-cognition|[swift/self-cognition](https://modelscope.cn/datasets/swift/self-cognition/summary)||134|53.6±18.6, min=29, max=121|chat, self-cognition|[modelscope/self-cognition](https://huggingface.co/datasets/modelscope/self-cognition)|
|🔥swift-mix|[swift/swift-sft-mixture](https://modelscope.cn/datasets/swift/swift-sft-mixture/summary)|sharegpt<br>firefly<br>codefuse<br>metamathqa|-|Dataset is too huge, please click the original link to view the dataset stat.|chat, sft, general|-|
|cls-fudan-news-zh|[damo/zh_cls_fudan-news](https://modelscope.cn/datasets/damo/zh_cls_fudan-news/summary)||4959|3234.4±2547.5, min=91, max=19548|chat, classification|-|
|ner-jave-zh|[damo/zh_ner-JAVE](https://modelscope.cn/datasets/damo/zh_ner-JAVE/summary)||1266|118.3±45.5, min=44, max=223|chat, ner|-|
|coco-en|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|454617|299.8±2.8, min=295, max=352|chat, multi-modal, vision|-|
Expand Down
5 changes: 5 additions & 0 deletions docs/source_en/LLM/Supported-models-datasets.md
Original file line number Diff line number Diff line change
Expand Up @@ -216,6 +216,10 @@ The table below introcudes all models supported by SWIFT:
|gemma-7b|[AI-ModelScope/gemma-7b](https://modelscope.cn/models/AI-ModelScope/gemma-7b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-7b](https://huggingface.co/google/gemma-7b)|
|gemma-2b-instruct|[AI-ModelScope/gemma-2b-it](https://modelscope.cn/models/AI-ModelScope/gemma-2b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it)|
|gemma-7b-instruct|[AI-ModelScope/gemma-7b-it](https://modelscope.cn/models/AI-ModelScope/gemma-7b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.38|-|[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it)|
|gemma2-9b|[LLM-Research/gemma-2-9b](https://modelscope.cn/models/LLM-Research/gemma-2-9b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.42|-|[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)|
|gemma2-27b|[LLM-Research/gemma-2-27b](https://modelscope.cn/models/LLM-Research/gemma-2-27b/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|transformers>=4.42|-|[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b)|
|gemma2-9b-instruct|[LLM-Research/gemma-2-9b-it](https://modelscope.cn/models/LLM-Research/gemma-2-9b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.42|-|[google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)|
|gemma2-27b-instruct|[LLM-Research/gemma-2-27b-it](https://modelscope.cn/models/LLM-Research/gemma-2-27b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.42|-|[google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)|
|minicpm-1b-sft-chat|[OpenBMB/MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;|transformers>=4.36.0|-|[openbmb/MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)|
|minicpm-2b-sft-chat|[OpenBMB/MiniCPM-2B-sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;||-|[openbmb/MiniCPM-2B-sft-fp32](https://huggingface.co/openbmb/MiniCPM-2B-sft-fp32)|
|minicpm-2b-chat|[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;||-|[openbmb/MiniCPM-2B-dpo-fp32](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp32)|
Expand Down Expand Up @@ -416,6 +420,7 @@ The table below introduces the datasets supported by SWIFT:
|webnovel-zh|[AI-ModelScope/webnovel_cn](https://modelscope.cn/datasets/AI-ModelScope/webnovel_cn/summary)||50000|1478.9±11526.1, min=100, max=490484|chat, novel|[zxbsmk/webnovel_cn](https://huggingface.co/datasets/zxbsmk/webnovel_cn)|
|generated-chat-zh|[AI-ModelScope/generated_chat_0.4M](https://modelscope.cn/datasets/AI-ModelScope/generated_chat_0.4M/summary)||396004|273.3±52.0, min=32, max=873|chat, character-dialogue|[BelleGroup/generated_chat_0.4M](https://huggingface.co/datasets/BelleGroup/generated_chat_0.4M)|
|🔥self-cognition|[swift/self-cognition](https://modelscope.cn/datasets/swift/self-cognition/summary)||134|53.6±18.6, min=29, max=121|chat, self-cognition|[modelscope/self-cognition](https://huggingface.co/datasets/modelscope/self-cognition)|
|🔥swift-mix|[swift/swift-sft-mixture](https://modelscope.cn/datasets/swift/swift-sft-mixture/summary)|sharegpt<br>firefly<br>codefuse<br>metamathqa|-|Dataset is too huge, please click the original link to view the dataset stat.|chat, sft, general|-|
|cls-fudan-news-zh|[damo/zh_cls_fudan-news](https://modelscope.cn/datasets/damo/zh_cls_fudan-news/summary)||4959|3234.4±2547.5, min=91, max=19548|chat, classification|-|
|ner-jave-zh|[damo/zh_ner-JAVE](https://modelscope.cn/datasets/damo/zh_ner-JAVE/summary)||1266|118.3±45.5, min=44, max=223|chat, ner|-|
|coco-en|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|454617|299.8±2.8, min=295, max=352|chat, multi-modal, vision|-|
Expand Down
6 changes: 6 additions & 0 deletions swift/llm/data/dataset_info.json
Original file line number Diff line number Diff line change
Expand Up @@ -726,5 +726,11 @@
"hf_dataset_id": "modelscope/self-cognition",
"remove_useless_columns": false,
"tags": ["chat", "self-cognition", "🔥"]
},
"swift-mix": {
"dataset_id": "swift/swift-sft-mixture",
"subsets": ["sharegpt", "firefly", "codefuse", "metamathqa"],
"tags": ["chat", "sft", "general", "🔥"],
"huge_dataset": true
}
}
1 change: 1 addition & 0 deletions swift/llm/utils/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,7 @@ class DatasetName:
webnovel_zh = 'webnovel-zh'
generated_chat_zh = 'generated-chat-zh'
self_cognition = 'self-cognition'
swift_mix = 'swift-mix'

# example dataset for specific model
cls_fudan_news_zh = 'cls-fudan-news-zh' # seqgpt-560m
Expand Down
40 changes: 40 additions & 0 deletions swift/llm/utils/model.py
Original file line number Diff line number Diff line change
Expand Up @@ -291,6 +291,10 @@ class ModelType:
gemma_7b = 'gemma-7b'
gemma_2b_instruct = 'gemma-2b-instruct'
gemma_7b_instruct = 'gemma-7b-instruct'
gemma2_9b = 'gemma2-9b'
gemma2_27b = 'gemma2-27b'
gemma2_9b_instruct = 'gemma2-9b-instruct'
gemma2_27b_instruct = 'gemma2-27b-instruct'
# paligemma
paligemma_3b_pt_224 = 'paligemma-3b-pt-224'
paligemma_3b_pt_448 = 'paligemma-3b-pt-448'
Expand Down Expand Up @@ -1532,6 +1536,42 @@ def _output_device_map_hook(module, input, output):
return model, tokenizer


@register_model(
ModelType.gemma2_9b,
'LLM-Research/gemma-2-9b',
LoRATM.llama,
TemplateType.default_generation,
requires=['transformers>=4.42'],
support_flash_attn=True,
support_vllm=True,
hf_model_id='google/gemma-2-9b')
@register_model(
ModelType.gemma2_27b,
'LLM-Research/gemma-2-27b',
LoRATM.llama,
TemplateType.default_generation,
requires=['transformers>=4.42'],
support_flash_attn=True,
support_vllm=True,
hf_model_id='google/gemma-2-27b')
@register_model(
ModelType.gemma2_9b_instruct,
'LLM-Research/gemma-2-9b-it',
LoRATM.llama,
TemplateType.gemma,
requires=['transformers>=4.42'],
support_flash_attn=True,
support_vllm=True,
hf_model_id='google/gemma-2-9b-it')
@register_model(
ModelType.gemma2_27b_instruct,
'LLM-Research/gemma-2-27b-it',
LoRATM.llama,
TemplateType.gemma,
requires=['transformers>=4.42'],
support_flash_attn=True,
support_vllm=True,
hf_model_id='google/gemma-2-27b-it')
@register_model(
ModelType.qwen2_57b_a14b,
'qwen/Qwen2-57B-A14B',
Expand Down
Loading