diff --git a/README.md b/README.md
index b746ce5be9..58932b957e 100644
--- a/README.md
+++ b/README.md
@@ -47,6 +47,7 @@ SWIFT has rich documentation for users, please check [here](https://github.com/
 SWIFT web-ui is available both on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary), please feel free to try!

 ## 🎉 News
+- 🔥2024.06.28: Support for **Gemma2** series models: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct.
 - 🔥2024.06.18: Supports **DeepSeek-Coder-v2** series models! Use model_type `deepseek-coder-v2-instruct` and `deepseek-coder-v2-lite-instruct` to begin.
 - 🔥2024.06.16: Supports **KTO** and **CPO** training! See the [document](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Human-Preference-Alignment-Training-Documentation.md) to start training!
 - 2024.06.11: Supports tool-calling agent deployment that conforms to the OpenAI interface. You can refer to the [Agent deployment best practice](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/Agent-deployment-best-practice.md)
@@ -512,7 +513,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
 | InternLM<br>InternLM2<br>InternLM2-Math | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
 | DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2<br>DeepSeek-Coder-V2 | [DeepSeek series models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-236B | base model<br>chat model<br>MoE model<br>code model<br>math model |
 | MAMBA | [MAMBA temporal convolution model](https://github.com/state-spaces/mamba) | English | 130M-2.8B | base model |
-| Gemma | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-7B | base model<br>instruct model |
+| Gemma<br>Gemma2 | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-27B | base model<br>instruct model |
 | MiniCPM | [OpenBMB MiniCPM series models](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 2B-3B | chat model<br>MoE model |
 | OpenBuddy | [OpenBuddy series models](https://github.com/OpenBuddy/OpenBuddy) | Chinese<br>English | 7B-70B | base model<br>chat model |
 | Orion | [OrionStar AI series models](https://github.com/OrionStarAI) | Chinese<br>English | 14B | base model<br>chat model |
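To sanity-check the new model types end to end, here is a minimal inference sketch. It assumes the `swift.llm` helpers (`get_model_tokenizer`, `get_template`, `inference`) behave as in the project's LLM inference examples; Gemma2 additionally needs `transformers>=4.42`.

```python
# Minimal sketch: chat with the newly added gemma2-9b-instruct model type.
# Assumes the swift.llm helper signatures from the repo's own examples.
import torch

from swift.llm import (ModelType, get_default_template_type,
                       get_model_tokenizer, get_template, inference)

model_type = ModelType.gemma2_9b_instruct
template_type = get_default_template_type(model_type)  # should resolve to 'gemma'

# Downloads from ModelScope (LLM-Research/gemma-2-9b-it) by default.
model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
                                       {'device_map': 'auto'})
template = get_template(template_type, tokenizer)

response, history = inference(model, template, 'Briefly introduce yourself.')
print(response)
```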
diff --git a/README_CN.md b/README_CN.md
index fb0f48cc54..6091f38864 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -48,6 +48,7 @@ SWIFT具有丰富的文档体系,如有使用问题请查看[这里](https:
 可以在[Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) 和 [ModelScope创空间](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary) 中体验SWIFT web-ui功能了。

 ## 🎉 新闻
+- 🔥2024.06.28: 支持**Gemma2**系列模型: gemma2-9b, gemma2-9b-instruct, gemma2-27b, gemma2-27b-instruct.
 - 🔥2024.06.18: 支持**DeepSeek-Coder-v2**系列模型! 使用model_type `deepseek-coder-v2-instruct`和`deepseek-coder-v2-lite-instruct`来开启训练和推理.
 - 🔥2024.06.16: 支持**KTO**和**CPO**训练,使用`swift rlhf --rlhf_type kto`和`swift rlhf --rlhf_type cpo`来开始训练,可以参考[文档](./docs/source/LLM/人类偏好对齐训练文档.md).
 - 2024.06.11: 支持符合OpenAI接口的工具调用Agent部署, 可以查看[Agent部署最佳实践](docs/source/LLM/Agent部署最佳实践.md).
@@ -508,7 +509,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | InternLM<br>InternLM2<br>InternLM2-Math | [浦江实验室书生浦语系列模型](https://github.com/InternLM/InternLM) | 中文<br>英文 | 1.8B-20B | base模型<br>chat模型<br>数学模型 |
 | DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math<br>DeepSeek-V2<br>DeepSeek-Coder-V2 | [幻方系列模型](https://github.com/deepseek-ai) | 中文<br>英文 | 1.3B-236B | base模型<br>chat模型<br>MoE模型<br>代码模型<br>数学模型 |
 | MAMBA | [MAMBA时序卷积模型](https://github.com/state-spaces/mamba) | 英文 | 130M-2.8B | base模型 |
-| Gemma | [Google Gemma系列模型](https://github.com/google/gemma_pytorch) | 英文 | 2B-7B | base模型<br>instruct模型 |
+| Gemma<br>Gemma2 | [Google Gemma系列模型](https://github.com/google/gemma_pytorch) | 英文 | 2B-27B | base模型<br>instruct模型 |
 | MiniCPM | [OpenBMB MiniCPM系列模型](https://github.com/OpenBMB/MiniCPM) | 中文<br>英文 | 2B-3B | chat模型<br>MoE模型 |
 | OpenBuddy | [OpenBuddy系列模型](https://github.com/OpenBuddy/OpenBuddy) | 中文<br>英文 | 7B-70B | base模型<br>chat模型 |
 | Orion | [猎户星空系列模型](https://github.com/OrionStarAI) | 中文<br>英文 | 14B | base模型<br>chat模型 |
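The same model types also plug into training. A hedged sketch of LoRA fine-tuning through the Python entry point, assuming `SftArguments`/`sft_main` keep their documented behavior (LoRA is the default `sft_type`); the dataset choice here is purely illustrative:

```python
# Sketch: LoRA fine-tuning of gemma2-9b-instruct via SWIFT's Python entry point.
# Assumes SftArguments/sft_main work as in the repo's LLM examples.
import os

os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import DatasetName, ModelType, SftArguments, sft_main

sft_args = SftArguments(
    model_type=ModelType.gemma2_9b_instruct,
    dataset=[DatasetName.self_cognition],  # illustrative dataset choice
    output_dir='output')
result = sft_main(sft_args)
# The documented return dict includes the best checkpoint path.
print(result['best_model_checkpoint'])
```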
diff --git "a/docs/source/LLM/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md" "b/docs/source/LLM/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md"
index cc350cbfe2..c776a29229 100644
--- "a/docs/source/LLM/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md"
+++ "b/docs/source/LLM/\346\224\257\346\214\201\347\232\204\346\250\241\345\236\213\345\222\214\346\225\260\346\215\256\351\233\206.md"
@@ -216,6 +216,10 @@
 |gemma-7b|[AI-ModelScope/gemma-7b](https://modelscope.cn/models/AI-ModelScope/gemma-7b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.38|-|[google/gemma-7b](https://huggingface.co/google/gemma-7b)|
 |gemma-2b-instruct|[AI-ModelScope/gemma-2b-it](https://modelscope.cn/models/AI-ModelScope/gemma-2b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|transformers>=4.38|-|[google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it)|
 |gemma-7b-instruct|[AI-ModelScope/gemma-7b-it](https://modelscope.cn/models/AI-ModelScope/gemma-7b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|transformers>=4.38|-|[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it)|
+|gemma2-9b|[LLM-Research/gemma-2-9b](https://modelscope.cn/models/LLM-Research/gemma-2-9b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.42|-|[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)|
+|gemma2-27b|[LLM-Research/gemma-2-27b](https://modelscope.cn/models/LLM-Research/gemma-2-27b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.42|-|[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b)|
+|gemma2-9b-instruct|[LLM-Research/gemma-2-9b-it](https://modelscope.cn/models/LLM-Research/gemma-2-9b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|transformers>=4.42|-|[google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)|
+|gemma2-27b-instruct|[LLM-Research/gemma-2-27b-it](https://modelscope.cn/models/LLM-Research/gemma-2-27b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|transformers>=4.42|-|[google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)|
 |minicpm-1b-sft-chat|[OpenBMB/MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16/summary)|q_proj, k_proj, v_proj|minicpm|✔|✔|transformers>=4.36.0|-|[openbmb/MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)|
 |minicpm-2b-sft-chat|[OpenBMB/MiniCPM-2B-sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32/summary)|q_proj, k_proj, v_proj|minicpm|✔|✔||-|[openbmb/MiniCPM-2B-sft-fp32](https://huggingface.co/openbmb/MiniCPM-2B-sft-fp32)|
 |minicpm-2b-chat|[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32/summary)|q_proj, k_proj, v_proj|minicpm|✔|✔||-|[openbmb/MiniCPM-2B-dpo-fp32](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp32)|
@@ -416,6 +420,7 @@
 |webnovel-zh|[AI-ModelScope/webnovel_cn](https://modelscope.cn/datasets/AI-ModelScope/webnovel_cn/summary)||50000|1478.9±11526.1, min=100, max=490484|chat, novel|[zxbsmk/webnovel_cn](https://huggingface.co/datasets/zxbsmk/webnovel_cn)|
 |generated-chat-zh|[AI-ModelScope/generated_chat_0.4M](https://modelscope.cn/datasets/AI-ModelScope/generated_chat_0.4M/summary)||396004|273.3±52.0, min=32, max=873|chat, character-dialogue|[BelleGroup/generated_chat_0.4M](https://huggingface.co/datasets/BelleGroup/generated_chat_0.4M)|
 |🔥self-cognition|[swift/self-cognition](https://modelscope.cn/datasets/swift/self-cognition/summary)||134|53.6±18.6, min=29, max=121|chat, self-cognition|[modelscope/self-cognition](https://huggingface.co/datasets/modelscope/self-cognition)|
+|🔥swift-mix|[swift/swift-sft-mixture](https://modelscope.cn/datasets/swift/swift-sft-mixture/summary)|sharegpt<br>firefly<br>codefuse<br>metamathqa|-|The dataset is too large to preview; see the statistics via the original link.|chat, sft, general|-|
 |cls-fudan-news-zh|[damo/zh_cls_fudan-news](https://modelscope.cn/datasets/damo/zh_cls_fudan-news/summary)||4959|3234.4±2547.5, min=91, max=19548|chat, classification|-|
 |ner-jave-zh|[damo/zh_ner-JAVE](https://modelscope.cn/datasets/damo/zh_ner-JAVE/summary)||1266|118.3±45.5, min=44, max=223|chat, ner|-|
 |coco-en|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|454617|299.8±2.8, min=295, max=352|chat, multi-modal, vision|-|
diff --git a/docs/source_en/LLM/Supported-models-datasets.md b/docs/source_en/LLM/Supported-models-datasets.md
index 2b7b4eae04..86637ed854 100644
--- a/docs/source_en/LLM/Supported-models-datasets.md
+++ b/docs/source_en/LLM/Supported-models-datasets.md
@@ -216,6 +216,10 @@ The table below introduces all models supported by SWIFT:
 |gemma-7b|[AI-ModelScope/gemma-7b](https://modelscope.cn/models/AI-ModelScope/gemma-7b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.38|-|[google/gemma-7b](https://huggingface.co/google/gemma-7b)|
 |gemma-2b-instruct|[AI-ModelScope/gemma-2b-it](https://modelscope.cn/models/AI-ModelScope/gemma-2b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|transformers>=4.38|-|[google/gemma-2b-it](https://huggingface.co/google/gemma-2b-it)|
 |gemma-7b-instruct|[AI-ModelScope/gemma-7b-it](https://modelscope.cn/models/AI-ModelScope/gemma-7b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|transformers>=4.38|-|[google/gemma-7b-it](https://huggingface.co/google/gemma-7b-it)|
+|gemma2-9b|[LLM-Research/gemma-2-9b](https://modelscope.cn/models/LLM-Research/gemma-2-9b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.42|-|[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)|
+|gemma2-27b|[LLM-Research/gemma-2-27b](https://modelscope.cn/models/LLM-Research/gemma-2-27b/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.42|-|[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b)|
+|gemma2-9b-instruct|[LLM-Research/gemma-2-9b-it](https://modelscope.cn/models/LLM-Research/gemma-2-9b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|transformers>=4.42|-|[google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)|
+|gemma2-27b-instruct|[LLM-Research/gemma-2-27b-it](https://modelscope.cn/models/LLM-Research/gemma-2-27b-it/summary)|q_proj, k_proj, v_proj|gemma|✔|✔|transformers>=4.42|-|[google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)|
 |minicpm-1b-sft-chat|[OpenBMB/MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16/summary)|q_proj, k_proj, v_proj|minicpm|✔|✔|transformers>=4.36.0|-|[openbmb/MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)|
 |minicpm-2b-sft-chat|[OpenBMB/MiniCPM-2B-sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32/summary)|q_proj, k_proj, v_proj|minicpm|✔|✔||-|[openbmb/MiniCPM-2B-sft-fp32](https://huggingface.co/openbmb/MiniCPM-2B-sft-fp32)|
 |minicpm-2b-chat|[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32/summary)|q_proj, k_proj, v_proj|minicpm|✔|✔||-|[openbmb/MiniCPM-2B-dpo-fp32](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp32)|
@@ -416,6 +420,7 @@ The table below introduces the datasets supported by SWIFT:
 |webnovel-zh|[AI-ModelScope/webnovel_cn](https://modelscope.cn/datasets/AI-ModelScope/webnovel_cn/summary)||50000|1478.9±11526.1, min=100, max=490484|chat, novel|[zxbsmk/webnovel_cn](https://huggingface.co/datasets/zxbsmk/webnovel_cn)|
 |generated-chat-zh|[AI-ModelScope/generated_chat_0.4M](https://modelscope.cn/datasets/AI-ModelScope/generated_chat_0.4M/summary)||396004|273.3±52.0, min=32, max=873|chat, character-dialogue|[BelleGroup/generated_chat_0.4M](https://huggingface.co/datasets/BelleGroup/generated_chat_0.4M)|
 |🔥self-cognition|[swift/self-cognition](https://modelscope.cn/datasets/swift/self-cognition/summary)||134|53.6±18.6, min=29, max=121|chat, self-cognition|[modelscope/self-cognition](https://huggingface.co/datasets/modelscope/self-cognition)|
+|🔥swift-mix|[swift/swift-sft-mixture](https://modelscope.cn/datasets/swift/swift-sft-mixture/summary)|sharegpt<br>firefly<br>codefuse<br>metamathqa|-|The dataset is too large to preview; see the statistics via the original link.|chat, sft, general|-|
 |cls-fudan-news-zh|[damo/zh_cls_fudan-news](https://modelscope.cn/datasets/damo/zh_cls_fudan-news/summary)||4959|3234.4±2547.5, min=91, max=19548|chat, classification|-|
 |ner-jave-zh|[damo/zh_ner-JAVE](https://modelscope.cn/datasets/damo/zh_ner-JAVE/summary)||1266|118.3±45.5, min=44, max=223|chat, ner|-|
 |coco-en|[modelscope/coco_2014_caption](https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary)|coco_2014_caption|454617|299.8±2.8, min=295, max=352|chat, multi-modal, vision|-|
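The new mixture can be pulled by its registered name. A sketch assuming `get_dataset` keeps its documented signature; since the entry below sets `huge_dataset`, expect a very large download, and note the sharegpt/firefly/codefuse/metamathqa subsets come from the `dataset_info.json` entry:

```python
# Sketch: load swift-mix by its registered DatasetName constant.
# Assumes get_dataset from swift.llm behaves as in the repo's examples.
from swift.llm import DatasetName, get_dataset

train_dataset, val_dataset = get_dataset([DatasetName.swift_mix])
print(train_dataset)
print(train_dataset[0])  # inspect one chat-format row
```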
diff --git a/swift/llm/data/dataset_info.json b/swift/llm/data/dataset_info.json
index 702bf7f938..df33f3f521 100644
--- a/swift/llm/data/dataset_info.json
+++ b/swift/llm/data/dataset_info.json
@@ -726,5 +726,11 @@
         "hf_dataset_id": "modelscope/self-cognition",
         "remove_useless_columns": false,
         "tags": ["chat", "self-cognition", "🔥"]
+    },
+    "swift-mix": {
+        "dataset_id": "swift/swift-sft-mixture",
+        "subsets": ["sharegpt", "firefly", "codefuse", "metamathqa"],
+        "tags": ["chat", "sft", "general", "🔥"],
+        "huge_dataset": true
     }
 }
diff --git a/swift/llm/utils/dataset.py b/swift/llm/utils/dataset.py
index ba8e79396d..4637b11055 100644
--- a/swift/llm/utils/dataset.py
+++ b/swift/llm/utils/dataset.py
@@ -129,6 +129,7 @@ class DatasetName:
     webnovel_zh = 'webnovel-zh'
     generated_chat_zh = 'generated-chat-zh'
     self_cognition = 'self-cognition'
+    swift_mix = 'swift-mix'
     # example dataset for specific model
     cls_fudan_news_zh = 'cls-fudan-news-zh'  # seqgpt-560m
diff --git a/swift/llm/utils/model.py b/swift/llm/utils/model.py
index 8dfe89b56d..e02b65aef3 100644
--- a/swift/llm/utils/model.py
+++ b/swift/llm/utils/model.py
@@ -291,6 +291,10 @@ class ModelType:
     gemma_7b = 'gemma-7b'
     gemma_2b_instruct = 'gemma-2b-instruct'
     gemma_7b_instruct = 'gemma-7b-instruct'
+    gemma2_9b = 'gemma2-9b'
+    gemma2_27b = 'gemma2-27b'
+    gemma2_9b_instruct = 'gemma2-9b-instruct'
+    gemma2_27b_instruct = 'gemma2-27b-instruct'
     # paligemma
     paligemma_3b_pt_224 = 'paligemma-3b-pt-224'
     paligemma_3b_pt_448 = 'paligemma-3b-pt-448'
@@ -1532,6 +1536,42 @@ def _output_device_map_hook(module, input, output):
     return model, tokenizer


+@register_model(
+    ModelType.gemma2_9b,
+    'LLM-Research/gemma-2-9b',
+    LoRATM.llama,
+    TemplateType.default_generation,
+    requires=['transformers>=4.42'],
+    support_flash_attn=True,
+    support_vllm=True,
+    hf_model_id='google/gemma-2-9b')
+@register_model(
+    ModelType.gemma2_27b,
+    'LLM-Research/gemma-2-27b',
+    LoRATM.llama,
+    TemplateType.default_generation,
+    requires=['transformers>=4.42'],
+    support_flash_attn=True,
+    support_vllm=True,
+    hf_model_id='google/gemma-2-27b')
+@register_model(
+    ModelType.gemma2_9b_instruct,
+    'LLM-Research/gemma-2-9b-it',
+    LoRATM.llama,
+    TemplateType.gemma,
+    requires=['transformers>=4.42'],
+    support_flash_attn=True,
+    support_vllm=True,
+    hf_model_id='google/gemma-2-9b-it')
+@register_model(
+    ModelType.gemma2_27b_instruct,
+    'LLM-Research/gemma-2-27b-it',
+    LoRATM.llama,
+    TemplateType.gemma,
+    requires=['transformers>=4.42'],
+    support_flash_attn=True,
+    support_vllm=True,
+    hf_model_id='google/gemma-2-27b-it')
 @register_model(
     ModelType.qwen2_57b_a14b,
     'qwen/Qwen2-57B-A14B',
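The stacked `@register_model` decorators above are the whole integration surface: each call binds a `model_type` string to a ModelScope repo, a set of LoRA target modules, and a chat template. A hypothetical sketch of registering one more checkpoint the same way, following the pattern from SWIFT's model-customization docs; the `gemma2-9b-custom` name and `my-org/gemma-2-9b-custom` ids are invented for illustration and are not part of this PR:

```python
# Hypothetical sketch: registering an additional checkpoint with the same
# pattern. The model_type string and the model ids below are invented.
from typing import Any, Dict

from swift.llm import (LoRATM, TemplateType, get_model_tokenizer_from_repo,
                       register_model)


@register_model(
    'gemma2-9b-custom',               # value usable as --model_type (hypothetical)
    'my-org/gemma-2-9b-custom',       # ModelScope model id (hypothetical)
    LoRATM.llama,                     # Gemma2 reuses the llama q/k/v LoRA targets
    TemplateType.gemma,               # same chat template as the -it models above
    requires=['transformers>=4.42'],  # Gemma2 needs transformers 4.42+
    support_flash_attn=True,
    support_vllm=True,
    hf_model_id='my-org/gemma-2-9b-custom')  # Hugging Face mirror (hypothetical)
def get_gemma2_custom_model_tokenizer(model_dir: str,
                                      torch_dtype,
                                      model_kwargs: Dict[str, Any],
                                      load_model: bool = True,
                                      **kwargs):
    # The generic repo loader suffices; registration only wires up the name.
    return get_model_tokenizer_from_repo(model_dir, torch_dtype, model_kwargs,
                                         load_model, **kwargs)
```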