
Commit 35ce9c0

Fix bugs (modelscope#1241)
1 parent cce6bd7 commit 35ce9c0

3 files changed: +6 −3 lines changed


docs/source/LLM/DPO训练文档.md

+1 −1
@@ -80,7 +80,7 @@ cd examples/pytorch/llm
 - If you train a base model with data that contains history, you need to specify a template that supports multi-turn dialogue (base models often do not support multi-turn dialogue). For this case we have set the `chatml` template by default; you can also set `--model_type` to choose the template of the model being trained.
 - We default to setting `--gradient_checkpointing true` during training to **save memory**, which will slightly reduce training speed.
 - If you are using older GPUs such as **V100**, you need to set `--dtype AUTO` or `--dtype fp16`, because they do not support bf16.
-- If your machine has high-performance graphics cards like A100 and you are using the qwen series models, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which will speed up training and inference as well as reduce memory usage (A10, 3090, V100, etc. graphics cards do not support training with flash-attn). Models that support flash-attn can be viewed in [LLM Supported Models](支持的模型和数据集.md#模型)
+- If your machine has high-performance graphics cards like A100 and you are using the qwen series models, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which will speed up training and inference as well as reduce memory usage (3090, V100, etc. graphics cards do not support training with flash-attn). Models that support flash-attn can be viewed in [LLM Supported Models](支持的模型和数据集.md#模型)
 - If you need to train offline, please use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`. For specific parameter meanings, please see [Command Line Arguments](命令行参数.md).
 - If you want to push weights to the ModelScope Hub during training, you need to set `--push_to_hub true`.

docs/source_en/LLM/DPO.md

+1 −1
@@ -78,7 +78,7 @@ cd examples/pytorch/llm
 - If you train a base model with data that contains history, you need to specify a template that supports multi-turn dialogue (base models often do not support multi-turn dialogue). For this case we have set the `chatml` template by default; you can also set `--model_type` to choose the template of the model being trained.
 - We default to setting `--gradient_checkpointing true` during training to **save memory**, which will slightly reduce training speed.
 - If you are using older GPUs such as **V100**, you need to set `--dtype AUTO` or `--dtype fp16`, because they do not support bf16.
-- If your machine has high-performance graphics cards like A100 and you are using the qwen series models, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which will speed up training and inference as well as reduce memory usage (A10, 3090, V100, etc. graphics cards do not support training with flash-attn). Models that support flash-attn can be viewed in [LLM Supported Models](Supported-models-datasets.md#models)
+- If your machine has high-performance graphics cards like A100 and you are using the qwen series models, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which will speed up training and inference as well as reduce memory usage (3090, V100, etc. graphics cards do not support training with flash-attn). Models that support flash-attn can be viewed in [LLM Supported Models](Supported-models-datasets.md#models)
 - If you need to train offline, please use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`. For specific parameter meanings, please see [Command Line Arguments](Command-line-parameters.md).
 - If you want to push weights to the ModelScope Hub during training, you need to set `--push_to_hub true`.
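
Taken together, these notes boil down to a handful of launch flags. Below is a minimal sketch of an offline run on a V100-class GPU; the `swift dpo` subcommand and the local model path are illustrative assumptions, while the flags themselves are the ones documented above.

# Minimal sketch: offline DPO training with the flags from the doc above.
# `swift dpo` and the model path are assumptions for illustration.
import subprocess

subprocess.run(
    [
        'swift', 'dpo',
        '--model_id_or_path', '/path/to/local/model',  # local dir for offline training
        '--check_model_is_latest', 'false',  # skip the hub freshness check
        '--gradient_checkpointing', 'true',  # save memory at a small speed cost
        '--dtype', 'fp16',  # V100-class GPUs do not support bf16
    ],
    check=True,
)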

swift/llm/utils/utils.py

+4 −1
@@ -269,7 +269,10 @@ def _try_fetch(self, first_idx: int) -> Optional[Dict[str, Any]]:
         idx = np.random.permutation(len(self))[:self.try_fetch_time - 1]
         for i in [first_idx] + idx.tolist():
             data = self.dataset[i]
-            res = self.template.encode(data)
+            try:
+                res = self.template.encode(data)
+            except OSError:
+                continue
             if len(res[0]) > 0:
                 return res
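
The patch makes `_try_fetch` skip samples whose encoding raises `OSError` (e.g., an unreadable media file) and fall back to randomly drawn indices instead of crashing the dataloader. Below is a self-contained sketch of the same pattern; `encode` and the toy dataset are hypothetical stand-ins for `self.template.encode` and `self.dataset`:

# Self-contained sketch of the retry pattern patched above. `encode` and
# the toy dataset are hypothetical stand-ins for the template/dataset pair.
from typing import List, Optional

import numpy as np


def encode(sample: str) -> List[str]:
    # Hypothetical encoder: unreadable samples raise OSError, just as a
    # template encoding a corrupted file would.
    if sample == 'broken':
        raise OSError('unreadable sample')
    return [sample]


def try_fetch(dataset: List[str], first_idx: int,
              try_fetch_time: int = 5) -> Optional[List[str]]:
    # Try the requested index first, then randomly drawn fallbacks,
    # mirroring _try_fetch in swift/llm/utils/utils.py.
    idx = np.random.permutation(len(dataset))[:try_fetch_time - 1]
    for i in [first_idx] + idx.tolist():
        try:
            res = encode(dataset[i])
        except OSError:
            continue  # skip unreadable samples instead of raising
        if len(res[0]) > 0:
            return res
    return None  # every candidate failed


print(try_fetch(['broken', 'ok', 'broken', 'ok'], first_idx=0))  # -> ['ok']

Catching only `OSError` keeps the fix narrow: genuine template bugs still surface, while bad files are merely skipped.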
