update wd14 tagger and doc

kohya-ss · kohya-ss · commit cae5aa0a56c2 · 2024-03-30T21:48:22.000+09:00
diff --git a/README.md b/README.md
@@ -156,6 +156,14 @@ The majority of scripts is licensed under ASL 2.0 (including codes from Diffuser
 - The support for v3 repositories is added to `tag_image_by_wd14_tagger.py` (`--onnx` option only). PR [#1192](https://github.com/kohya-ss/sd-scripts/pull/1192) Thanks to sdbds!
   - Onnx may need to be updated. Onnx is not installed by default, so please install or update it with `pip install onnx==1.15.0 onnxruntime-gpu==1.17.1` etc. Please also check the comments in `requirements.txt`.
 - The model is now saved in the subdirectory as `--repo_id` in `tag_image_by_wd14_tagger.py` . This caches multiple repo_id models. Please delete unnecessary files under `--model_dir`.
+- Several options have been added to `tag_images_by_wd14_tagger.py`.
+  - Some of them were added in PR [#1216](https://github.com/kohya-ss/sd-scripts/pull/1216). Thanks to Disty0!
+  - Output rating tags `--use_rating_tags` and `--use_rating_tags_as_last_tag`
+  - Output character tags first `--character_tags_first`
+  - Expand character tags and series `--character_tag_expand`
+  - Specify tags to output first `--always_first_tags`
+  - Replace tags `--tag_replacement`
+  - See [Tagging documentation](./docs/wd14_tagger_README-en.md) for details.
 - Fixed an error when specifying `--beam_search` and a value of 2 or more for `--num_beams` in `make_captions.py`.
 - The options `--noise_offset_random_strength` and `--ip_noise_gamma_random_strength` are added to each training script. These options can be used to vary the noise offset and ip noise gamma in the range of 0 to the specified value. PR [#1177](https://github.com/kohya-ss/sd-scripts/pull/1177) Thanks to KohakuBlueleaf!
 - The options `--save_state_on_train_end` are added to each training script. PR [#1168](https://github.com/kohya-ss/sd-scripts/pull/1168) Thanks to gesen2egee!
@@ -181,6 +189,14 @@ The majority of scripts is licensed under ASL 2.0 (including codes from Diffuser
 - `tag_image_by_wd14_tagger.py` で v3 のリポジトリがサポートされました（`--onnx` 指定時のみ有効）。 PR [#1192](https://github.com/kohya-ss/sd-scripts/pull/1192) sdbds 氏に感謝します。
   - Onnx のバージョンアップが必要になるかもしれません。デフォルトでは Onnx はインストールされていませんので、`pip install onnx==1.15.0 onnxruntime-gpu==1.17.1` 等でインストール、アップデートしてください。`requirements.txt` のコメントもあわせてご確認ください。
 - `tag_image_by_wd14_tagger.py` で、モデルを`--repo_id` のサブディレクトリに保存するようにしました。これにより複数のモデルファイルがキャッシュされます。`--model_dir` 直下の不要なファイルは削除願います。
+- `tag_images_by_wd14_tagger.py` にいくつかのオプションを追加しました。
+  - 一部は PR [#1216](https://github.com/kohya-ss/sd-scripts/pull/1216) で追加されました。Disty0 氏に感謝します。
+  - レーティングタグを出力する `--use_rating_tags` および `--use_rating_tags_as_last_tag`
+  - キャラクタタグを最初に出力する `--character_tags_first`
+  - キャラクタタグとシリーズを展開する `--character_tag_expand`
+  - 常に最初に出力するタグを指定する `--always_first_tags`
+  - タグを置換する `--tag_replacement`
+  - 詳細は [タグ付けに関するドキュメント](./docs/wd14_tagger_README-ja.md) をご覧ください。
 - `make_captions.py` で `--beam_search` を指定し `--num_beams` に2以上の値を指定した時のエラーを修正しました。
 - 各学習スクリプトに、noise offset、ip noise gammaを、それぞれ 0~指定した値の範囲で変動させるオプション `--noise_offset_random_strength` および `--ip_noise_gamma_random_strength` が追加されました。 PR [#1177](https://github.com/kohya-ss/sd-scripts/pull/1177) KohakuBlueleaf 氏に感謝します。
 - 各学習スクリプトに、学習終了時に state を保存する `--save_state_on_train_end` オプションが追加されました。 PR [#1168](https://github.com/kohya-ss/sd-scripts/pull/1168) gesen2egee 氏に感謝します。
diff --git a/docs/wd14_tagger_README-en.md b/docs/wd14_tagger_README-en.md
@@ -0,0 +1,85 @@
+# Image Tagging using WD14Tagger
+
+This document is based on the information from this GitHub page (https://github.com/toriato/stable-diffusion-webui-wd14-tagger#mrsmilingwolfs-model-aka-waifu-diffusion-14-tagger).
+
+Using ONNX for inference is recommended. Install ONNX and ONNX Runtime with the following command:
+
+```powershell
+pip install onnx==1.15.0 onnxruntime-gpu==1.17.1  
+```
+
+The model weights will be automatically downloaded from Hugging Face.
+
+# Usage
+
+Run the script to perform tagging.
+
+```powershell
+python finetune/tag_images_by_wd14_tagger.py --onnx --repo_id <model repo id> --batch_size <batch size> <training data folder>
+```
+
+For example, if using the repository `SmilingWolf/wd-swinv2-tagger-v3` with a batch size of 4, and the training data is located in the parent folder `train_data`, it would be:
+
+```powershell
+python tag_images_by_wd14_tagger.py --onnx --repo_id SmilingWolf/wd-swinv2-tagger-v3 --batch_size 4 ..\train_data
+```
+
+On the first run, the model files will be automatically downloaded to the `wd14_tagger_model` folder (the folder can be changed with an option). 
+
+Tag files will be created in the same directory as the training data images, with the same filename and a `.txt` extension.
+
+![Generated tag files](https://user-images.githubusercontent.com/52813779/208910534-ea514373-1185-4b7d-9ae3-61eb50bc294e.png)
+
+![Tags and image](https://user-images.githubusercontent.com/52813779/208910599-29070c15-7639-474f-b3e4-06bd5a3df29e.png)
+
+## Example
+
+To output tags in the Animagine XL 3.1 format, run the following command (enter it on a single line in practice):
+
+```
+python tag_images_by_wd14_tagger.py --onnx --repo_id SmilingWolf/wd-swinv2-tagger-v3 
+    --batch_size 4  --remove_underscore --undesired_tags "PUT,YOUR,UNDESIRED,TAGS" --recursive 
+    --use_rating_tags_as_last_tag --character_tags_first --character_tag_expand
+    --always_first_tags "1girl,1boy"  ..\train_data
+```
+
+## Available Repository IDs
+
+[SmilingWolf's V2 and V3 models](https://huggingface.co/SmilingWolf) are available for use. Specify a repository ID in a form such as `SmilingWolf/wd-vit-tagger-v3`. If omitted, the default is `SmilingWolf/wd-v1-4-convnext-tagger-v2`.
+
+# Options 
+
+## General Options
+
+- `--onnx`: Use ONNX for inference. If not specified, TensorFlow will be used. If using TensorFlow, please install TensorFlow separately. 
+- `--batch_size`: Number of images to process at once. Default is 1. Adjust according to VRAM capacity.
+- `--caption_extension`: File extension for caption files. Default is `.txt`.
+- `--max_data_loader_n_workers`: Maximum number of workers for DataLoader. Specifying a value of 1 or more will use DataLoader to speed up image loading. If unspecified, DataLoader will not be used.
+- `--thresh`: Confidence threshold for outputting tags. Default is 0.35. Lowering the value will assign more tags but accuracy will decrease. 
+- `--general_threshold`: Confidence threshold for general tags. If omitted, same as `--thresh`.
+- `--character_threshold`: Confidence threshold for character tags. If omitted, same as `--thresh`.
+- `--recursive`: If specified, subfolders within the specified folder will also be processed recursively.
+- `--append_tags`: Append tags to existing tag files.
+- `--frequency_tags`: Output tag frequencies.  
+- `--debug`: Debug mode. Outputs debug information if specified.
+
+## Model Download
+
+- `--model_dir`: Folder to save model files. Default is `wd14_tagger_model`.  
+- `--force_download`: Re-download model files if specified.
+
+## Tag Editing
+
+- `--remove_underscore`: Remove underscores from output tags.
+- `--undesired_tags`: Specify tags not to output. Multiple tags can be specified, separated by commas. For example, `black eyes,black hair`.
+- `--use_rating_tags`: Output rating tags at the beginning of the tags.
+- `--use_rating_tags_as_last_tag`: Add rating tags at the end of the tags.
+- `--character_tags_first`: Output character tags first.
+- `--character_tag_expand`: Expand the series name in character tags. For example, split the tag `chara_name_(series)` into `chara_name, series`.
+- `--always_first_tags`: Specify tags that are always output first when they are detected in an image. Multiple tags can be specified, separated by commas. For example, `1girl,1boy`.
+- `--caption_separator`: Separate tags with this string in the output file. Default is `, `.
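+- `--tag_replacement`: Perform tag replacement. Specify pairs of source and target tags, with `,` between the two tags of a pair and `;` between pairs. For example, `tag1,tag2;tag3,tag4` replaces `tag1` with `tag2` and `tag3` with `tag4`.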
+
+When specifying `--remove_underscore`, write the tags given to `--undesired_tags`, `--always_first_tags`, and `--tag_replacement` without underscores.
+
+When specifying `--caption_separator`, separate the tags given to `--undesired_tags` and `--always_first_tags` with that separator. The tags in `--tag_replacement` are always separated with `,`.
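The tag-editing options documented above can be illustrated with a minimal Python sketch. This is not the code in `tag_images_by_wd14_tagger.py`. The function names (`expand_character_tag`, `parse_tag_replacement`, `edit_tags`) and the order of the steps are assumptions made for illustration, and the sketch expands any tag of the form `name_(series)` rather than only tags the model classifies as character tags.

```python
def expand_character_tag(tag: str) -> list[str]:
    """Split 'chara_name_(series)' into ['chara_name', 'series'].

    Hypothetical helper: tags that do not match the pattern pass through.
    """
    if tag.endswith(")") and "_(" in tag:
        name, series = tag[:-1].rsplit("_(", 1)
        return [name, series]
    return [tag]


def parse_tag_replacement(spec: str) -> dict[str, str]:
    """Parse 'source1,target1;source2,target2' into {source: target}.

    Assumption: no escaping of ',' or ';' is handled in this sketch.
    A pair without a comma raises ValueError.
    """
    mapping = {}
    for pair in spec.split(";"):
        if not pair.strip():
            continue
        source, target = pair.split(",", 1)
        mapping[source.strip()] = target.strip()
    return mapping


def edit_tags(tags, always_first=(), replacement_spec="", remove_underscore=False):
    """Apply expansion, underscore removal, replacement, then reordering."""
    out = []
    for tag in tags:
        out.extend(expand_character_tag(tag))
    if remove_underscore:
        out = [t.replace("_", " ") for t in out]
    mapping = parse_tag_replacement(replacement_spec)
    out = [mapping.get(t, t) for t in out]
    # Move the requested tags to the front, but only those actually present.
    first = [t for t in always_first if t in out]
    rest = [t for t in out if t not in first]
    return first + rest


tags = ["long_hair", "hatsune_miku_(vocaloid)", "1girl"]
edited = edit_tags(
    tags,
    always_first=["1girl", "1boy"],
    replacement_spec="long hair,very long hair",
    remove_underscore=True,
)
print(", ".join(edited))  # 1girl, very long hair, hatsune miku, vocaloid
```

Because underscores are removed before the replacement is applied in this sketch, the replacement source is written as `long hair`, which matches the note above about giving tags without underscores when `--remove_underscore` is specified.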
diff --git a/docs/wd14_tagger_README-ja.md b/docs/wd14_tagger_README-ja.md
@@ -0,0 +1,85 @@
+# WD14Taggerによるタグ付け
+
+こちらのgithubページ（https://github.com/toriato/stable-diffusion-webui-wd14-tagger#mrsmilingwolfs-model-aka-waifu-diffusion-14-tagger ）の情報を参考にさせていただきました。
+
+onnx を用いた推論を推奨します。以下のコマンドで onnx をインストールしてください。
+
+```powershell
+pip install onnx==1.15.0 onnxruntime-gpu==1.17.1
+```
+
+モデルの重みはHugging Faceから自動的にダウンロードしてきます。
+
+# 使い方
+
+スクリプトを実行してタグ付けを行います。
+```
+python finetune/tag_images_by_wd14_tagger.py --onnx --repo_id <モデルのrepo id> --batch_size <バッチサイズ> <教師データフォルダ>
+```
+
+レポジトリに `SmilingWolf/wd-swinv2-tagger-v3` を使用し、バッチサイズを4にして、教師データを親フォルダの `train_data`に置いた場合、以下のようになります。
+
+```
+python tag_images_by_wd14_tagger.py --onnx --repo_id SmilingWolf/wd-swinv2-tagger-v3 --batch_size 4 ..\train_data
+```
+
+初回起動時にはモデルファイルが `wd14_tagger_model` フォルダに自動的にダウンロードされます（フォルダはオプションで変えられます）。
+
+タグファイルが教師データ画像と同じディレクトリに、同じファイル名、拡張子.txtで作成されます。
+
+![生成されたタグファイル](https://user-images.githubusercontent.com/52813779/208910534-ea514373-1185-4b7d-9ae3-61eb50bc294e.png)
+
+![タグと画像](https://user-images.githubusercontent.com/52813779/208910599-29070c15-7639-474f-b3e4-06bd5a3df29e.png)
+
+## 記述例
+
+Animagine XL 3.1 方式で出力する場合、以下のようになります（実際には 1 行で入力してください）。
+
+```
+python tag_images_by_wd14_tagger.py --onnx --repo_id SmilingWolf/wd-swinv2-tagger-v3 
+    --batch_size 4  --remove_underscore --undesired_tags "PUT,YOUR,UNDESIRED,TAGS" --recursive 
+    --use_rating_tags_as_last_tag --character_tags_first --character_tag_expand
+    --always_first_tags "1girl,1boy"  ..\train_data
+```
+
+## 使用可能なリポジトリID
+
+[SmilingWolf 氏の V2、V3 のモデル](https://huggingface.co/SmilingWolf)が使用可能です。`SmilingWolf/wd-vit-tagger-v3` のように指定してください。省略時のデフォルトは `SmilingWolf/wd-v1-4-convnext-tagger-v2` です。
+
+# オプション
+
+## 一般オプション
+
+- `--onnx` : ONNX を使用して推論します。指定しない場合は TensorFlow を使用します。TensorFlow 使用時は別途 TensorFlow をインストールしてください。
+- `--batch_size` : 一度に処理する画像の数。デフォルトは1です。VRAMの容量に応じて増減してください。
+- `--caption_extension` : キャプションファイルの拡張子。デフォルトは `.txt` です。
+- `--max_data_loader_n_workers` : DataLoader の最大ワーカー数です。このオプションに 1 以上の数値を指定すると、DataLoader を用いて画像読み込みを高速化します。未指定時は DataLoader を用いません。
+- `--thresh` : 出力するタグの信頼度の閾値。デフォルトは0.35です。値を下げるとより多くのタグが付与されますが、精度は下がります。
+- `--general_threshold` : 一般タグの信頼度の閾値。省略時は `--thresh` と同じです。
+- `--character_threshold` : キャラクタータグの信頼度の閾値。省略時は `--thresh` と同じです。
+- `--recursive` : 指定すると、指定したフォルダ内のサブフォルダも再帰的に処理します。
+- `--append_tags` : 既存のタグファイルにタグを追加します。
+- `--frequency_tags` : タグの頻度を出力します。
+- `--debug` : デバッグモード。指定するとデバッグ情報を出力します。
+
+## モデルのダウンロード
+
+- `--model_dir` : モデルファイルの保存先フォルダ。デフォルトは `wd14_tagger_model` です。
+- `--force_download` : 指定するとモデルファイルを再ダウンロードします。
+
+## タグ編集関連
+
+- `--remove_underscore` : 出力するタグからアンダースコアを削除します。
+- `--undesired_tags` : 出力しないタグを指定します。カンマ区切りで複数指定できます。たとえば `black eyes,black hair` のように指定します。
+- `--use_rating_tags` : タグの最初にレーティングタグを出力します。
+- `--use_rating_tags_as_last_tag` : タグの最後にレーティングタグを追加します。
+- `--character_tags_first` : キャラクタータグを最初に出力します。
+- `--character_tag_expand` : キャラクタータグのシリーズ名を展開します。たとえば `chara_name_(series)` のタグを `chara_name, series` に分割します。
+- `--always_first_tags` : あるタグが画像に出力されたとき、そのタグを最初に出力するタグを指定します。カンマ区切りで複数指定できます。たとえば `1girl,1boy` のように指定します。
+- `--caption_separator` : 出力するファイルでタグをこの文字列で区切ります。デフォルトは `, ` です。
+- `--tag_replacement` : タグの置換を行います。`tag1,tag2;tag3,tag4` のように指定します。
+
+`remove_underscore` 指定時は、`undesired_tags`、`always_first_tags`、`tag_replacement` はアンダースコアを含めずに指定してください。
+
+`caption_separator` 指定時は、`undesired_tags`、`always_first_tags` は `caption_separator`  で区切ってください。`tag_replacement` は必ず `,` で区切ってください。
+
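The relationship between `--thresh`, `--general_threshold`, and `--character_threshold` described in both documents can be sketched as follows. This is an illustrative sketch, not the script's implementation: `resolve_thresholds`, `select_tags`, the `(tag, category, confidence)` tuples, and the use of `>=` for the comparison are all assumptions.

```python
def resolve_thresholds(thresh=0.35, general_threshold=None, character_threshold=None):
    """Fall back to `thresh` for any category threshold that is omitted."""
    general = thresh if general_threshold is None else general_threshold
    character = thresh if character_threshold is None else character_threshold
    return general, character


def select_tags(scores, general_threshold, character_threshold):
    """Keep tags whose confidence reaches the threshold of their category.

    `scores` is a hypothetical list of (tag, category, confidence) tuples,
    where category is either "general" or "character".
    """
    selected = []
    for tag, category, confidence in scores:
        limit = character_threshold if category == "character" else general_threshold
        if confidence >= limit:
            selected.append(tag)
    return selected


scores = [
    ("1girl", "general", 0.98),
    ("smile", "general", 0.40),
    ("hatsune_miku_(vocaloid)", "character", 0.60),
    ("outdoors", "general", 0.30),
]

# Only the character threshold is overridden; general tags use the 0.35 default.
general, character = resolve_thresholds(character_threshold=0.7)
print(select_tags(scores, general, character))  # ['1girl', 'smile']
```

With both category thresholds omitted, the same scores would also keep the character tag, since 0.60 is above the default of 0.35.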
diff --git a/finetune/tag_images_by_wd14_tagger.py b/finetune/tag_images_by_wd14_tagger.py