From 9cdddacc5d014dc6a16890b9aec5b46bb2f84350 Mon Sep 17 00:00:00 2001
From: Yiwen Song <34639474+sallysyw@users.noreply.github.com>
Date: Fri, 10 Dec 2021 13:13:24 -0800
Subject: [PATCH] Update readme.md with ViT training command

As titled.
---
 references/classification/README.md | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/references/classification/README.md b/references/classification/README.md
index 20d4f04d512..ff5371066d2 100644
--- a/references/classification/README.md
+++ b/references/classification/README.md
@@ -125,7 +125,7 @@ torchrun --nproc_per_node=8 train.py\
 ```
 Here `$MODEL` is one of `regnet_x_400mf`, `regnet_x_800mf`, `regnet_x_1_6gf`, `regnet_y_400mf`, `regnet_y_800mf` and `regnet_y_1_6gf`. Please note we used learning rate 0.4 for `regnet_y_400mf` to get the same Acc@1 as [the paper](https://arxiv.org/abs/2003.13678).
 
-### Medium models
+#### Medium models
 ```
 torchrun --nproc_per_node=8 train.py\
     --model $MODEL --epochs 100 --batch-size 64 --wd 0.00005 --lr=0.4\
@@ -134,7 +134,7 @@ torchrun --nproc_per_node=8 train.py\
 ```
 Here `$MODEL` is one of `regnet_x_3_2gf`, `regnet_x_8gf`, `regnet_x_16gf`, `regnet_y_3_2gf` and `regnet_y_8gf`.
 
-### Large models
+#### Large models
 ```
 torchrun --nproc_per_node=8 train.py\
     --model $MODEL --epochs 100 --batch-size 32 --wd 0.00005 --lr=0.2\
@@ -143,6 +143,28 @@ torchrun --nproc_per_node=8 train.py\
 ```
 Here `$MODEL` is one of `regnet_x_32gf`, `regnet_y_16gf` and `regnet_y_32gf`.
 
+### Vision Transformer
+
+#### Base models
+```
+torchrun --nproc_per_node=8 train.py\
+    --model $MODEL --epochs 300 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
+    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
+    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
+    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
+```
+Here `$MODEL` is one of `vit_b_16` and `vit_b_32`.
+
+#### Large models
+```
+torchrun --nproc_per_node=8 train.py\
+    --model $MODEL --epochs 300 --batch-size 16 --opt adamw --lr 0.003 --wd 0.3\
+    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
+    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
+    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
+```
+Here `$MODEL` is one of `vit_l_16` and `vit_l_32`.
+
 ## Mixed precision training
 Automatic Mixed Precision (AMP) training on GPU for PyTorch can be enabled with the [torch.cuda.amp](https://pytorch.org/docs/stable/amp.html?highlight=amp#module-torch.cuda.amp).