From 9cdddacc5d014dc6a16890b9aec5b46bb2f84350 Mon Sep 17 00:00:00 2001
From: Yiwen Song <34639474+sallysyw@users.noreply.github.com>
Date: Fri, 10 Dec 2021 13:13:24 -0800
Subject: [PATCH] Update readme.md with ViT training command

As titled.
---
 references/classification/README.md | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/references/classification/README.md b/references/classification/README.md
index 20d4f04d512..ff5371066d2 100644
--- a/references/classification/README.md
+++ b/references/classification/README.md
@@ -125,7 +125,7 @@ torchrun --nproc_per_node=8 train.py\
 ```
 Here `$MODEL` is one of `regnet_x_400mf`, `regnet_x_800mf`, `regnet_x_1_6gf`, `regnet_y_400mf`, `regnet_y_800mf` and `regnet_y_1_6gf`. Please note we used learning rate 0.4 for `regnet_y_400mf` to get the same Acc@1 as [the paper](https://arxiv.org/abs/2003.13678).
 
-### Medium models
+#### Medium models
 ```
 torchrun --nproc_per_node=8 train.py\
     --model $MODEL --epochs 100 --batch-size 64 --wd 0.00005 --lr=0.4\
@@ -134,7 +134,7 @@ torchrun --nproc_per_node=8 train.py\
 ```
 Here `$MODEL` is one of `regnet_x_3_2gf`, `regnet_x_8gf`, `regnet_x_16gf`, `regnet_y_3_2gf` and `regnet_y_8gf`.
 
-### Large models
+#### Large models
 ```
 torchrun --nproc_per_node=8 train.py\
     --model $MODEL --epochs 100 --batch-size 32 --wd 0.00005 --lr=0.2\
@@ -143,6 +143,28 @@ torchrun --nproc_per_node=8 train.py\
 ```
 Here `$MODEL` is one of `regnet_x_32gf`, `regnet_y_16gf` and `regnet_y_32gf`.
 
+### Vision Transformer
+
+#### Base models
+```
+torchrun --nproc_per_node=8 train.py\
+    --model $MODEL --epochs 300 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
+    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
+    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
+    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
+```
+Here `$MODEL` is one of `vit_b_16` and `vit_b_32`.
+
+#### Large models
+```
+torchrun --nproc_per_node=8 train.py\
+    --model $MODEL --epochs 300 --batch-size 16 --opt adamw --lr 0.003 --wd 0.3\
+    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
+    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
+    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
+```
+Here `$MODEL` is one of `vit_l_16` and `vit_l_32`.
+
 ## Mixed precision training
 Automatic Mixed Precision (AMP) training on GPU for PyTorch can be enabled with the [torch.cuda.amp](https://pytorch.org/docs/stable/amp.html?highlight=amp#module-torch.cuda.amp).