Description
Hey! I'm a researcher at OpenAI looking into trends in compute used by models. I'm excited to have found this repo, since it's the only EfficientNet implementation I've seen that claims to approximately reproduce the original performance.
I've got two runs going, each on a machine with 8 P100s:
./distributed_train.sh 8 /tmp/imagenet-extracted/ --model efficientnet_b0 --lr 0.035 -b 64 --drop 0.2 --img-size 224 --sched step --epochs 550 --decay-epochs 2 --decay-rate 0.975 --opt rmsproptf -j 8 --warmup-epochs 5 --warmup-lr 1e-6 --weight-decay 1e-5 --opt-eps .001 --model-ema
./distributed_train.sh 8 /tmp/imagenet-extracted/ --model efficientnet_b2 --lr 0.0175 -b 32 --drop 0.2 --img-size 224 --sched step --epochs 550 --decay-epochs 2 --decay-rate 0.975 --opt rmsproptf -j 8 --warmup-epochs 5 --warmup-lr 1e-6 --weight-decay 1e-5 --opt-eps .001 --model-ema
The only change I made from the parameters recommended here was scaling the learning rate you used (.27) based on the difference in batch size.
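For clarity, here's the scaling rule I applied as a minimal Python sketch; the `scale_lr` helper is purely illustrative (not part of this repo), and the assumption is plain linear scaling of the learning rate with global batch size:

```python
def scale_lr(reference_lr: float, reference_batch: int, n_gpus: int, per_gpu_batch: int) -> float:
    """Scale a reference learning rate linearly with the global (effective) batch size."""
    global_batch = n_gpus * per_gpu_batch
    return reference_lr * global_batch / reference_batch

# Using the two runs above as an example: the b0 run sees 8 * 64 = 512 images
# per step and the b2 run sees 8 * 32 = 256, so the b2 learning rate is half
# the b0 one.
b2_lr = scale_lr(reference_lr=0.035, reference_batch=512, n_gpus=8, per_gpu_batch=32)
print(b2_lr)  # 0.0175 -- half the global batch, half the learning rate
```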
I'd be very interested to know the specific learning rate and other hyperparameters you used for the EfficientNet-B2 run referenced in the README, and what has worked best in your B0 runs, since the learning rate above was given for the model family rather than for B0 specifically.