
Parameters used to get 94.714 top5 with efficientnet_b2? #27

Closed

@DannyHernandez

Hey! I'm a researcher at OpenAI looking into trends in the compute used to train models. I was excited to find this repo, since it's the only one with an EfficientNet implementation that claims to approximately reproduce the original performance.

I've got two runs going on machines with 8 P100s each:

./distributed_train.sh 8 /tmp/imagenet-extracted/ --model efficientnet_b0 --lr 0.035 -b 64 --drop 0.2 --img-size 224 --sched step --epochs 550 --decay-epochs 2 --decay-rate 0.975 --opt rmsproptf -j 8 --warmup-epochs 5 --warmup-lr 1e-6 --weight-decay 1e-5 --opt-eps .001 --model-ema

./distributed_train.sh 8 /tmp/imagenet-extracted/ --model efficientnet_b2 --lr 0.0175 -b 32 --drop 0.2 --img-size 224 --sched step --epochs 550 --decay-epochs 2 --decay-rate 0.975 --opt rmsproptf -j 8 --warmup-epochs 5 --warmup-lr 1e-6 --weight-decay 1e-5 --opt-eps .001 --model-ema

The only change I made from the parameters recommended here was scaling the learning rate you used, 0.27, based on the difference in batch size.
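Concretely, the scaling I applied looks roughly like the sketch below. The linear-with-batch-size rule and the 4096 reference global batch are my own assumptions about how the 0.27 figure was meant to be used, not something this repo states:

```python
# Rough sketch of the linear LR scaling I applied. The linear-with-batch-size
# rule and the reference global batch size below are my own assumptions,
# not something stated in this repo.
def scale_lr(ref_lr, ref_global_batch, n_gpus, per_gpu_batch):
    """Scale a reference learning rate linearly with the total (global) batch size."""
    return ref_lr * (n_gpus * per_gpu_batch) / ref_global_batch

# My two runs, both on 8 P100s:
#   b0: 8 GPUs * 64 per GPU = 512 images per step
#   b2: 8 GPUs * 32 per GPU = 256 images per step
# Assuming (hypothetically) that the 0.27 figure was meant for a global batch of 4096:
print(scale_lr(0.27, 4096, 8, 64))  # ~0.034, which I rounded to the 0.035 above
print(scale_lr(0.27, 4096, 8, 32))  # ~0.017, close to the 0.0175 above
```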

I'd be very interested in the specific learning rate and other hyperparameters you used for the efficientnet_b2 run referenced in the README, and in what has worked best for b0 runs, since the learning rate above was given for the model family rather than for b0 specifically.
