1 file changed: +28 -0

@@ -236,3 +236,31 @@ python ./run_squad.py \
--gradient_accumulation_steps 2 \
--optimize_on_cpu
```
+
+ If you have a recent GPU (starting from the NVIDIA Volta series), you should try **16-bit fine-tuning** (FP16).
+
+ Here is an example of hyper-parameters for an FP16 run we tried:
+ ```bash
+ python ./run_squad.py \
+ --vocab_file $BERT_LARGE_DIR/vocab.txt \
+ --bert_config_file $BERT_LARGE_DIR/bert_config.json \
+ --init_checkpoint $BERT_LARGE_DIR/pytorch_model.bin \
+ --do_lower_case \
+ --do_train \
+ --do_predict \
+ --train_file $SQUAD_TRAIN \
+ --predict_file $SQUAD_EVAL \
+ --learning_rate 3e-5 \
+ --num_train_epochs 2 \
+ --max_seq_length 384 \
+ --doc_stride 128 \
+ --output_dir $OUTPUT_DIR \
+ --train_batch_size 24 \
+ --fp16 \
+ --loss_scale 128
+ ```
+
+ The results were similar to the above FP32 results (actually slightly higher):
+ ```bash
+ {"exact_match": 84.65468306527909, "f1": 91.238669287002}
+ ```
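
For context on the `--fp16` / `--loss_scale 128` flags used above, here is a minimal sketch of static loss scaling in PyTorch. It illustrates the general technique only, not the actual code inside `run_squad.py`; the toy model, data, and scale value are placeholder assumptions mirroring the flags in the command above, and it assumes a CUDA-capable GPU.

```python
# Minimal sketch of static loss scaling (the idea behind --fp16 / --loss_scale 128).
# Illustrative only: the toy model and data below are placeholders, not run_squad.py code.
import torch
import torch.nn.functional as F

loss_scale = 128.0  # mirrors --loss_scale 128

device = "cuda"  # assumes an FP16-capable GPU (e.g. Volta or newer)
model = torch.nn.Linear(10, 2).to(device).half()           # toy FP16 model
optimizer = torch.optim.SGD(model.parameters(), lr=3e-5)   # lr mirrors --learning_rate

inputs = torch.randn(4, 10, device=device, dtype=torch.float16)
targets = torch.randint(0, 2, (4,), device=device)

logits = model(inputs)
loss = F.cross_entropy(logits.float(), targets)  # compute the loss in FP32

# Scale the loss before backward so small FP16 gradients do not underflow to zero,
# then divide the gradients by the same factor before the optimizer step.
(loss * loss_scale).backward()
for param in model.parameters():
    if param.grad is not None:
        param.grad.data.div_(loss_scale)

optimizer.step()
optimizer.zero_grad()
```

A production FP16 setup typically also keeps an FP32 master copy of the weights for the optimizer update; that detail is omitted here for brevity.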