Commit 2c5d993

update readme - fix SQuAD model on multi-GPU
1 parent 4850ec5 commit 2c5d993

2 files changed: +10 -3 lines changed

README.md (+5)

@@ -194,3 +194,8 @@ python run_squad.py \
 --doc_stride 128 \
 --output_dir ../debug_squad/
 ```
+
+Training with the previous hyper-parameters and a batch size 32 (on 4 GPUs) for 2 epochs gave us the following results:
+```bash
+{"f1": 88.19829549714827, "exact_match": 80.75685903500474}
+```

modeling.py (+5 -3)

@@ -455,9 +455,11 @@ def forward(self, input_ids, token_type_ids, attention_mask, start_positions=Non
         end_logits = end_logits.squeeze(-1)
 
         if start_positions is not None and end_positions is not None:
-            # If we are on multi-GPU, split add a dimension - if not this is a no-op
-            start_positions = start_positions.squeeze(-1)
-            end_positions = end_positions.squeeze(-1)
+            # If we are on multi-GPU, split add a dimension
+            if len(start_positions.size()) > 1:
+                start_positions = start_positions.squeeze(-1)
+            if len(end_positions.size()) > 1:
+                end_positions = end_positions.squeeze(-1)
             # sometimes the start/end positions are outside our model inputs, we ignore these terms
             ignored_index = start_logits.size(1)
             start_positions.clamp_(0, ignored_index)
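
Note on the modeling.py change: the removed comment described the unconditional `squeeze(-1)` as a no-op on single GPU, but `squeeze(-1)` drops the last dimension whenever its size is 1, including on a 1-D tensor of length 1. The sketch below is a plain-Python model of that shape rule, not code from the commit. It represents tensor shapes as tuples, and the helper names (`squeeze_last`, `old_squeeze`, `guarded_squeeze`) are hypothetical. It assumes, as one plausible reading of the fix, that the positions tensor can arrive with shape `(1,)`, for example when a replica receives a single example.

```python
def squeeze_last(shape):
    """Model of Tensor.squeeze(-1): drop the last dim only if its size is 1."""
    if len(shape) > 0 and shape[-1] == 1:
        return shape[:-1]
    return shape


def old_squeeze(shape):
    """Behaviour before the commit: always squeeze the last dimension."""
    return squeeze_last(shape)


def guarded_squeeze(shape):
    """Behaviour after the commit: squeeze only when there is more than one dim."""
    if len(shape) > 1:
        return squeeze_last(shape)
    return shape


# Hypothetical shapes for start_positions / end_positions.
for shape in [(8,), (8, 1), (1,), (1, 1)]:
    print(shape, "old:", old_squeeze(shape), "guarded:", guarded_squeeze(shape))
```

Under this model the two versions differ only for shape `(1,)`: the old code collapses it to a 0-dimensional `()`, while the guarded code leaves it as `(1,)`, so the batch dimension is preserved for the loss computation that follows.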
