Commit c7b2808

Update LM finetuning README to include a literature reference
1 parent 7c59e32 commit c7b2808

1 file changed (+1, -1)


examples/lm_finetuning/README.md

@@ -6,7 +6,7 @@ The three example scripts in this folder can be used to **fine-tune** a pre-trai
 
 The [ULMFiT paper](https://arxiv.org/abs/1801.06146) took a slightly different approach, however, and added an intermediate step in which the model is fine-tuned on text **from the same domain as the target task and using the pretraining objective** before the final stage in which the classifier head is added and the model is trained on the target task itself. The authors reported significantly improved results from this step, and found that they could get high-quality classifications even with only tiny numbers (<1000) of labelled training examples, as long as they had a lot of unlabelled data from the target domain.
 
-The BERT model has more capacity than the LSTM models used in the ULMFiT work, but the [BERT paper](https://arxiv.org/abs/1810.04805) did not test fine-tuning using the pretraining objective, and at the present stage there aren't many examples of this approach being used for Transformer-based language models. As such, it's hard to predict what effect this step will have on final model performance, but it's reasonable to conjecture that it can improve final classification performance, especially when a large unlabelled corpus from the target domain is available, labelled data is limited, or the target domain is very unusual and different from 'normal' English text. If you are aware of any literature on this subject, please feel free to add it in here, or open an issue and tag me (@Rocketknight1) and I'll include it.
+Although this wasn't covered in the original BERT paper, domain-specific fine-tuning of Transformer models has [recently been reported by other authors](https://arxiv.org/pdf/1905.05583.pdf), who also report performance improvements from this step.
 
 ## Input format
 
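For readers skimming this commit, the two-stage recipe described in the diff above (continue training with the pretraining objective on unlabelled in-domain text, then add a classifier head and fine-tune on the labelled task) looks roughly like the sketch below. This is a minimal illustration, not the scripts in this folder: the `pytorch_pretrained_bert` classes and the `pytorch_model.bin`/`bert_config.json` layout follow the library's own conventions, but the `domain_model` directory name, `num_labels=2`, and the elided training loops are placeholder assumptions.

```python
import os
import torch
from pytorch_pretrained_bert import BertForMaskedLM, BertForSequenceClassification

# Stage 1: keep training with the pretraining (masked LM) objective on
# unlabelled text from the target domain.
lm_model = BertForMaskedLM.from_pretrained('bert-base-uncased')
# ... masked-LM training loop over the unlabelled domain corpus goes here ...

# Save the adapted weights in the directory layout from_pretrained() accepts:
# pytorch_model.bin plus bert_config.json. 'domain_model' is a placeholder name.
os.makedirs('domain_model', exist_ok=True)
torch.save(lm_model.state_dict(), os.path.join('domain_model', 'pytorch_model.bin'))
with open(os.path.join('domain_model', 'bert_config.json'), 'w') as f:
    f.write(lm_model.config.to_json_string())

# Stage 2: load the domain-adapted weights into a classifier and fine-tune on
# the (possibly small) labelled target task. num_labels=2 is just an example.
clf_model = BertForSequenceClassification.from_pretrained('domain_model', num_labels=2)
# ... supervised training loop over the labelled examples goes here ...
```

The classifier head has no counterpart in the saved masked-LM weights, so `from_pretrained()` initializes it fresh and logs the unmatched parameters rather than raising, which is the behaviour stage 2 relies on.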
