2020 RevisitingFewSampleBERTFineTuni
- (Zhang, Wu et al., 2020) ⇒ Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q Weinberger, and Yoav Artzi. (2020). “Revisiting Few-sample BERT Fine-tuning.” In: arXiv preprint arXiv:2006.05987. doi:10.48550/arXiv.2006.05987
Subject Headings: Fine-Tuned BERT Text Classification Algorithm.
Notes
- Fine-Tuning Instability in Few-Sample Scenarios: The paper addresses the instability of fine-tuning BERT in few-sample scenarios, identifying three main causes: biased gradient estimation due to the non-standard BERTAdam optimizer, the limited utility of some BERT layers for downstream tasks, and the use of a fixed, small number of training iterations.
- Optimization Algorithm - Debiasing Omission in BERTAdam: The study shows that omitting Adam's bias correction in the BERTAdam optimizer is a source of instability, and that reintroducing the correction significantly stabilizes fine-tuning in few-sample scenarios.
- Initialization - Re-initializing BERT Pre-trained Layers: Proposes that re-initializing the top layers of BERT, which may be less applicable to the target task, leads to better performance and faster convergence during fine-tuning.
- Training Iterations - Fine-Tuning BERT for Longer: Advocates extending fine-tuning beyond the standard recommendation of three epochs to improve both the stability and the performance of fine-tuned models.
- Evaluation of Existing Few-Sample Fine-Tuning Methods: Re-evaluates existing methods for stabilizing few-sample BERT fine-tuning and finds that their effectiveness largely diminishes once the paper's proposed fine-tuning optimizations are applied.
- Recommendations for Future Practices: Recommends debiasing the optimization method, re-evaluating whether all pre-trained layers need to be transferred, increasing the number of training iterations, and running a moderate number of random trials for improved fine-tuning outcomes.
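The bias-correction issue in the second note can be illustrated with a minimal sketch of a single Adam update on a scalar parameter. This is a simplified illustration, not the paper's or any library's implementation; the function name and scalar setup are assumptions for exposition, while the moment updates and debiasing terms follow the standard Adam formulation.

```python
def adam_step(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, bias_correction=True):
    """One Adam update on a scalar parameter; returns (update, m, v)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    if bias_correction:
        # Debias the zero-initialized moment estimates. This matters most
        # at small t, i.e. exactly the early steps that dominate a short
        # few-sample fine-tuning run.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
    else:
        # BERTAdam-style omission: the biased moments distort the
        # effective step size early in training.
        m_hat, v_hat = m, v
    update = lr * m_hat / (v_hat ** 0.5 + eps)
    return update, m, v

# At step t=1 with grad=1.0, the corrected update is roughly lr, while
# the uncorrected variant takes a distorted (here inflated) step.
u_corr, _, _ = adam_step(1.0, 0.0, 0.0, t=1)
u_raw, _, _ = adam_step(1.0, 0.0, 0.0, t=1, bias_correction=False)
```

With the defaults above, the uncorrected first step is scaled by sqrt(1 - beta2^t) / (1 - beta1^t) relative to the corrected one, so the two variants diverge most sharply during the first few iterations.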
Cited By
Quotes
Abstract
This paper is a study of fine-tuning of BERT contextual representations, with focus on commonly observed instabilities in few-sample scenarios. We identify several factors that cause this instability: the common use of a non-standard optimization method with biased gradient estimation; the limited applicability of significant parts of the BERT network for down-stream tasks; and the prevalent practice of using a pre-determined, and small number of training iterations. We empirically test the impact of these factors, and identify alternative practices that resolve the commonly observed instability of the process. In light of these observations, we re-visit recently proposed methods to improve few-sample fine-tuning with BERT and re-evaluate their effectiveness. Generally, we observe the impact of these methods diminishes significantly with our modified process.
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020 RevisitingFewSampleBERTFineTuni | Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q Weinberger, Yoav Artzi | | 2020 | Revisiting Few-sample BERT Fine-tuning | | | | 10.48550/arXiv.2006.05987 | | 2020 |