2020 RevisitingFewSampleBERTFineTuni
- (Zhang, Wu et al., 2020) ⇒ Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q Weinberger, and Yoav Artzi. (2020). “Revisiting Few-sample BERT Fine-tuning.” In: arXiv preprint arXiv:2006.05987. doi:10.48550/arXiv.2006.05987
Subject Headings: Fine-Tuned BERT Text Classification Algorithm.
Notes
- Fine-Tuning Instability in Few-Sample Scenarios: The paper addresses the instability of fine-tuning BERT in few-sample scenarios, identifying three main causes: biased gradient estimation due to the non-standard BERTAdam optimizer, the limited utility of some BERT layers for downstream tasks, and the use of a fixed, small number of training iterations.
- Optimization Algorithm - Debiasing Omission in BERTAdam: The study shows that omitting Adam's bias correction in the BERTAdam optimizer is a source of instability, and that reintroducing the correction significantly stabilizes fine-tuning in few-sample scenarios.
- Initialization - Re-initializing BERT Pre-trained Layers: Proposes that re-initializing the top layers of BERT, which may be less applicable to the target task, leads to better performance and faster convergence during fine-tuning.
- Training Iterations - Fine-Tuning BERT for Longer: Advocates extending fine-tuning beyond the standard recommendation of three epochs to improve both the stability and the performance of fine-tuned models.
- Evaluation of Existing Few-Sample Fine-Tuning Methods: Re-evaluates existing methods for stabilizing few-sample BERT fine-tuning and finds that their effectiveness largely diminishes once the paper's proposed fine-tuning optimizations are applied.
- Recommendations for Future Practices: Recommends debiasing the optimization method, re-evaluating whether all pre-trained layers need to be transferred, increasing the number of training iterations, and running a moderate number of random trials for improved fine-tuning outcomes.
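The bias-correction issue in the second note can be illustrated with a minimal sketch of a single Adam update on a scalar parameter. This is a simplified illustration, not the paper's or any library's implementation; the function name and scalar setup are assumptions for exposition, while the moment updates and debiasing terms follow the standard Adam formulation.

```python
def adam_step(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, bias_correction=True):
    """One Adam update on a scalar parameter; returns (update, m, v)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    if bias_correction:
        # Debias the zero-initialized moment estimates. This matters most
        # at small t, i.e. exactly the early steps that dominate a short
        # few-sample fine-tuning run.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
    else:
        # BERTAdam-style omission: the biased moments distort the
        # effective step size early in training.
        m_hat, v_hat = m, v
    update = lr * m_hat / (v_hat ** 0.5 + eps)
    return update, m, v

# At step t=1 with grad=1.0, the corrected update is roughly lr, while
# the uncorrected variant takes a distorted (here inflated) step.
u_corr, _, _ = adam_step(1.0, 0.0, 0.0, t=1)
u_raw, _, _ = adam_step(1.0, 0.0, 0.0, t=1, bias_correction=False)
```

With the defaults above, the uncorrected first step is scaled by sqrt(1 - beta2^t) / (1 - beta1^t) relative to the corrected one, so the two variants diverge most sharply during the first few iterations.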
Cited By
Quotes
Abstract
This paper is a study of fine-tuning of BERT contextual representations, with focus on commonly observed instabilities in few-sample scenarios. We identify several factors that cause this instability: the common use of a non-standard optimization method with biased gradient estimation; the limited applicability of significant parts of the BERT network for down-stream tasks; and the prevalent practice of using a pre-determined, and small number of training iterations. We empirically test the impact of these factors, and identify alternative practices that resolve the commonly observed instability of the process. In light of these observations, we re-visit recently proposed methods to improve few-sample fine-tuning with BERT and re-evaluate their effectiveness. Generally, we observe the impact of these methods diminishes significantly with our modified process.
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020 RevisitingFewSampleBERTFineTuni | Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q Weinberger, Yoav Artzi | | 2020 | Revisiting Few-sample BERT Fine-tuning | | | | 10.48550/arXiv.2006.05987 | | 2020 |