RoBERTa System
A RoBERTa System is a BERT system produced by Facebook AI that follows the RoBERTa (Robustly Optimized BERT Pretraining Approach) pretraining procedure.
- Example(s):
- Counter-Example(s):
- See: Transformer Network, NLP, LLM.
References
2019b
- (Ng, 2019) ⇒ Andrew Ng. (2019). “BERT Is Back."
- QUOTE: ... RoBERTa uses the BERT LARGE configuration (355 million parameters) with an altered pretraining pipeline. ... the following changes:
- Increased training data size from 16GB to 160GB by including three additional datasets.
- Boosted batch size from 256 sequences to 8,000 sequences per batch.
- Raised the number of pretraining steps from 31,000 to 500,000.
- Removed the next sentence prediction (NSP) loss term from the training objective and used full-sentence sequences as input instead of segment pairs.
- Fine-tuned for two of the nine tasks in the GLUE natural language understanding benchmark as well as for SQuAD (question answering) and RACE (reading comprehension).
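The architecture figures quoted above can be checked against the publicly released checkpoints. The following minimal sketch is an editorial illustration, not code from the cited article; it assumes the Hugging Face transformers library and the "roberta-large" model name, and loads the checkpoint to confirm the BERT LARGE-sized configuration and the roughly 355 million parameters mentioned above:

```python
# Hedged illustration: inspect the publicly released roberta-large checkpoint.
# The Hugging Face "transformers" library and the "roberta-large" model name
# are assumptions of this sketch, not artifacts of the cited article.
from transformers import RobertaConfig, RobertaModel

config = RobertaConfig.from_pretrained("roberta-large")
# BERT LARGE-sized architecture: 24 layers, 1024 hidden units, 16 attention heads.
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)

model = RobertaModel.from_pretrained("roberta-large")
num_params = sum(p.numel() for p in model.parameters())
print(f"~{num_params / 1e6:.0f}M parameters")  # roughly the 355 million quoted above
```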
2019a
- (Liu et al., 2019) ⇒ Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. (2019). “RoBERTa: A Robustly Optimized BERT Pretraining Approach.” In: CoRR, abs/1907.11692.
- QUOTE: Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. ... We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. ...
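As an illustration of the fine-tuning step that both references describe, the sketch below fine-tunes a RoBERTa checkpoint on one GLUE sentence-pair task (MRPC). The Hugging Face transformers and datasets libraries, the "roberta-base" checkpoint, and the hyperparameter values are assumptions of this sketch, not the cited paper's released training setup:

```python
# Hedged sketch of fine-tuning RoBERTa on a GLUE-style sentence-pair task (MRPC).
# Libraries, checkpoint name, and hyperparameters are illustrative assumptions.
from transformers import (
    RobertaTokenizerFast,
    RobertaForSequenceClassification,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

dataset = load_dataset("glue", "mrpc")
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

def tokenize(batch):
    # Encode the two sentences of each pair as a single full-sentence sequence.
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

args = TrainingArguments(
    output_dir="roberta-mrpc",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"])
trainer.train()
```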