RoBERTa System
A RoBERTa System is a BERT system produced by Facebook AI.
- Example(s):
- Counter-Example(s):
- See: Transformer Network, NLP, LLM.
References
2019b
- Andrew Ng. (2019). “Cropped Roberta - BERT Is Back."
- QUOTE: ... RoBERTa uses the BERT LARGE configuration (355 million parameters) with an altered pretraining pipeline. ... the following changes:
- Increased training data size from 16GB to 160GB by including three additional datasets.
- Boosted batch size from 256 sequences to 8,000 sequences per batch.
- Raised the number of pretraining steps from 31,000 to 500,000.
- Removed the next sentence prediction (NSP) loss term from the training objective and used full-sentence sequences as input instead of segment pairs.
- Fine-tuned for two of the nine tasks in the GLUE natural language understanding benchmark as well as for SQuAD (question answering) and RACE (reading comprehension).
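The change list above can be made concrete with a short sketch of a masked-language-model-only pretraining step (no next sentence prediction term, full-sentence inputs, dynamic masking). This is an illustrative sketch using the Hugging Face transformers library; the `roberta-base` checkpoint, the toy sentences, and the batch construction are assumptions for illustration, not part of the quoted source.

```python
# Illustrative sketch (not from the quoted article): an MLM-only training step
# in the RoBERTa style -- no next-sentence-prediction loss, full sentences as
# input, and masking applied dynamically when each batch is built.
import torch
from transformers import (
    RobertaTokenizerFast,
    RobertaForMaskedLM,
    DataCollatorForLanguageModeling,
)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

# Full-sentence sequences (toy examples); real pretraining packs text up to 512 tokens.
sentences = [
    "RoBERTa removes the next sentence prediction objective.",
    "It is pretrained on roughly 160GB of text with large batches.",
]
encodings = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

# Dynamic masking: 15% of tokens are re-masked for every batch, rather than
# being masked once during preprocessing as in the original BERT setup.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
batch = collator([{"input_ids": ids} for ids in encodings["input_ids"]])

# The training loss is the masked-LM cross-entropy only (no NSP term).
outputs = model(input_ids=batch["input_ids"], labels=batch["labels"])
print(float(outputs.loss))
```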
2019a
- (Liu et al., 2019) ⇒ Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. (2019). “RoBERTa: A Robustly Optimized BERT Pretraining Approach.” In: CoRR, abs/1907.11692.
- QUOTE: Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. ... We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. ...
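As a usage-level illustration of the fine-tuning setting the abstract refers to (GLUE-style sentence classification), the following is a minimal sketch with the Hugging Face transformers library; the `roberta-base` checkpoint, the two-label setup, and the example sentence and label are assumptions for illustration, not details from the cited paper.

```python
# Illustrative sketch (not from the cited paper): a fine-tuning-style forward
# pass for a GLUE-like single-sentence classification task (e.g. 2 labels).
import torch
from transformers import RobertaTokenizerFast, RobertaForSequenceClassification

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

inputs = tokenizer("A surprisingly effective pretraining recipe.", return_tensors="pt")
labels = torch.tensor([1])  # hypothetical positive label for the toy example

outputs = model(**inputs, labels=labels)
print(float(outputs.loss), outputs.logits.argmax(-1).item())
```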