Masked Language Model (MLM) Training Algorithm
A Masked Language Model (MLM) Training Algorithm is an LM training algorithm that randomly masks a subset of tokens in an input sequence and trains the model to predict the masked tokens from their surrounding (bidirectional) context.
- Example(s):
- BERT Training Algorithm (Devlin et al., 2019).
- ...
- See: Autoregressive Language Modeling.
References
2023
- (chat, 2023)
- Masked Language Modeling (MLM) refers to a training technique or objective used in pretraining language models, rather than a specific model type. It's a way to train a language model by randomly masking some words in the input sequence and having the model predict the masked words based on the context provided by the unmasked words.
This training technique allows the model to learn bidirectional representations of the input text, as it can use both the left and right context to predict the masked word. BERT (Bidirectional Encoder Representations from Transformers) is a popular example of a language model that uses the MLM training objective.
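The masking procedure described above is often implemented with BERT's 80/10/10 rule: roughly 15% of positions are selected, and of those, 80% are replaced by a [MASK] token, 10% by a random token, and 10% left unchanged, with the loss computed only at the selected positions. The following is a minimal Python sketch of that corruption step, not code from any of the cited papers; the MASK_ID value, the IGNORE_INDEX convention, and the function name are illustrative assumptions.

```python
import random

MASK_ID = 103        # assumed [MASK] token id (BERT-base-uncased uses 103)
IGNORE_INDEX = -100  # label value conventionally ignored by the cross-entropy loss

def mask_tokens(token_ids, vocab_size, mask_prob=0.15, seed=None):
    """BERT-style MLM masking: select ~mask_prob of the positions; of those,
    80% become [MASK], 10% become a random token, 10% stay unchanged.
    Returns (masked_input, labels), where labels hold the original token id
    only at the selected positions and IGNORE_INDEX everywhere else."""
    rng = random.Random(seed)
    masked_input = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue                     # position not selected; no loss here
        labels[i] = tok                  # the model must predict the original token
        r = rng.random()
        if r < 0.8:
            masked_input[i] = MASK_ID                    # 80%: replace with [MASK]
        elif r < 0.9:
            masked_input[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # else: 10% keep the original token unchanged
    return masked_input, labels

# Toy usage: the ids 5..12 stand in for a tokenized sentence.
if __name__ == "__main__":
    inp, lbl = mask_tokens([5, 6, 7, 8, 9, 10, 11, 12], vocab_size=30522, seed=0)
    print(inp)
    print(lbl)
```

During pre-training, the corrupted sequence is fed to the encoder and a cross-entropy loss is taken over the vocabulary only at positions where the label is not IGNORE_INDEX, which is what lets the model use both left and right context when predicting each masked token.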
2020
- (Bao et al., 2020) ⇒ Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, et al. (2020). “UniLMv2: Pseudo-masked Language Models for Unified Language Model Pre-training.” In: International Conference on Machine Learning, pp. 642-652. PMLR.
2020
- (Schick & Schütze, 2020) ⇒ Timo Schick, and Hinrich Schütze. (2020). “It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners.” In: arXiv preprint arXiv:2009.07118.
- QUOTE: ... Our modified version of PET uses masked language models (Devlin et al., 2019) to assign probabilities to sequences of text; this is similar to using them in a generative fashion (Wang and Cho, 2019) and has previously been investigated by Salazar et al. (2020) and Ghazvininejad et al. (2019). ...
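The quoted passage refers to using a masked language model to assign a probability-like score to a whole sequence, in the spirit of Salazar et al. (2020): each position is masked in turn and the log-probabilities of the original tokens are summed (a pseudo-log-likelihood). Below is a hedged Python sketch of that scoring loop; it assumes the Hugging Face transformers and PyTorch libraries and a BERT-style checkpoint, none of which are mandated by the quoted paper, and the function name is illustrative.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer  # assumed tooling, not from the paper

def pseudo_log_likelihood(sentence, model, tokenizer):
    """Score a sentence with an MLM by masking one position at a time and
    summing the log-probability of the original token at that position."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    total = 0.0
    # Skip the special tokens at the start and end ([CLS]/[SEP] for BERT-like models).
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        original = masked[i].item()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[original].item()
    return total

if __name__ == "__main__":
    name = "bert-base-uncased"  # any masked-LM checkpoint could be substituted
    tok = AutoTokenizer.from_pretrained(name)
    mdl = AutoModelForMaskedLM.from_pretrained(name).eval()
    print(pseudo_log_likelihood("The cat sat on the mat.", mdl, tok))
```

Because every token requires its own forward pass, this scoring is considerably more expensive than a single left-to-right pass with an autoregressive language model, which is part of why it is typically used for reranking or few-shot scoring rather than generation.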
2020
- (Conneau et al., 2020) ⇒ Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. (2020). “Unsupervised Cross-lingual Representation Learning at Scale.” In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, (ACL-2020).
- QUOTE: ... The goal of this paper is to improve cross-lingual language understanding (XLU), by carefully studying the effects of training unsupervised crosslingual representations at a very large scale. We present XLM-R a transformer-based multilingual masked language model pre-trained on text in 100 languages, which obtains state-of-the-art performance on cross-lingual classification, sequence labeling and question answering.
2019
- (Devlin et al., 2019) ⇒ Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. (2019). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019). DOI:10.18653/v1/N19-1423. arXiv:1810.04805