Masked Language Model (MLM) Training Algorithm
A Masked Language Model (MLM) Training Algorithm is an LM training algorithm that randomly masks a subset of tokens in an input sequence and trains the model to predict the masked tokens from their surrounding (bidirectional) context.
- Example(s):
- BERT Training Algorithm (Devlin et al., 2019).
- ...
- See: Autoregressive Language Modeling.
References
2023
- (chat, 2023)
- Masked Language Modeling (MLM) refers to a training technique or objective used in pretraining language models, rather than a specific model type. It's a way to train a language model by randomly masking some words in the input sequence and having the model predict the masked words based on the context provided by the unmasked words.
This training technique allows the model to learn bidirectional representations of the input text, as it can use both the left and right context to predict the masked word. BERT (Bidirectional Encoder Representations from Transformers) is a popular example of a language model that uses the MLM training objective.
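The masking procedure described above is often implemented with BERT's 80/10/10 rule: roughly 15% of positions are selected, and of those, 80% are replaced by a [MASK] token, 10% by a random token, and 10% left unchanged, with the loss computed only at the selected positions. The following is a minimal Python sketch of that corruption step, not code from any of the cited papers; the MASK_ID value, the IGNORE_INDEX convention, and the function name are illustrative assumptions.

```python
import random

MASK_ID = 103        # assumed [MASK] token id (BERT-base-uncased uses 103)
IGNORE_INDEX = -100  # label value conventionally ignored by the cross-entropy loss

def mask_tokens(token_ids, vocab_size, mask_prob=0.15, seed=None):
    """BERT-style MLM masking: select ~mask_prob of the positions; of those,
    80% become [MASK], 10% become a random token, 10% stay unchanged.
    Returns (masked_input, labels), where labels hold the original token id
    only at the selected positions and IGNORE_INDEX everywhere else."""
    rng = random.Random(seed)
    masked_input = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue                     # position not selected; no loss here
        labels[i] = tok                  # the model must predict the original token
        r = rng.random()
        if r < 0.8:
            masked_input[i] = MASK_ID                    # 80%: replace with [MASK]
        elif r < 0.9:
            masked_input[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # else: 10% keep the original token unchanged
    return masked_input, labels

# Toy usage: the ids 5..12 stand in for a tokenized sentence.
if __name__ == "__main__":
    inp, lbl = mask_tokens([5, 6, 7, 8, 9, 10, 11, 12], vocab_size=30522, seed=0)
    print(inp)
    print(lbl)
```

During pre-training, the corrupted sequence is fed to the encoder and a cross-entropy loss is taken over the vocabulary only at positions where the label is not IGNORE_INDEX, which is what lets the model use both left and right context when predicting each masked token.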
2020
- (Bao et al., 2020) ⇒ Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, et al. (2020). “UniLMv2: Pseudo-masked Language Models for Unified Language Model Pre-training.” In: International Conference on Machine Learning, pp. 642-652. PMLR.
2020
- (Schick & Schütze, 2020) ⇒ Timo Schick, and Hinrich Schütze. (2020). “It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners.” In: arXiv preprint arXiv:2009.07118.
- QUOTE: ... Our modified version of PET uses masked language models (Devlin et al., 2019) to assign probabilities to sequences of text; this is similar to using them in a generative fashion (Wang and Cho, 2019) and has previously been investigated by Salazar et al. (2020) and Ghazvininejad et al. (2019). ...
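The quoted passage refers to using a masked language model to assign a probability-like score to a whole sequence, in the spirit of Salazar et al. (2020): each position is masked in turn and the log-probabilities of the original tokens are summed (a pseudo-log-likelihood). Below is a hedged Python sketch of that scoring loop; it assumes the Hugging Face transformers and PyTorch libraries and a BERT-style checkpoint, none of which are mandated by the quoted paper, and the function name is illustrative.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer  # assumed tooling, not from the paper

def pseudo_log_likelihood(sentence, model, tokenizer):
    """Score a sentence with an MLM by masking one position at a time and
    summing the log-probability of the original token at that position."""
    enc = tokenizer(sentence, return_tensors="pt")
    input_ids = enc["input_ids"][0]
    total = 0.0
    # Skip the special tokens at the start and end ([CLS]/[SEP] for BERT-like models).
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        original = masked[i].item()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[original].item()
    return total

if __name__ == "__main__":
    name = "bert-base-uncased"  # any masked-LM checkpoint could be substituted
    tok = AutoTokenizer.from_pretrained(name)
    mdl = AutoModelForMaskedLM.from_pretrained(name).eval()
    print(pseudo_log_likelihood("The cat sat on the mat.", mdl, tok))
```

Because every token requires its own forward pass, this scoring is considerably more expensive than a single left-to-right pass with an autoregressive language model, which is part of why it is typically used for reranking or few-shot scoring rather than generation.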
2020
- (Conneau et al., 2020) ⇒ Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. (2020). “Unsupervised Cross-lingual Representation Learning at Scale.” In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, (ACL-2020).
- QUOTE: ... The goal of this paper is to improve cross-lingual language understanding (XLU), by carefully studying the effects of training unsupervised crosslingual representations at a very large scale. We present XLM-R a transformer-based multilingual masked language model pre-trained on text in 100 languages, which obtains state-of-the-art performance on cross-lingual classification, sequence labeling and question answering.
2019
- (Devlin et al., 2019) ⇒ Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. (2019). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019). DOI:10.18653/v1/N19-1423. arXiv:1810.04805