Bidirectional Language Model (BiLM)
A Bidirectional Language Model (BiLM) is a language model that is a bidirectional sequence model, i.e. one that models each token using both its left (preceding) and right (following) context, typically by combining a forward and a backward language model (see the sketch below the Context list).
- Context:
- It can range from being a Word/Token-level BiLM to being a Character-Level BiLM.
- See: Unidirectional LSTM.
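- Example(s): a minimal token-level sketch (illustrative only; the module and parameter names are hypothetical and not taken from the cited papers) that pairs a forward and a backward LSTM language model and trains each direction to predict the next token in its own reading order:
```python
# Illustrative sketch of a token-level BiLM: a forward LM reads left-to-right,
# a backward LM reads right-to-left, and each predicts the next token in its
# own direction. All names here are hypothetical, not from the cited papers.
import torch
import torch.nn as nn

class TokenBiLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.fwd_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # left context
        self.bwd_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # right context
        self.fwd_out = nn.Linear(hidden_dim, vocab_size)
        self.bwd_out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer token ids
        x = self.embed(tokens)
        h_fwd, _ = self.fwd_lstm(x)
        h_bwd, _ = self.bwd_lstm(torch.flip(x, dims=[1]))  # run over the reversed sequence
        h_bwd = torch.flip(h_bwd, dims=[1])                 # re-align to original positions
        # The two directional log-likelihoods are typically summed during training.
        return self.fwd_out(h_fwd), self.bwd_out(h_bwd)

# Usage with random token ids.
model = TokenBiLM(vocab_size=1000)
logits_fwd, logits_bwd = model(torch.randint(0, 1000, (2, 10)))
print(logits_fwd.shape, logits_bwd.shape)  # each: (2, 10, 1000)
```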
References
2018
- (Devlin et al., 2018) ⇒ Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In: arXiv preprint arXiv:1810.04805.
- QUOTE: ... We argue that current techniques severely restrict the power of the pre-trained representations, especially for the fine-tuning approaches. The major limitation is that standard language models are unidirectional, and this limits the choice of architectures that can be used during pre-training. For example, in OpenAI GPT, the authors use a left-to-right architecture, where every token can only attend to previous tokens in the self-attention layers of the Transformer (Vaswani et al., 2017). Such restrictions are sub-optimal for sentence-level tasks, and could be devastating when applying fine-tuning based approaches to token-level tasks such as SQuAD question answering (Rajpurkar et al., 2016), where it is crucial to incorporate context from both directions.
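- The masking difference described in the quote can be illustrated with a minimal sketch (variable names are hypothetical): a left-to-right model uses a causal self-attention mask so position i attends only to positions ≤ i, whereas a bidirectional model allows every position to attend to every other.
```python
# Illustrative sketch of causal (left-to-right) vs. bidirectional attention masks.
import torch

seq_len = 5
# Causal mask: lower-triangular, so token i may attend only to tokens <= i.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
# Bidirectional mask: every token may attend to every other token.
bidirectional_mask = torch.ones(seq_len, seq_len, dtype=torch.bool)

print(causal_mask.int())
print(bidirectional_mask.int())
```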
2017
- (Peters et al., 2017) ⇒ Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. (2017). “Semi-supervised Sequence Tagging with Bidirectional Language Models.” In: arXiv preprint arXiv:1705.00108.
2013
- (Graves et al., 2013) ⇒ Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. (2013). “Hybrid Speech Recognition with Deep Bidirectional LSTM.” In: Proceedings of 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU-2013).
- QUOTE: ... Deep Bidirectional LSTM (DBLSTM) recurrent neural networks have recently been shown to give state-of-the-art performance on the TIMIT speech database. …