Bidirectional Language Model (BiLM)
A Bidirectional Language Model (BiLM) is a language model that is a bidirectional sequence model, i.e., one that models each token using both its left (preceding) and right (following) context, typically by combining a forward and a backward language model.
- Context:
- It can range from being a Word/Token-level BiLM to being a Character-Level BiLM (a minimal token-level sketch is given after this list).
- See: Unidirectional LSTM.
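The following is a minimal, illustrative sketch (not a reference implementation) of a token-level BiLM in the style of Peters et al. (2017): a forward LSTM language model and a backward LSTM language model share an embedding table and are trained jointly on next-token and previous-token prediction. The class name TinyBiLM, the helper bilm_loss, and all dimensions are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class TinyBiLM(nn.Module):
    """Illustrative token-level BiLM: a forward LM and a backward LM
    over the same vocabulary (names and sizes are illustrative)."""

    def __init__(self, vocab_size: int, emb_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Two unidirectional LSTMs: one reads left-to-right, the other right-to-left.
        self.fwd_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.bwd_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fwd_out = nn.Linear(hidden_dim, vocab_size)
        self.bwd_out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):  # tokens: (batch, seq_len) token ids
        emb = self.embed(tokens)
        fwd_hidden, _ = self.fwd_lstm(emb)                   # left-to-right states
        bwd_hidden, _ = self.bwd_lstm(torch.flip(emb, [1]))  # right-to-left states
        bwd_hidden = torch.flip(bwd_hidden, [1])             # re-align to original order
        # Forward states at position t see tokens <= t; backward states see tokens >= t.
        return self.fwd_out(fwd_hidden), self.bwd_out(bwd_hidden)

def bilm_loss(model, tokens):
    """Sum of forward (next-token) and backward (previous-token) cross-entropy."""
    fwd_logits, bwd_logits = model(tokens)
    fwd_loss = nn.functional.cross_entropy(
        fwd_logits[:, :-1].reshape(-1, fwd_logits.size(-1)),
        tokens[:, 1:].reshape(-1))
    bwd_loss = nn.functional.cross_entropy(
        bwd_logits[:, 1:].reshape(-1, bwd_logits.size(-1)),
        tokens[:, :-1].reshape(-1))
    return fwd_loss + bwd_loss
```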
References
2018
- (Devlin et al., 2018) ⇒ Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. (2018). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In: arXiv preprint arXiv:1810.04805.
- QUOTE: ... We argue that current techniques severely restrict the power of the pre-trained representations, especially for the fine-tuning approaches. The major limitation is that standard language models are unidirectional, and this limits the choice of architectures that can be used during pre-training. For example, in OpenAI GPT, the authors use a left-to-right architecture, where every token can only attend to previous tokens in the self-attention layers of the Transformer (Vaswani et al., 2017). Such restrictions are sub-optimal for sentence-level tasks, and could be devastating when applying fine-tuning based approaches to token-level tasks such as SQuAD question answering (Rajpurkar et al., 2016), where it is crucial to incorporate context from both directions.
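The contrast drawn in this quote is between causal (left-to-right) self-attention, where each token may attend only to earlier positions, and bidirectional self-attention, where each token may attend to the full sequence. The sketch below is an assumed, simplified helper (not code from the BERT paper) showing how a causal mask changes the attention pattern.

```python
import torch

def attention_scores(q, k, causal: bool):
    """Scaled dot-product attention weights over a sequence.
    causal=True: each position attends only to itself and earlier positions
    (left-to-right, GPT-style). causal=False: every position attends to the
    whole sequence (bidirectional, BERT-style)."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)   # (..., seq, seq)
    if causal:
        seq_len = scores.size(-1)
        # True above the diagonal marks "future" positions to be masked out.
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return scores.softmax(dim=-1)
```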
2017
- (Peters et al., 2017) ⇒ Matthew E. Peters, Waleed Ammar, Chandra Bhagavatula, and Russell Power. (2017). “Semi-supervised Sequence Tagging with Bidirectional Language Models.” In: arXiv preprint arXiv:1705.00108.
2013
- (Graves et al., 2013) ⇒ Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. (2013). “Hybrid Speech Recognition with Deep Bidirectional LSTM.” In: Proceedings of 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU-2013).
- QUOTE: ... Deep Bidirectional LSTM (DBLSTM) recurrent neural networks have recently been shown to give state-of-the-art performance on the TIMIT speech database. …