Neural Network-based Language Model (NLM)
A Neural Network-based Language Model (NLM) is a language model that is a neural text-to-text sequence model.
- Context:
- It can be produced by a Neural Language Modeling System (that can solve a neural LM training task).
- It can range from (typically) being a Pretrained Neural Language Model (LM) to being an Untrained Neural Language Model (LM).
- It can range from being a Character-Level Neural Network-based LM to being a Word/Token-Level Neural Network-based LM.
- It can range from being a Forward Neural Network-based Language Model to being a Backward Neural Network-based Language Model to being a Bi-Directional Neural Network-based Language Model.
- It can range from (typically) being a Deep NNet-based LM (such as a large NLM) to being a Shallow NNet-based LM.
- It can range from being a Uni-Lingual NLM to being a Multi-Lingual NLM.
- …
- Example(s):
- a Bigram Neural Language Model (the previous word is used to predict the current word; see the sketch after this list).
- an RNN-based Language Model, such as: ELMo or a character-level LSTM Language Model.
- a Transformer-based Language Model, such as: GPT-2, a BERT-based model, or Turing-NLG.
- a Universal Language Model Fine-tuning for Text Classification (ULMFiT) model.
- …
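The bigram case above can be written in a few lines. The following is a minimal sketch (assuming PyTorch; the class and parameter names are hypothetical): the embedding of the previous word is mapped directly to a distribution over the current word.

```python
import torch
import torch.nn as nn


class BigramNLM(nn.Module):
    """Minimal bigram neural LM sketch: P(current word | previous word)."""

    def __init__(self, vocab_size: int, embed_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # previous-word embedding
        self.out = nn.Linear(embed_dim, vocab_size)       # scores for the current word

    def forward(self, prev_tokens: torch.Tensor) -> torch.Tensor:
        # prev_tokens: (batch,) token ids; returns (batch, vocab_size) logits
        return self.out(self.embed(prev_tokens))


# Usage: a distribution over the current word given the previous word (id 42 is arbitrary).
model = BigramNLM(vocab_size=1000)
logits = model(torch.tensor([42]))
probs = torch.softmax(logits, dim=-1)
```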
- Counter-Example(s):
- a Text-Substring Probability Function,
- an N-Gram Language Model,
- an Exponential Language Model,
- a Cache Language Model (Jelinek et al., 1991),
- a Bag-Of-Concepts Model (Cambria & Hussain, 2012),
- a Positional Language Model (Lv & Zhai, 2009),
- a Structured Language Model (Chelba and Jelinek, 2000),
- a Random Forest Language Model (Xu, 2005),
- a Bayesian Language Model (Teh, 2006),
- a Class-based Language Model (Brown et al., 1992),
- a Maximum Likelihood-based Language Model (Goldberg, 2015),
- a Query Likelihood Model,
- a Factored Language Model.
- See: Language Modeling Task, Language Modeling System, Natural Language Representation Dataset, Language Modeling Benchmark, Artificial Neural Network, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Inference Task.
References
2017
- (Daniluk et al., 2017) ⇒ Michał Daniluk, Tim Rocktäschel, Johannes Welbl, and Sebastian Riedel. (2017). “Frustratingly Short Attention Spans in Neural Language Modeling.” In: Proceedings of ICLR 2017.
- QUOTE: Neural language models predict the next token using a latent representation of the immediate token history. Recently, various methods for augmenting neural language models with an attention mechanism over a differentiable memory have been proposed. For predicting the next token, these models query information from a memory of the recent history which can facilitate learning mid- and long-range dependencies. However, conventional attention mechanisms used in memory-augmented neural language models produce a single output vector per time step.
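As an illustration of the conventional setup the quote describes (attention over a memory of the recent history that yields a single output vector per time step), a dot-product attention sketch might look as follows. This is not the paper's exact model; it assumes PyTorch, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F


def attend(query: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
    # query: (batch, d) current hidden state; memory: (batch, L, d) recent token history
    scores = torch.bmm(memory, query.unsqueeze(-1)).squeeze(-1)  # (batch, L) similarity scores
    weights = F.softmax(scores, dim=-1)                          # attention weights over the memory
    return torch.bmm(weights.unsqueeze(1), memory).squeeze(1)    # (batch, d): one output vector per step
```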
2015
- (Karpathy, 2015) ⇒ Andrej Karpathy. (2015). “The Unreasonable Effectiveness of Recurrent Neural Networks.” Blog post, 2015-05-21.
- QUOTE: ... By the way, together with this post I am also releasing code on Github that allows you to train character-level language models based on multi-layer LSTMs. You give it a large chunk of text and it will learn to generate text like it one character at a time. ...
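A character-level multi-layer LSTM language model of the kind referred to in the quote can be sketched as follows (assuming PyTorch; the class and hyperparameter names are illustrative, not Karpathy's released code).

```python
import torch
import torch.nn as nn


class CharLSTMLM(nn.Module):
    """Sketch of a character-level LM with a multi-layer LSTM."""

    def __init__(self, n_chars: int, embed_dim: int = 64,
                 hidden_dim: int = 256, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(n_chars, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=n_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_chars)  # logits over the next character

    def forward(self, char_ids: torch.Tensor, state=None):
        # char_ids: (batch, seq_len) character ids
        h, state = self.lstm(self.embed(char_ids), state)
        return self.out(h), state  # per-position next-character logits, plus LSTM state
```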
2003
- (Bengio et al., 2003a) ⇒ Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. (2003). “A Neural Probabilistic Language Model.” In: Journal of Machine Learning Research, 3:1137-1155.
- QUOTE: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. … We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that the proposed approach allows to take advantage of longer contexts.
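The probability function described in the quote can be approximated by a feedforward network over the concatenated embeddings of the previous n-1 words. A minimal sketch follows (assuming PyTorch; the hyperparameters are illustrative, and the paper's optional direct input-to-output connections are omitted).

```python
import torch
import torch.nn as nn


class FeedForwardNLM(nn.Module):
    """Sketch of a Bengio-style feedforward neural probabilistic LM."""

    def __init__(self, vocab_size: int, context_size: int = 4,
                 embed_dim: int = 60, hidden_dim: int = 100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)          # word feature vectors
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)              # logits over the next word

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, context_size) ids of the previous words
        e = self.embed(context).flatten(start_dim=1)              # concatenated context embeddings
        return self.out(torch.tanh(self.hidden(e)))               # softmax over these gives P(next word | context)
```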