2003 A Neural Probabilistic Language Model


Subject Headings: Distributional Word Representation, Neural Probabilistic Language Model.

Notes

Cited By


Quotes

Abstract

A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that the proposed approach allows to take advantage of longer contexts.
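
The architecture sketched in the abstract maps each context word to a learned feature vector, concatenates the context vectors, and passes them through a tanh hidden layer (plus a direct linear path) into a softmax over the vocabulary, so that the word representations and the probability function are learned jointly. Below is a minimal NumPy sketch of that forward pass; the dimensions (vocab_size, embed_dim, context_size, hidden_dim), the random initialization, and the example context are illustrative assumptions rather than the paper's reported settings, and training of the parameters is omitted.

# Minimal sketch of the neural probabilistic language model's forward pass.
# Sizes and the toy context below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, context_size, hidden_dim = 50, 8, 3, 16

# (1) Distributed representation: one embed_dim vector per word (matrix C).
C = rng.normal(scale=0.1, size=(vocab_size, embed_dim))

# (2) Probability function over the next word, in terms of the concatenated
# context embeddings x = [C[w_{t-n+1}]; ...; C[w_{t-1}]]:
#     y = b + W x + U tanh(d + H x),   P(w_t | context) = softmax(y)
H = rng.normal(scale=0.1, size=(hidden_dim, context_size * embed_dim))
d = np.zeros(hidden_dim)
U = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))
W = rng.normal(scale=0.1, size=(vocab_size, context_size * embed_dim))
b = np.zeros(vocab_size)

def next_word_probs(context_ids):
    """Return P(w_t | w_{t-n+1}, ..., w_{t-1}) for every word in the vocabulary."""
    x = C[context_ids].reshape(-1)          # concatenate the context word vectors
    y = b + W @ x + U @ np.tanh(d + H @ x)  # direct path plus tanh hidden layer
    e = np.exp(y - y.max())                 # numerically stable softmax
    return e / e.sum()

p = next_word_probs([4, 17, 23])            # arbitrary 3-word context
print(p.shape, p.sum())                     # (50,) 1.0

In the paper, both the word feature matrix and the network weights are fit jointly by stochastic gradient ascent on the training corpus log-likelihood; that training loop is not shown here.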

References

BibTeX

@article{2003_ANeuralProbabilisticLanguageMod,
  author    = {Yoshua Bengio and
               R{\'e}jean Ducharme and
               Pascal Vincent and
               Christian Janvin},
  title     = {A Neural Probabilistic Language Model},
  journal   = {Journal of Machine Learning Research},
  volume    = {3},
  pages     = {1137--1155},
  year      = {2003},
  url       = {http://jmlr.org/papers/v3/bengio03a.html},
}


Author: Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Janvin
Title: A Neural Probabilistic Language Model
Year: 2003