Text-String Probability Function Training Task
A Text-String Probability Function Training Task is a probability function generation task that requires the creation of a text-string probability function structure.
- AKA: Statistical Language Modeling, LM.
- Context:
- Performance: a Perplexity Measure, ...
- It can range from (typically) being a Data-Driven Language Modeling Task to being a Heuristic Language Modeling Task.
- It can range from being a Character-level Language Modeling Task to being a Word-level Language Modeling Task.
- It can be solved by a Language Modeling System (that implements a language modeling algorithm); a minimal sketch of such a system appears after this list.
- It can include a Language Model Evaluation Task.
- …
- Counter-Example(s):
- Document Modeling.
- Word Vector Space Modeling Task.
- See: n-Gram, Word Embedding Task.
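The following is a minimal sketch, not drawn from the cited sources, of the data-driven, word-level setting described above: it estimates an add-one-smoothed bigram text-string probability function from a hypothetical toy corpus and reports a perplexity measure on a held-out string. The function names and the corpus are illustrative assumptions.

```python
from collections import Counter
import math

def train_bigram_lm(sentences):
    """Estimate add-one-smoothed bigram probabilities P(w | prev) from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"</s>"}  # possible next-word outcomes; "<s>" only ever appears as a context
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(tokens)
        unigrams.update(padded[:-1])                   # context counts
        bigrams.update(zip(padded[:-1], padded[1:]))   # (context, next word) counts
    v = len(vocab)

    def prob(prev, word):
        # Add-one (Laplace) smoothing keeps unseen events from getting zero probability.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + v)

    return prob

def sentence_log_prob(prob, tokens):
    """Log-probability of w_1 ... w_n followed by the end-of-sentence event."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(math.log(prob(p, w)) for p, w in zip(padded[:-1], padded[1:]))

def perplexity(prob, tokens):
    """Per-token perplexity of a held-out token sequence."""
    n = len(tokens) + 1  # +1 for the end-of-sentence event
    return math.exp(-sentence_log_prob(prob, tokens) / n)

corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"], ["a", "dog", "sleeps"]]
lm = train_bigram_lm(corpus)
print(perplexity(lm, ["the", "dog", "sleeps"]))
```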
References
2013
- (Collins, 2013a) ⇒ Michael Collins. (2013). “Chapter 1 - Language Modeling.” Course notes for NLP by Michael Collins, Columbia University.
- QUOTE: Definition 1 (Language Model) A language model consists of a finite set [math]\displaystyle{ \mathcal{V} }[/math], and a function [math]\displaystyle{ p(x_1, x_2, ... x_n) }[/math] such that:
- For any [math]\displaystyle{ \langle x_1 ... x_n \rangle \in \mathcal{V}^{\dagger}, p(x_1,x_2,... x_n) \ge 0 }[/math]
- In addition, [math]\displaystyle{ \sum_{\langle x_1 ... x_n \rangle \in \mathcal{V}^{\dagger}} p(x_1,x_2,... x_n) = 1 }[/math]
- Hence [math]\displaystyle{ p(x_1,x_2,... x_n) }[/math] is a probability distribution over the sentences in [math]\displaystyle{ \mathcal{V}^{\dagger} }[/math].
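As a worked illustration of Definition 1 (an added example, not part of Collins's notes), take a unigram model: assume a distribution [math]\displaystyle{ q }[/math] over [math]\displaystyle{ \mathcal{V} \cup \{STOP\} }[/math] with q(STOP) strictly positive, and define [math]\displaystyle{ p(x_1,...,x_n) = \prod_{i=1}^{n} q(x_i) }[/math] for sentences whose last symbol is STOP. Condition 1 holds because every factor is non-negative, and summing over the sentences of each length [math]\displaystyle{ n }[/math] and then over all lengths gives [math]\displaystyle{ \sum_{n \ge 1} (1 - q(STOP))^{n-1}\, q(STOP) = 1 }[/math], so condition 2 holds as well.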
2003
- (Bengio et al., 2003a) ⇒ Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. (2003). “A Neural Probabilistic Language Model.” In: The Journal of Machine Learning Research, 3.
- QUOTE: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language.
2001
- (Goodman, 2001) ⇒ Joshua T. Goodman. (2001). “A Bit of Progress in Language Modeling.” In: Computer Speech & Language, 15(4). doi:10.1006/csla.2001.0174
- QUOTE: The goal of a language model is to determine the probability of a word sequence [math]\displaystyle{ w_1...w_n }[/math], [math]\displaystyle{ P(w_1...w_n) }[/math]. This probability is typically broken down into its component probabilities: [math]\displaystyle{ P(w_1...w_i) = P(w_1) \times P(w_2 \mid w_1) \times ... \times P(w_i \mid w_1...w_{i-1}) }[/math] Since it may be difficult to compute a probability of the form [math]\displaystyle{ P(w_i \mid w_1...w_{i-1}) }[/math] for large [math]\displaystyle{ i }[/math], we typically assume that the probability of a word depends on only the two previous words, the trigram assumption: [math]\displaystyle{ P(w_i \mid w_1...w_{i-1}) \approx P(w_i \mid w_{i-2}w_{i-1}) }[/math] which has been shown to work well in practice.
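The sketch below (hypothetical code, not from Goodman's paper) illustrates the trigram assumption in the quote above: conditional probabilities P(w_i | w_{i-2} w_{i-1}) are estimated by relative frequency from a toy corpus and multiplied along the chain-rule decomposition to score a word sequence.

```python
from collections import Counter

def train_trigram_lm(sentences):
    """Relative-frequency estimates of P(w_i | w_{i-2}, w_{i-1}) from tokenized sentences."""
    trigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            trigrams[(a, b, c)] += 1
            bigrams[(a, b)] += 1

    def prob(word, prev2, prev1):
        # Maximum-likelihood estimate; a real system would smooth (e.g. back-off or Kneser-Ney).
        history = bigrams[(prev2, prev1)]
        return trigrams[(prev2, prev1, word)] / history if history else 0.0

    return prob

def sequence_prob(prob, tokens):
    """P(w_1 ... w_n) under the trigram approximation of the chain-rule decomposition."""
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    p = 1.0
    for a, b, c in zip(padded, padded[1:], padded[2:]):
        p *= prob(c, a, b)
    return p

corpus = [["the", "dog", "barks"], ["the", "dog", "sleeps"]]
lm = train_trigram_lm(corpus)
print(sequence_prob(lm, ["the", "dog", "barks"]))  # 0.5 on this toy corpus
```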