Maximum Entropy Model NLP Algorithm
See: NLP Algorithm; Maximum Entropy Model Algorithm; Maximum Entropy Markov Model; Log-linear Model.
References
2017a
- (Sammut & Webb, 2017) ⇒ Claude Sammut, and Geoffrey I. Webb (editors). (2017). “Maxent Models”. In: “Encyclopedia of Machine Learning and Data Mining”. Springer.
2017b
- (Ratnaparkhi, 2017) ⇒ Adwait Ratnaparkhi. (2017). “Maximum Entropy Models for Natural Language Processing”. In: “Encyclopedia of Machine Learning and Data Mining” (Sammut & Webb, 2017). Springer.
- QUOTE: The term maximum entropy refers to an optimization framework in which the goal is to find the probability model that maximizes entropy over the set of models that are consistent with the observed evidence.
The information-theoretic notion of entropy is a way to quantify the uncertainty of a probability model; higher entropy corresponds to more uncertainty in the probability distribution. The rationale for choosing the maximum entropy model – from the set of models that meet the evidence – is that any other model assumes evidence that has not been observed (Jaynes 1957).
In most natural language processing problems, observed evidence takes the form of co-occurrence counts between some prediction of interest and some linguistic context of interest. These counts are derived from a large number of linguistically annotated examples, known as a corpus. For example, the frequency in a large corpus with which the word "that" co-occurs with the tag corresponding to determiner, or DET, is a piece of observed evidence. A probability model is consistent with the observed evidence if its calculated estimates of the co-occurrence counts agree with the observed counts in the corpus.
The goal of the maximum entropy framework is to find a model that is consistent with the co-occurrence counts, but is otherwise maximally uncertain. It provides a way to combine many pieces of evidence into a single probability model. An iterative parameter estimation procedure is usually necessary in order to find the maximum entropy probability model.
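The optimization described in the quoted passage can be stated explicitly. The following is a standard textbook formulation (not taken verbatim from Ratnaparkhi's entry), where p̃ denotes the empirical distribution observed in the corpus and the f_i are feature functions encoding the co-occurrence evidence; the constrained maximizer turns out to have log-linear form, which is why maxent models are also called log-linear models:
```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Maximum entropy objective: among the models consistent with the
% observed feature expectations, pick the one with maximum entropy.
\[
  p^{*} \;=\; \arg\max_{p \in \mathcal{P}} H(p),
  \qquad
  H(p) \;=\; -\sum_{x} p(x)\log p(x),
\]
\[
  \mathcal{P} \;=\; \bigl\{\, p \;:\;
    \mathbb{E}_{p}[f_i] = \mathbb{E}_{\tilde p}[f_i]
    \text{ for every feature } f_i \,\bigr\}.
\]
% The constrained maximizer has log-linear (exponential) form:
\[
  p^{*}(x) \;=\; \frac{1}{Z(\lambda)}
    \exp\Bigl(\sum_i \lambda_i f_i(x)\Bigr).
\]
\end{document}
```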
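To make the notion of "observed evidence" concrete, here is a minimal sketch of deriving word-tag co-occurrence counts from an annotated corpus. The toy corpus and tag names below are illustrative assumptions, not data from the cited sources:
```python
# A minimal sketch (not from the cited text) of computing word-tag
# co-occurrence counts from a hypothetical toy annotated corpus.
from collections import Counter

# Toy annotated corpus: each sentence is a list of (word, tag) pairs.
corpus = [
    [("that", "DET"), ("model", "NOUN"), ("works", "VERB")],
    [("I", "PRON"), ("know", "VERB"), ("that", "SCONJ")],
    [("that", "DET"), ("idea", "NOUN")],
]

# Observed evidence: how often each (word, tag) pair co-occurs.
counts = Counter(pair for sentence in corpus for pair in sentence)

print(counts[("that", "DET")])    # 2 -- "that" tagged as determiner
print(counts[("that", "SCONJ")])  # 1 -- "that" as subordinating conjunction
```
A maximum entropy model fit to this corpus would be constrained so that its expected count for each (word, tag) pair matches these observed counts.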
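The iterative parameter estimation mentioned in the quote is classically done with Generalized or Improved Iterative Scaling (GIS/IIS), or in modern practice with gradient-based optimizers. The sketch below is a simplified assumption-laden illustration, not Ratnaparkhi's procedure: it runs plain gradient ascent on the conditional log-likelihood of a log-linear model with (word, tag) indicator features, whose fixed point satisfies the maximum entropy count constraints. The toy data, tag set, learning rate, and iteration count are all assumed for illustration:
```python
# A minimal sketch of iterative parameter estimation for a conditional
# log-linear (maxent) model via gradient ascent. Real implementations
# use GIS/IIS or L-BFGS; data and hyperparameters here are toy choices.
import math
from collections import defaultdict

TAGS = ["DET", "SCONJ", "NOUN", "VERB", "PRON"]
data = [("that", "DET"), ("that", "DET"), ("that", "SCONJ"),
        ("model", "NOUN"), ("know", "VERB"), ("I", "PRON")]

weights = defaultdict(float)  # one weight per (word, tag) indicator feature

def probs(word):
    """p(tag | word) = exp(w[word, tag]) / Z(word)."""
    scores = {t: math.exp(weights[(word, t)]) for t in TAGS}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

for _ in range(200):  # gradient ascent iterations
    grad = defaultdict(float)
    for word, gold in data:
        p = probs(word)
        grad[(word, gold)] += 1.0    # observed feature count
        for t in TAGS:
            grad[(word, t)] -= p[t]  # model's expected feature count
    for f, g in grad.items():
        weights[f] += 0.5 * g        # learning-rate step

print(round(probs("that")["DET"], 2))  # ~0.67, matching the 2:1 counts
```
At convergence the model's expected feature counts equal the observed counts, which is exactly the consistency condition in the quoted definition.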
1957
- (Jaynes, 1957) ⇒ E. T. Jaynes. (1957). “Information Theory and Statistical Mechanics”. In: Physical Review, 106(4): 620–630.