Maximum Entropy Markov Model (MEMM)
A Maximum Entropy Markov Model (MEMM) is a discriminative sequence model that defines the conditional probability of a state sequence given an observation sequence, using a maximum entropy (exponential) classifier to model the probability of each state given the current observation and the previous state.
- AKA: Conditional Markov Model (CMM).
- Context:
- It can be instantiated as a Finite-State Sequence Tagging Model.
- It can be trained by a MEMM Training System (that implements a MEMM training algorithm).
- …
- Counter-Example(s):
- a Hidden Markov Model (a generative model).
- a Conditional Random Field (an undirected model that avoids the label-bias problem).
- See: Markov Random Field, Logistic Regression Algorithm, Label-Bias Problem.
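Following the formulation of McCallum et al. (2000) cited below, the per-state transition model is a maximum entropy classifier: for each previous state s', the next state s is distributed as an exponential model over feature functions of the observation and candidate state,

```latex
P_{s'}(s \mid o) \;=\; \frac{1}{Z(o, s')} \exp\!\Big( \sum_{a} \lambda_a \, f_a(o, s) \Big),
```

where the f_a are (typically binary) feature functions, the λ_a are learned weights, and Z(o, s') normalizes the distribution over all candidate states s.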
References
2005
- (Jie Tang, 2005) ⇒ Jie Tang. (2005). “An Introduction for Conditional Random Fields.” Literature Survey – 2, Dec. 2005, at Tsinghua University.
2003
- (Zelenko et al., 2003) ⇒ Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. (2003). “Kernel Methods for Relation Extraction.” In: Journal of Machine Learning Research, 3.
- QUOTE: MEMMs are able to model more complex transition and emission probability distributions and take into account various text features.
2001
- (Lafferty et al., 2001) ⇒ John D. Lafferty, Andrew McCallum, and Fernando Pereira. (2001). “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.” In: Proceedings of ICML 2001.
- QUOTE: … avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states.
2000
- (McCallum et al., 2000a) ⇒ Andrew McCallum, Dayne Freitag, and Fernando Pereira. (2000). “Maximum Entropy Markov Models for Information Extraction and Segmentation.” In: Proceedings of ICML-2000.
- QUOTE: This paper presents a new Markovian sequence model, closely related to HMMs, that allows observations to be represented as arbitrary overlapping features (such as word, capitalization, formatting, part-of-speech), and defines the conditional probability of state sequences given observation sequences. It does this by using the maximum entropy framework to fit a set of exponential models that represent the probability of a state given an observation and the previous state. We present positive experimental results on the segmentation of FAQ’s.
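The exponential model described in the abstract above can be sketched in a few lines: for each previous state, a softmax over linear feature scores gives the next-state distribution. This is a minimal illustration only; the state names, feature functions, and hand-set weights below are invented for the example (real weights would be fit by maximum entropy training, e.g. GIS or L-BFGS).

```python
import math

# Illustrative FAQ-segmentation states (hypothetical).
STATES = ["head", "question", "answer"]

def features(obs, s_prev, s):
    """Binary feature functions f_a over (observation, previous state, candidate state)."""
    return {
        ("ends_with_qmark", s): 1.0 if obs.endswith("?") and s == "question" else 0.0,
        ("prev", s_prev, s): 1.0,
        ("starts_capital", s): 1.0 if obs[:1].isupper() else 0.0,
    }

# Hand-set weights lambda_a for illustration only; a trained MEMM would learn these.
WEIGHTS = {
    ("ends_with_qmark", "question"): 2.0,
    ("prev", "question", "answer"): 1.0,
}

def next_state_distribution(obs, s_prev):
    """P(s | s_prev, obs): a maximum-entropy (softmax) model over feature scores."""
    scores = {
        s: sum(WEIGHTS.get(k, 0.0) * v for k, v in features(obs, s_prev, s).items())
        for s in STATES
    }
    z = sum(math.exp(v) for v in scores.values())  # per-state normalizer Z(obs, s_prev)
    return {s: math.exp(v) / z for s, v in scores.items()}

dist = next_state_distribution("What is a MEMM?", "head")
```

Because each previous state has its own locally normalized classifier, decoding a whole sequence reduces to chaining these per-step distributions (e.g. with Viterbi); this local normalization is also the source of the label-bias problem that Lafferty et al. (2001) discuss.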