co-EM Algorithm
A co-EM Algorithm is a semi-supervised, multi-view learning algorithm that combines co-training's use of a two-view feature split with EM's iterative probabilistic relabeling of unlabeled data.
- …
- Counter-Example(s):
- See: Naive Bayes.
References
2006
- (Ghani et al., 2006) ⇒ Rayid Ghani, Katharina Probst, Yan Liu, Marko Krema, and Andrew Fano. (2006). “Text Mining for Product Attribute Extraction.” In: ACM SIGKDD Explorations Newsletter, 8(1). doi:10.1145/1147234.1147241
- QUOTE: The availability of a small amount of labeled training data and a large amount of unlabeled data allows us to use the semi-supervised learning setting. We use the multi-view or co-training [2] setting, where each example can be described by multiple views (e.g., the word itself and the context in which it occurs). The specific algorithm we use is co-EM: a multi-view semi-supervised learning algorithm, proposed by Nigam & Ghani [13], that combines features from both co-training [2] and EM. co-EM is iterative, like EM, but uses the feature split present in the data, like co-training. The separation into feature sets we used is that of the word to be classified and the context in which it occurs. co-EM with Naive Bayes has been applied to classification, e.g., by [13], but so far as we are aware, not in the context of information extraction.
co-EM is a multi-view algorithm, and requires two views for each learning example. Each word or phrase is expressed in view1 by the stemmed word or phrase itself, and the parts of speech as assigned by the Brill tagger. The view2 for this data item is a context of window size 8, i.e. up to 4 words (plus parts of speech) before and up to 4 words (plus parts of speech) after the word or phrase in view1. co-EM proceeds by initializing the view1 classifier using the labeled data only. Then this classifier is used to probabilistically label all the unlabeled data. The context (view2) classifier is then trained using the original labeled data plus the unlabeled data with the labels provided by the view1 classifier. Similarly, the view2 classifier then relabels the data for use by the view1 classifier, and this process iterates for a number of iterations or until the classifiers converge.
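The sketch below illustrates the training loop described in the quote above: the view1 classifier is initialized on labeled data, probabilistically labels the unlabeled data for the view2 classifier, and the two classifiers then alternate in relabeling the data for each other. It is a minimal, illustrative sketch only, assuming Naive Bayes base classifiers (scikit-learn's MultinomialNB), integer class labels 0..n_classes-1, and a soft-labeling scheme that expands each unlabeled example into one weighted row per class; these choices are assumptions of the sketch, not the authors' implementation.

```python
# Minimal co-EM sketch (illustrative; not the authors' code).
import numpy as np
from sklearn.naive_bayes import MultinomialNB


def co_em(X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab, n_classes, n_iter=10):
    """Two-view co-EM with Naive Bayes base classifiers.

    X1_*/X2_* are feature matrices for view1 (the word/phrase itself) and
    view2 (its context); y_lab holds integer labels 0..n_classes-1 for the
    labeled examples.
    """

    def fit_soft(X_lab, X_unlab, y_lab, proba_unlab):
        # Train on labeled data plus soft-labeled unlabeled data: each
        # unlabeled row is duplicated once per class and weighted by the
        # predicted probability of that class.
        X_soft = np.vstack([X_unlab] * n_classes)
        y_soft = np.repeat(np.arange(n_classes), X_unlab.shape[0])
        w_soft = proba_unlab.T.ravel()
        return MultinomialNB().fit(
            np.vstack([X_lab, X_soft]),
            np.concatenate([y_lab, y_soft]),
            sample_weight=np.concatenate([np.ones(len(y_lab)), w_soft]),
        )

    # Initialize the view1 classifier using the labeled data only.
    clf1 = MultinomialNB().fit(X1_lab, y_lab)

    for _ in range(n_iter):
        # view1 classifier probabilistically labels all the unlabeled data;
        # the context (view2) classifier trains on labeled + soft-labeled data.
        clf2 = fit_soft(X2_lab, X2_unlab, y_lab, clf1.predict_proba(X1_unlab))
        # view2 classifier relabels the data for the view1 classifier.
        clf1 = fit_soft(X1_lab, X1_unlab, y_lab, clf2.predict_proba(X2_unlab))

    return clf1, clf2
```

In this sketch the loop simply runs for a fixed number of iterations; the paper's stopping criterion (a fixed iteration budget or convergence of the classifiers) could be added by comparing successive probabilistic labelings.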
2000
- (Nigam & Ghani, 2000) ⇒ Kamal Nigam, and Rayid Ghani. (2000). “Analyzing the Effectiveness and Applicability of Co-training.” In: Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM 2000). doi:10.1145/354756.354805