Semi-Supervised Learning Algorithm
A Semi-Supervised Learning Algorithm is a machine learning algorithm that can be applied by a semi-supervised learning system to solve a semi-supervised learning task, one that involves both labeled training data and unlabeled training data.
- Context:
- It can range from being a Transductive Semi-Supervised Learning Algorithm to being an Inductive Semi-Supervised Learning Algorithm, depending on whether the test set is available during learning.
- It can range from being a Semi-Supervised Classification Algorithm to being a Semi-Supervised Regression Algorithm to being a Semi-Supervised Point Estimation Algorithm.
- It can be a Self-Training Algorithm (pseudo-labeling) when a Labeling Heuristic is available to generate pseudo-labels for the unlabeled data (a minimal sketch follows this list).
- It can be a Co-Training Learning Algorithm if multiple views of the data are available and can be used independently to train separate classifiers that are then combined.
- It can range from being a Weakly-Trained Learning Algorithm, relying on minimal labeled data, to being a Large Labeled Dataset Semi-Supervised Algorithm that utilizes substantially more labeled information.
- It can benefit from incorporating Domain Knowledge and Constraint-based Approaches to guide the learning process with unlabeled data.
- It can leverage Data Augmentation and Feature Learning to improve generalization from limited labeled data.
- ...
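The self-training pattern mentioned in the context above can be made concrete with a short sketch. The following is a minimal pseudo-labeling loop, assuming NumPy and scikit-learn are available; the logistic-regression base model, the 0.95 confidence threshold, and the function name self_train are illustrative choices rather than part of any reference implementation.

```python
# A minimal self-training (pseudo-labeling) loop: fit on the labeled data,
# pseudo-label the unlabeled points the model is confident about, and repeat.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.95, max_rounds=10):
    """Iteratively add confidently pseudo-labeled points to the labeled set."""
    X_l, y_l, X_u = X_labeled.copy(), y_labeled.copy(), X_unlabeled.copy()
    model = LogisticRegression(max_iter=1000)
    for _ in range(max_rounds):
        model.fit(X_l, y_l)
        if len(X_u) == 0:
            break
        proba = model.predict_proba(X_u)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing is confident enough; stop growing the labeled set
        pseudo = model.classes_[proba[confident].argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, pseudo])
        X_u = X_u[~confident]
    return model
```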
- Example(s):
- A Semi-Supervised Generative Model, which models the distribution of both labeled and unlabeled data.
- A Co-Training Learning Algorithm, which alternates between training classifiers on separate feature sets (views) and enlarging each classifier's labeled set with confident predictions from the other.
- A Semi-Supervised Support Vector Machine (S3VM) Algorithm, which extends SVMs to handle unlabeled data.
- A Semi-Supervised CRF Training Algorithm, which uses labeled and unlabeled sequences for conditional random field training.
- A Semi-Supervised Graph-based Learning Algorithm, which propagates labels through a graph connecting data points based on their similarity (a minimal sketch follows this list).
- An EM With Generative Mixture Models Algorithm, which alternates between assigning pseudo-labels to unlabeled data and re-estimating model parameters.
- A Transductive Support Vector Machine, optimized for a specific set of test data by using both labeled and unlabeled data.
- A Multiview Learning Algorithm, which uses different representations (views) of the data to improve learning.
- ...
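The graph-based example above can be sketched directly: build a similarity graph over all points, then repeatedly push label distributions along the edges while keeping the labeled nodes clamped. The version below is a minimal illustration in the spirit of label propagation, assuming NumPy; the RBF bandwidth gamma, the iteration count, and the convention of marking unlabeled points with -1 are arbitrary example choices.

```python
# Illustrative graph-based label propagation: an RBF similarity graph over all
# points, with label mass pushed along edges while labeled nodes stay clamped.
import numpy as np

def propagate_labels(X, y, gamma=1.0, n_iter=200):
    """y uses -1 for unlabeled points; returns predicted labels for all points."""
    n = X.shape[0]
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-gamma * sq_dists)            # RBF (Gaussian) edge weights
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)     # row-normalized transition matrix
    classes = np.unique(y[y != -1])
    F = np.zeros((n, classes.size))
    for k, c in enumerate(classes):
        F[y == c, k] = 1.0                   # one-hot rows for labeled points
    labeled = y != -1
    F_clamped = F[labeled].copy()
    for _ in range(n_iter):
        F = P @ F                            # push label mass along graph edges
        F[labeled] = F_clamped               # clamp the labeled nodes
    return classes[F.argmax(axis=1)]
```

In practice, sparse k-nearest-neighbor graphs and a convergence check would replace the dense similarity matrix and fixed iteration count used here for brevity.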
- Counter-Example(s):
- A Fully-Supervised Learning Algorithm, which requires a completely labeled training set for learning.
- An Unsupervised Learning Algorithm, such as a Clustering Algorithm, which does not utilize any labels for learning.
- A Reinforcement Learning Algorithm, which interacts with an environment to learn policies rather than learning from labeled and unlabeled data.
- ...
- See: Metric-based Learning, Few-Shot Learning, Transfer Learning.
References
2009
- (Zhu et al., 2009) ⇒ Xiaojin Zhu, and Andrew B. Goldberg. (2009). “Introduction to Semi-Supervised Learning.” Morgan & Claypool Publishers. ISBN:1598295470.
2008
- (Zhu, 2008) ⇒ Xiaojin Zhu. (2008). “Semi-Supervised Learning Literature Survey (revised edition)." Technical Report 1530, Department of Computer Sciences, University of Wisconsin, Madison.
- QUOTE: Some often-used methods include: EM with generative mixture models, self-training, co-training, transductive support vector machines, and graph-based methods.
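The first method in that list, EM with generative mixture models, can be sketched as follows: fit one Gaussian per class, clamp labeled points to their known class in the E-step, and let unlabeled points receive soft class assignments that are re-estimated each round. This is a simplified illustration assuming NumPy, with spherical per-class covariances and a fixed iteration count; it is not an implementation from the cited survey.

```python
# Simplified semi-supervised EM with a class-conditional Gaussian mixture:
# labeled responsibilities are clamped, unlabeled responsibilities are soft.
import numpy as np

def em_mixture(X_l, y_l, X_u, n_iter=50):
    classes = np.unique(y_l)
    K, d = classes.size, X_l.shape[1]
    X = np.vstack([X_l, X_u])
    n_l = len(X_l)
    R = np.zeros((len(X), K))                # responsibility matrix
    for k, c in enumerate(classes):
        R[:n_l][y_l == c, k] = 1.0           # hard (clamped) for labeled points
    R[n_l:] = 1.0 / K                        # uniform init for unlabeled points
    for _ in range(n_iter):
        # M-step: class priors, means, and per-class spherical variances
        Nk = R.sum(axis=0)
        priors = Nk / Nk.sum()
        means = (R.T @ X) / Nk[:, None]
        var = np.array([(R[:, k, None] * (X - means[k]) ** 2).sum() / (Nk[k] * d)
                        for k in range(K)])
        # E-step on unlabeled points only: posterior class probabilities
        log_post = np.stack(
            [np.log(priors[k])
             - 0.5 * ((X[n_l:] - means[k]) ** 2).sum(axis=1) / var[k]
             - 0.5 * d * np.log(var[k])
             for k in range(K)], axis=1)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        R[n_l:] = post / post.sum(axis=1, keepdims=True)
    return classes, priors, means, var
```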
2007
- (Zhu, 2007) ⇒ Xiaojin Zhu. (2007). “Semi-Supervised Learning." Tutorial at ICML 2007.
- input instance x, label y
- learner f : X → Y
- labeled data (X_l, Y_l) = {(x_1, y_1), …, (x_l, y_l)}
- unlabeled data X_u = {x_{l+1}, …, x_n}, available during training
- usually l ≪ n
- test data X_test = {x_{n+1}, …}, not available during training
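In code, this setup amounts to a small labeled slice of a much larger training sample plus a held-out test set. The sizes below (l = 10 out of n = 1000) and the synthetic 2-D features are arbitrary illustrative values, assuming NumPy.

```python
# A tiny concrete instance of the notation above.
import numpy as np

rng = np.random.default_rng(0)
n, l = 1000, 10                      # usually l << n
X = rng.normal(size=(n, 2))          # training instances x_1, ..., x_n
y = (X[:, 0] > 0).astype(int)        # true labels; only the first l are observed

X_l, y_l = X[:l], y[:l]              # labeled data (X_l, Y_l)
X_u = X[l:]                          # unlabeled data X_u, available during training
X_test = rng.normal(size=(200, 2))   # test data X_test, not available during training
```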
2006
- (Chapelle et al., 2006a) ⇒ Olivier Chapelle (editor), Alexander Zien (editor), and Bernhard Schölkopf (editor). (2006). “Semi-Supervised Learning.” MIT Press. ISBN:0262033585
- (Chapelle et al., 2006b) ⇒ Olivier Chapelle, Alexander Zien, and Bernhard Schölkopf. (2006). “Introduction to Semi-Supervised Learning.” In: (Chapelle et al., 2006a)
- QUOTE: A problem related to SSL was introduced by Vapnik already several decades ago: so-called transductive learning. In this setting, one is given a (labeled) training set and an (unlabeled) test set. The idea of transduction is to perform predictions only for the test points. This is in contrast to inductive learning, where the goal is to output a prediction function which is defined on the entire space X. Many methods described in this book will be transductive; in particular, this is rather natural for inference based on graph representations of the data. This issue will be addressed again in section 1.2.4.
2005
- (Zhu, 2005) ⇒ Xiaojin Zhu. (2005). “Semi-supervised learning literature survey." Technical Report TR-1530. University of Wisconsin-Madison Department of Computer Science.
2004
- (Basu et al., 2004) ⇒ Sugato Basu, Mikhail Bilenko, and Raymond Mooney. (2004). “A Probabilistic Framework for Semi-Supervised Clustering.” In: Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004). doi:10.1145/1014052.1014062
2001
- (Seeger, 2001) ⇒ Matthias Seeger. (2001). “Learning with Labeled and Unlabeled Data." Technical Report. University of Edinburgh.
1999
- (Joachims, 1999) ⇒ Thorsten Joachims. (1999). “Transductive Inference for Text Classification using Support Vector Machines.” In: Proceedings of the International Conference on Machine Learning (ICML 1999).
- QUOTE: The work presented here tackles the problem of learning from small training samples by taking a transductive (Vapnik, 1998), instead of an inductive approach. In the inductive setting the learner tries to induce a decision function which has a low error rate on the whole distribution of examples for the particular learning task. Often, this setting is unnecessarily complex. In many situations we do not care about the particular decision function, but rather that we classify a given set of examples (i.e. a test set) with as few errors as possible. This is the goal of transductive inference. Some examples of transductive text classification tasks are the following. All have in common that there is little training data, but a very large test set.