Self-Training Algorithm


A Self-Training Algorithm is a Semi-Supervised Learning Algorithm that uses an existing Predictive Model's own high-confidence predictions to extract additional Training Cases from an Unlabeled Dataset.



References

2007

  • (Zhu, 2007) ⇒ Xiaojin Zhu. (2007). “Semi-Supervised Learning.” Tutorial at ICML 2007.
    • Self-training algorithm
      • Assumption: One’s own high confidence predictions are correct.
      • Self-training algorithm (see the code sketch below):
        • 1 Train f from (X_l, Y_l)
        • 2 Predict on x ∈ X_u
        • 3 Add (x, f(x)) to labeled data.
        • 4 Repeat
    • Variations in self-training
    • Advantages of self-training
      • The simplest semi-supervised learning method.
      • A wrapper method that applies to existing (possibly complex) classifiers.
      • Often used in real tasks like natural language processing.
    • Disadvantages of self-training
      • Early mistakes could reinforce themselves.
        • Heuristic solutions exist, e.g., “un-label” an instance if its confidence falls below a threshold.
      • Little can be said about convergence in general.
        • But there are special cases when self-training is equivalent to the Expectation-Maximization (EM) algorithm.
        • There are also special cases (e.g., linear functions) when the closed-form solution is known.
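The four-step loop above can be summarized in a short, self-contained sketch. This is a minimal illustration, not code from the Zhu (2007) tutorial: it assumes a probabilistic base classifier (scikit-learn's LogisticRegression is used here as a stand-in for any classifier that reports confidences) and a hypothetical confidence threshold parameter, and it stops when no remaining unlabeled point exceeds that threshold.

# Minimal self-training sketch (illustrative; threshold and max_rounds are assumed parameters).
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_l, y_l, X_u, threshold=0.95, max_rounds=10):
    """Train f on (X_l, y_l), pseudo-label high-confidence points from X_u, and repeat."""
    X_l, y_l, X_u = np.asarray(X_l), np.asarray(y_l), np.asarray(X_u)
    model = LogisticRegression()
    for _ in range(max_rounds):
        model.fit(X_l, y_l)                          # 1. Train f from (X_l, Y_l)
        if len(X_u) == 0:
            break
        proba = model.predict_proba(X_u)             # 2. Predict on x in X_u
        confidence = proba.max(axis=1)
        keep = confidence >= threshold               # only trust high-confidence predictions
        if not keep.any():
            break
        pseudo_y = model.classes_[proba.argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[keep]])            # 3. Add (x, f(x)) to the labeled data
        y_l = np.concatenate([y_l, pseudo_y[keep]])
        X_u = X_u[~keep]                             # 4. Repeat on the remaining unlabeled pool
    return model

The same threshold could also support the “un-label” heuristic mentioned above: pseudo-labeled instances whose confidence later drops below it would be returned to X_u, though that variation is not shown in this sketch.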
