Statistical Learning Framework
A Statistical Learning Framework is a Machine Learning Framework that combines statistics and functional analysis.
- AKA: Statistical Learning Theory.
- Example(s):
- Counter-Example(s):
- See: Machine Learning, Statistical Learning Theory Discipline, Statistics, Functional Analysis, Statistical Inference Theory, Dimensionality Reduction, Artificial Neural Network, Clustering Algorithm, Structured Prediction, Anomaly Detection, Rademacher Complexity, Bioinformatics, Computer Vision, Speech Recognition.
References
2020a
- (Wikipedia, 2020a) ⇒ https://en.wikipedia.org/wiki/Statistical_learning_theory Retrieved:2020-2-1.
- Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. [1] Statistical learning theory deals with the problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics.
- ↑ Trevor Hastie, Robert Tibshirani, Jerome Friedman (2009) The Elements of Statistical Learning, Springer-Verlag .
2020b
- (Wikipedia, 2020b) ⇒ https://en.wikipedia.org/wiki/Statistical_learning_theory#Formal_description Retrieved:2020-2-1.
- Take [math]\displaystyle{ X }[/math] to be the vector space of all possible inputs, and [math]\displaystyle{ Y }[/math] to be the vector space of all possible outputs. Statistical learning theory takes the perspective that there is some unknown probability distribution over the product space [math]\displaystyle{ Z = X \times Y }[/math], i.e. there exists some unknown [math]\displaystyle{ p(z) = p(\vec{x},y) }[/math]. The training set is made up of [math]\displaystyle{ n }[/math] samples from this probability distribution, and is denoted:
[math]\displaystyle{ S = \{(\vec{x}_1,y_1), \dots ,(\vec{x}_n,y_n)\} = \{\vec{z}_1, \dots ,\vec{z}_n\} }[/math]
Every [math]\displaystyle{ \vec{x}_i }[/math] is an input vector from the training data, and [math]\displaystyle{ y_i }[/math] is the output that corresponds to it.
In this formalism, the inference problem consists of finding a function [math]\displaystyle{ f: X \to Y }[/math] such that [math]\displaystyle{ f(\vec{x}) \sim y }[/math]. Let [math]\displaystyle{ \mathcal{H} }[/math] be a space of functions [math]\displaystyle{ f: X \to Y }[/math] called the hypothesis space. The hypothesis space is the space of functions the algorithm will search through. Let [math]\displaystyle{ V(f(\vec{x}),y) }[/math] be the loss function, a metric for the difference between the predicted value [math]\displaystyle{ f(\vec{x}) }[/math] and the actual value [math]\displaystyle{ y }[/math]. The expected risk is defined to be:
[math]\displaystyle{ I[f] = \displaystyle \int_{X \times Y} V(f(\vec{x}),y)\, p(\vec{x},y) \,d\vec{x} \,dy }[/math]
The target function, the best possible function [math]\displaystyle{ f }[/math] that can be chosen, is given by the [math]\displaystyle{ f }[/math] that satisfies:
[math]\displaystyle{ f = \mathop{\arg\min}_{h \in \mathcal{H}} I[h] }[/math]
Because the probability distribution [math]\displaystyle{ p(\vec{x},y) }[/math] is unknown, a proxy measure for the expected risk must be used. This measure is based on the training set, a sample from this unknown probability distribution. It is called the empirical risk:
[math]\displaystyle{ I_S[f] = \frac{1}{n} \displaystyle \sum_{i=1}^n V( f(\vec{x}_i),y_i) }[/math]
A learning algorithm that chooses the function [math]\displaystyle{ f_S }[/math] that minimizes the empirical risk is called empirical risk minimization.
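The empirical risk minimization procedure described above can be sketched in a few lines of Python. The hypothesis space, training data, and squared loss below are illustrative assumptions chosen for this sketch, not part of the formal description:

```python
# Sketch of empirical risk minimization (ERM) over a small, finite
# hypothesis space of linear predictors f(x) = a*x with squared loss.
# The data and hypothesis set are illustrative assumptions.

def empirical_risk(f, samples, loss):
    """I_S[f] = (1/n) * sum of loss(f(x_i), y_i) over the training set."""
    return sum(loss(f(x), y) for x, y in samples) / len(samples)

def erm(hypotheses, samples, loss):
    """Return the hypothesis f_S in H that minimizes the empirical risk."""
    return min(hypotheses, key=lambda f: empirical_risk(f, samples, loss))

def squared_loss(predicted, actual):
    """V(f(x), y) = (f(x) - y)^2."""
    return (predicted - actual) ** 2

# Training set S = {(x_1, y_1), ..., (x_n, y_n)} drawn from an unknown
# p(x, y); here the samples roughly follow y = 2x with small deviations.
S = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# Hypothesis space H: linear predictors with a handful of candidate slopes.
H = [lambda x, a=a: a * x for a in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]

f_S = erm(H, S, squared_loss)  # selects the slope-2 predictor on this data
```

In practice the hypothesis space is continuous and the minimization is done by numerical optimization rather than enumeration, but the structure — a hypothesis space, a loss, and a sample average standing in for the unknown expected risk — is exactly the one formalized above.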
2009
- (Lafferty & Wasserman, 2009) ⇒ John D. Lafferty, and Larry Wasserman. (2009). “Statistical Machine Learning - Course: 10-702." Spring 2009, Carnegie Mellon University.
- Statistical Machine Learning is a second graduate level course in machine learning, assuming students have taken Machine Learning (10-701) and Intermediate Statistics (36-705). The term "statistical" in the title reflects the emphasis on statistical analysis and methodology, which is the predominant approach in modern machine learning.
- (Freund, 2009) ⇒ Yoav Freund (2009). “Statistical Machine Learning (Boosting)." Course: UC San Diego, CSE254, Winter 2009. http://seed.ucsd.edu/mediawiki/index.php/CSE254
2008
- (Hutter, 2008) ⇒ Marcus Hutter (2008). "Introduction to Statistical Machine Learning". In: Machine Learning Summer School (MLSS-2008).
- QUOTE: This course provides a broad introduction to the methods and practice of statistical machine learning, which is concerned with the development of algorithms and techniques that learn from observed data by constructing stochastic models that can be used for making predictions and decisions. Topics covered include Bayesian inference and maximum likelihood modeling; regression, classification, density estimation, clustering, principal component analysis; parametric, semi-parametric, and non-parametric models; basis functions, neural networks, kernel methods, and graphical models; deterministic and stochastic optimization; overfitting, regularization, and validation.
2007
- (UC Berkeley, 2007) ⇒ http://www.stat.berkeley.edu/~statlearning/
- Statistical machine learning merges statistics with the computational sciences --- computer science, systems science and optimization. Much of the agenda in statistical machine learning is driven by applied problems in science and technology, where data streams are increasingly large-scale, dynamical and heterogeneous, and where mathematical and algorithmic creativity are required to bring statistical methodology to bear. Fields such as bioinformatics, artificial intelligence, signal processing, communications, networking, information management, finance, game theory and control theory are all being heavily influenced by developments in statistical machine learning.
- The field of statistical machine learning also poses some of the most challenging theoretical problems in modern statistics, chief among them being the general problem of understanding the link between inference and computation.