Supervised Linear Model-based Classification Algorithm
A Supervised Linear Model-based Classification Algorithm is a supervised model-based classification algorithm that fits a linear classification function.
- Context:
- It can (often) be applied by a Linear Classification System (to solve a linear classification task).
- It can (often) be a Baseline Supervised Classification Algorithm.
- It can (often) be a Simple Supervised Classification Algorithm.
- It can (often) be an Interpretable Supervised Classification Algorithm.
- It can (often) be a Computationally Efficient Supervised Classification Algorithm.
- It can (often) require careful Linear Model-based Feature Engineering.
- It can range from being a Discriminative Linear Classification Algorithm to being a Generative Linear Classification Algorithm.
- It can range from being a Gradient Descent-based Linear Classification Algorithm to being an Analytic Solution-based Linear Classification Algorithm, depending on how the linear function's parameters are optimized (see the sketch after this list).
- ...
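As a minimal illustration of the last range above (not part of the source definition), the following Python sketch contrasts a gradient descent-based linear classifier with one fit by an analytic (closed-form) solution; the scikit-learn estimators, parameters, and synthetic data are assumptions for illustration only.

```python
# Minimal sketch (illustrative, not from the source): gradient descent-based
# vs. analytic solution-based fitting of a linear classifier on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier, RidgeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Gradient descent-based: hinge loss minimized by stochastic gradient descent.
sgd_clf = SGDClassifier(loss="hinge", max_iter=1000, random_state=0).fit(X, y)

# Analytic solution-based: ridge classifier solved in closed form
# (L2-regularized least squares on {-1, +1}-coded targets).
ridge_clf = RidgeClassifier(alpha=1.0).fit(X, y)

print("SGD (gradient descent) training accuracy:", sgd_clf.score(X, y))
print("Ridge (closed form) training accuracy:   ", ridge_clf.score(X, y))
```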
- Example(s):
- a Linear SVM Algorithm, which maximizes the margin between the separating hyperplane and the closest training examples (the support vectors).
- a Logistic Regression Algorithm, which minimizes the logistic (cross-entropy) loss via maximum likelihood estimation.
- a Perceptron Algorithm, which iteratively adjusts weights based on classification errors.
- a Ridge Regression Classifier, which uses L2 regularization to handle multicollinearity.
- a Naive Bayes Classifier with a Gaussian assumption, which can approximate linear decision boundaries (see the scikit-learn sketch after this list).
- ...
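The example algorithms above map onto widely available library implementations. The following is a minimal sketch (not part of the source) that fits each of them as a linear classifier in scikit-learn on synthetic data; the estimator choices and parameter values are illustrative assumptions.

```python
# Minimal sketch (illustrative): the example linear classification algorithms
# listed above, as scikit-learn estimators fit on synthetic data.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression, Perceptron, RidgeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

linear_classifiers = {
    "Linear SVM (max-margin, hinge loss)": LinearSVC(C=1.0, max_iter=5000),
    "Logistic Regression (log loss / MLE)": LogisticRegression(max_iter=1000),
    "Perceptron (mistake-driven updates)": Perceptron(),
    "Ridge Classifier (L2-regularized least squares)": RidgeClassifier(alpha=1.0),
    "Gaussian Naive Bayes (generative, approx. linear boundary)": GaussianNB(),
}

for name, clf in linear_classifiers.items():
    clf.fit(X, y)
    print(f"{name}: training accuracy = {clf.score(X, y):.3f}")
```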
- Counter-Example(s):
- a Linear Regression Algorithm.
- a Non-Linear Classification Algorithm, such as a Kernel SVM Algorithm or Neural Network.
- a Decision Tree Training Algorithm, which uses a hierarchical decision structure.
- See: Fisher's Linear Discriminant, Linear Combination, Margin Classifier, Maximum-Margin Classifier, Linear Decision Boundary.
References
2024
- (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Linear_classifier Retrieved:2024-1-15.
- In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class (or group) it belongs to. A linear classifier achieves this by making a classification decision based on the value of a linear combination of the characteristics. An object's characteristics are also known as feature values and are typically presented to the machine in a vector called a feature vector. Such classifiers work well for practical problems such as document classification, and more generally for problems with many variables (features), reaching accuracy levels comparable to non-linear classifiers while taking less time to train and use.
2024
- (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Linear_classifier#Generative_models_vs Retrieved:2024-1-15.
- There are two broad classes of methods for determining the parameters of a linear classifier [math]\displaystyle{ \vec w }[/math]: they can be generative or discriminative models. [1] [2] Methods of the former model the joint probability distribution, whereas methods of the latter model the conditional density functions [math]\displaystyle{ P({\rm class}|\vec x) }[/math]. Examples of generative algorithms include:
- Linear Discriminant Analysis (LDA), which assumes Gaussian conditional density models.
- Naive Bayes classifier with multinomial or multivariate Bernoulli event models.
- The second set of methods includes discriminative models, which attempt to maximize the quality of the output on a training set. Additional terms in the training cost function can easily perform regularization of the final model. Examples of discriminative training of linear classifiers include:
- Logistic regression, a maximum likelihood estimation of [math]\displaystyle{ \vec w }[/math] assuming that the observed training set was generated by a binomial model that depends on the output of the classifier.
- Perceptron—an algorithm that attempts to fix all errors encountered in the training set
- Fisher's Linear Discriminant Analysis—an algorithm (different than "LDA") that maximizes the ratio of between-class scatter to within-class scatter, without any other assumptions. It is in essence a method of dimensionality reduction for binary classification. [3]
- Support vector machine—an algorithm that maximizes the margin between the decision hyperplane and the examples in the training set.
- Note: Despite its name, LDA does not belong to the class of discriminative models in this taxonomy. However, its name makes sense when we compare LDA to the other main linear dimensionality reduction algorithm: principal components analysis (PCA). LDA is a supervised learning algorithm that utilizes the labels of the data, while PCA is an unsupervised learning algorithm that ignores the labels. To summarize, the name is a historical artifact. [4] Discriminative training often yields higher accuracy than modeling the conditional density functions. However, handling missing data is often easier with conditional density models.
All of the linear classifier algorithms listed above can be converted into non-linear algorithms operating on a different input space [math]\displaystyle{ \varphi(\vec x) }[/math] , using the kernel trick.
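The generative vs. discriminative distinction in the quoted passage can be made concrete in code. The following is a minimal sketch (my illustration, not from the quoted source) that fits a generative linear classifier (LDA, which models class-conditional Gaussians with a shared covariance) and a discriminative one (logistic regression, which models P(class|x) directly) on synthetic data with scikit-learn; both yield a linear decision boundary and differ only in how the weights are estimated.

```python
# Minimal sketch (illustrative): a generative vs. a discriminative linear
# classifier, both of which produce a linear decision boundary.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Generative: models the class-conditional densities (Gaussian, shared covariance).
generative = LinearDiscriminantAnalysis().fit(X_train, y_train)

# Discriminative: directly models P(class | x) by maximum likelihood.
discriminative = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("LDA test accuracy:                ", generative.score(X_test, y_test))
print("Logistic regression test accuracy:", discriminative.score(X_test, y_test))
```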
2015
- http://spark.apache.org/docs/latest/mllib-linear-methods.html
- QUOTE: Many standard machine learning methods can be formulated as a convex optimization problem, i.e. the task of finding a minimizer of a convex function f that depends on a variable vector w (called weights in the code), which has d entries. Formally, we can write this as the optimization problem [math]\displaystyle{ \min_{w \in \mathbb{R}^d} f(w) }[/math], where the objective function is of the form: [math]\displaystyle{ f(w) := \lambda R(w) + \frac{1}{n} \sum_{i=1}^n L(w; x_i, y_i) \qquad (1) }[/math] Here the vectors [math]\displaystyle{ x_i \in \mathbb{R}^d }[/math] are the training data examples, for [math]\displaystyle{ 1 \leq i \leq n }[/math], and [math]\displaystyle{ y_i \in \mathbb{R} }[/math] are their corresponding labels, which we want to predict. We call the method linear if [math]\displaystyle{ L(w; x, y) }[/math] can be expressed as a function of [math]\displaystyle{ w^T x }[/math] and [math]\displaystyle{ y }[/math]. Several of MLlib’s classification and regression algorithms fall into this category, and are discussed here.
The objective function f has two parts: the regularizer that controls the complexity of the model, and the loss that measures the error of the model on the training data. The loss function [math]\displaystyle{ L(w; \cdot) }[/math] is typically a convex function in w. The fixed regularization parameter [math]\displaystyle{ \lambda \geq 0 }[/math] (regParam in the code) defines the trade-off between the two goals of minimizing the loss (i.e., training error) and minimizing model complexity (i.e., to avoid overfitting).
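The objective in Eq. (1) can be written out directly in code. The following is a minimal NumPy sketch (not Spark MLlib code, and not from the quoted source) that minimizes f(w) = λR(w) + (1/n)·Σ L(w; x_i, y_i) with an L2 regularizer and logistic loss by plain gradient descent; the data and parameter values are illustrative assumptions.

```python
# Minimal sketch (illustrative, not MLlib): Eq. (1) with R(w) = 0.5 * ||w||^2
# and logistic loss L(w; x, y) = log(1 + exp(-y * w.x)), labels y in {-1, +1},
# minimized by plain gradient descent on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = np.sign(X @ true_w + 0.1 * rng.normal(size=n))  # labels in {-1, +1}

lam = 0.1   # regularization parameter (regParam in the quote)
eta = 0.1   # gradient descent step size
w = np.zeros(d)

def objective(w):
    margins = y * (X @ w)
    return lam * 0.5 * (w @ w) + np.mean(np.log1p(np.exp(-margins)))

for _ in range(500):
    margins = y * (X @ w)
    # Gradient of the mean logistic loss, plus the gradient of the L2 regularizer.
    grad_loss = -(X.T @ (y / (1.0 + np.exp(margins)))) / n
    w -= eta * (lam * w + grad_loss)

print("final objective f(w):", objective(w))
print("training accuracy:   ", np.mean(np.sign(X @ w) == y))
```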
2011
- http://en.wikipedia.org/wiki/Linear_classifier
- … If the input feature vector to the classifier is a real vector [math]\displaystyle{ \vec x }[/math], then the output score is [math]\displaystyle{ y = f(\vec{w}\cdot\vec{x}) = f\left(\sum_j w_j x_j\right), }[/math] where [math]\displaystyle{ \vec w }[/math] is a real vector of weights and [math]\displaystyle{ f }[/math] is a function that converts the dot product of the two vectors into the desired output. (In other words, [math]\displaystyle{ \vec{w} }[/math] is a one-form or linear functional mapping [math]\displaystyle{ \vec x }[/math] onto R.) The weight vector [math]\displaystyle{ \vec w }[/math] is learned from a set of labeled training samples. Often [math]\displaystyle{ f }[/math] is a simple function that maps all values above a certain threshold to the first class and all other values to the second class. A more complex [math]\displaystyle{ f }[/math] might give the probability that an item belongs to a certain class.
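As a tiny worked illustration of the scoring rule above (with hypothetical weight and feature values, not taken from the source), the score is the dot product of the weight vector and the feature vector, and a simple threshold function f assigns the class:

```python
# Minimal sketch (hypothetical values): score = w . x, thresholded at zero.
import numpy as np

w = np.array([0.8, -0.5, 0.3])  # learned weight vector (illustrative values)
x = np.array([1.0, 2.0, 0.5])   # input feature vector (illustrative values)

score = np.dot(w, x)            # linear combination: sum_j w_j * x_j
label = "class 1" if score > 0 else "class 2"  # simple threshold function f

print(f"score = {score:.2f} -> {label}")
```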
2009
- (Hastie et al., 2009) ⇒ Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman. (2009). “The Elements of Statistical Learning: Data Mining, Inference, and Prediction; 2nd edition.” Springer-Verlag. ISBN:0387848576
2006
- (Bishop, 2006) ⇒ Christopher M. Bishop. (2006). “Pattern Recognition and Machine Learning.” Springer, Information Science and Statistics.
2004
- (Bouchard & Triggs, 2004) ⇒ Guillaume Bouchard, and Bill Triggs. (2004). “The Trade-off Between Generative and Discriminative Classifiers.” In: Proceedings of COMPSTAT 2004.
1995
- (Li, 1995) ⇒ Stan Z. Li. (1995). “Markov Random Field Modeling in Computer Vision.” Springer-Verlag. ISBN:4431701451
- 7.2.3 Linear Classification Function http://www.cbsr.ia.ac.cn/users/szli/mrf_book/Chapter_7/node120.html
- ↑ T. Mitchell, Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression. Draft Version, 2005
- ↑ A. Y. Ng and M. I. Jordan. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes. in NIPS 14, 2002.
- ↑ R.O. Duda, P.E. Hart, D.G. Stork, "Pattern Classification", Wiley, (2001).
- ↑ R.O. Duda, P.E. Hart, D.G. Stork, "Pattern Classification", Wiley, (2001).