Hinge-Loss Function
A Hinge-Loss Function is a convex, continuous, but non-smooth (non-differentiable) loss function that ...
- Context:
- It can be defined as:
- [math]\displaystyle{ \ell(y) = \max(0, 1-t \cdot y) }[/math].
- [math]\displaystyle{ V(f(\vec{x}),y) = \max(0, 1-yf(\vec{x})) = |1 - yf(\vec{x}) |_{+} }[/math]
- …
- Counter-Example(s):
- See: Hyperplane, Hinge Loss MRF, Support Vector Machine Training Algorithm.
References
2019
- (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Loss_functions_for_classification#Hinge_loss Retrieved:2019-9-30.
- The hinge loss function is defined as : [math]\displaystyle{ V(f(\vec{x}),y) = \max(0, 1-yf(\vec{x})) = |1 - yf(\vec{x}) |_{+}. }[/math] The hinge loss provides a relatively tight, convex upper bound on the 0–1 indicator function. Specifically, the hinge loss equals the 0–1 indicator function when [math]\displaystyle{ \operatorname{sgn}(f(\vec{x})) = y }[/math] and [math]\displaystyle{ |yf(\vec{x})| \geq 1 }[/math] . In addition, the empirical risk minimization of this loss is equivalent to the classical formulation for support vector machines (SVMs). Correctly classified points lying outside the margin boundaries of the support vectors are not penalized, whereas points within the margin boundaries or on the wrong side of the hyperplane are penalized in a linear fashion compared to their distance from the correct boundary.
While the hinge loss function is both convex and continuous, it is not smooth (is not differentiable) at [math]\displaystyle{ yf(\vec{x})=1 }[/math] . Consequently, the hinge loss function cannot be used with gradient descent methods or stochastic gradient descent methods which rely on differentiability over the entire domain. However, the hinge loss does have a subgradient at [math]\displaystyle{ yf(\vec{x})=1 }[/math] , which allows for the utilization of subgradient descent methods. SVMs utilizing the hinge loss function can also be solved using quadratic programming.
The minimizer of [math]\displaystyle{ I[f] }[/math] for the hinge loss function is :[math]\displaystyle{ f^*_\text{Hinge}(\vec{x}) \;=\; \begin{cases} 1& \text{if }p(1\mid\vec{x}) \gt p(-1\mid\vec{x}) \\ -1 & \text{if }p(1\mid\vec{x}) \lt p(-1\mid\vec{x}) \end{cases} }[/math]
when [math]\displaystyle{ p(1\mid\vec{x}) \ne 0.5 }[/math], which matches that of the 0–1 indicator function. This conclusion makes the hinge loss quite attractive, as bounds can be placed on the difference between expected risk and the sign of the hinge loss function. The hinge loss cannot be derived from (2) since [math]\displaystyle{ f^*_{\text{Hinge}} }[/math] is not invertible.
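The following Python sketch (illustrative, not part of the quoted article) implements one step of the subgradient-descent approach mentioned above for a regularized linear SVM; the function name hinge_subgradient_step and the default values for the regularization weight and learning rate are assumptions made for the example.
```python
import numpy as np

def hinge_subgradient_step(w, b, X, y, lam=0.01, lr=0.1):
    """One subgradient step on lam*||w||^2 + (1/n) * sum_i max(0, 1 - y_i*(w.x_i + b)).

    The hinge term is not differentiable where y_i*(w.x_i + b) == 1; those points are
    excluded from the active set, which corresponds to choosing the valid subgradient 0.
    """
    n = len(y)
    margins = y * (X @ w + b)
    active = margins < 1  # points inside the margin or misclassified
    grad_w = 2 * lam * w - (y[active, None] * X[active]).sum(axis=0) / n
    grad_b = -y[active].sum() / n
    return w - lr * grad_w, b - lr * grad_b

# Toy usage on a linearly separable problem with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w, b = np.zeros(2), 0.0
for _ in range(500):
    w, b = hinge_subgradient_step(w, b, X, y)
```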
2018
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Hinge_loss Retrieved:2018-1-24.
- In machine learning, the hinge loss is a loss function used for training classifiers. The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs).
For an intended output [math]\displaystyle{ t = \pm 1 }[/math] and a classifier score [math]\displaystyle{ y }[/math], the hinge loss of the prediction [math]\displaystyle{ y }[/math] is defined as : [math]\displaystyle{ \ell(y) = \max(0, 1-t \cdot y) }[/math] Note that [math]\displaystyle{ y }[/math] should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, [math]\displaystyle{ y = \mathbf{w} \cdot \mathbf{x} + b }[/math] , where [math]\displaystyle{ (\mathbf{w},b) }[/math] are the parameters of the hyperplane and [math]\displaystyle{ \mathbf{x} }[/math] is the point to classify.
It can be seen that when [math]\displaystyle{ t }[/math] and [math]\displaystyle{ y }[/math] have the same sign (meaning [math]\displaystyle{ y }[/math] predicts the right class) and [math]\displaystyle{ |y| \ge 1 }[/math], the hinge loss [math]\displaystyle{ \ell(y) = 0 }[/math] , but when they have opposite sign, [math]\displaystyle{ \ell(y) }[/math] increases linearly with [math]\displaystyle{ y }[/math] (one-sided error).
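A short Python sketch (an illustration, not taken from the quoted article) of this behaviour: it evaluates [math]\displaystyle{ \ell(y) = \max(0, 1-t \cdot y) }[/math] on a few raw scores, showing zero loss outside the margin and linearly growing loss otherwise; the helper name hinge_loss is assumed for the example.
```python
import numpy as np

def hinge_loss(scores, t):
    """Hinge loss ell(y) = max(0, 1 - t*y) for raw scores y and a label t in {-1, +1}."""
    return np.maximum(0.0, 1.0 - t * scores)

# Raw decision values y = w.x + b for three points whose true label is t = +1:
print(hinge_loss(np.array([2.0, 0.5, -1.0]), +1))  # -> [0.  0.5 2. ]
# y = 2.0 lies outside the margin (no penalty), 0.5 lies inside the margin,
# and -1.0 is on the wrong side, so its loss grows linearly with the score.
```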
2017
- (Quora, 2017) ⇒ https://qr.ae/TWsitr
- QUOTE: ... This definition of “best” results in different loss functions. If you look at the optimization problems of linear SVM and (regularized) LR, they are very similar:
- [math]\displaystyle{ \min_w \; \lambda\|w\|^2 + \sum_i \max\{0,\, 1 - y_i w^T x_i\} }[/math]
- [math]\displaystyle{ \min_w \; \lambda\|w\|^2 + \sum_i \log\left(1+\exp(-y_i w^T x_i)\right) }[/math]
- That is, they only differ in the loss function — SVM minimizes hinge loss while logistic regression minimizes logistic loss. …
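A small NumPy sketch (an illustration, not from the quoted answer) that makes the comparison concrete: both objectives share the regularizer [math]\displaystyle{ \lambda\|w\|^2 }[/math] and differ only in the per-example loss term; the function names and the toy data are assumptions for the example.
```python
import numpy as np

def svm_objective(w, X, y, lam=0.1):
    """Regularized hinge objective: lam*||w||^2 + sum_i max(0, 1 - y_i * w.x_i)."""
    return lam * (w @ w) + np.maximum(0.0, 1.0 - y * (X @ w)).sum()

def logreg_objective(w, X, y, lam=0.1):
    """Regularized logistic objective: lam*||w||^2 + sum_i log(1 + exp(-y_i * w.x_i)),
    computed with logaddexp as a numerically stable form of log(1 + exp(.))."""
    return lam * (w @ w) + np.logaddexp(0.0, -y * (X @ w)).sum()

# Both objectives are evaluated on the same data; only the loss term differs.
X = np.array([[1.0, 2.0], [-1.5, 0.5], [0.5, -2.0]])
y = np.array([1, -1, -1])
w = np.array([0.3, -0.4])
print(svm_objective(w, X, y), logreg_objective(w, X, y))
```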