Tikhonov Regularization Algorithm
A Tikhonov Regularization Algorithm is a regularization algorithm that adds a penalty term proportional to the squared magnitude (squared norm) of the solution coefficients to the objective being minimized.
- Context:
- It can (typically) stabilize the solutions of ill-posed problems by adding a penalty term to the loss function.
- It can (often) be applied in linear inverse problems where direct solutions are unstable or non-existent.
- It can involve adding a term proportional to the square of the norm of the solution to the original minimization problem, effectively shrinking the solution towards zero.
- It can take the form of Ridge Regression in the context of linear regression, where the penalty term is proportional to the square of the magnitude of the coefficients (a minimal sketch follows the See list below).
- It can help avoid overfitting in statistical models by penalizing large coefficients.
- ...
- Example(s):
- One whose penalty term is the Square of the Norm of the solution vector (as in Ridge Regression).
- In image processing, it can deblur images where direct inversion of the blurring operator leads to noise amplification.
- In geophysics, it can be applied in seismic tomography to obtain stable solutions from incomplete or noisy data.
- ...
- Counter-Example(s):
- LASSO (Least Absolute Shrinkage and Selection Operator), which uses an L1 penalty instead of an L2 penalty.
- Unregularized linear regression, which does not include a penalty term and can lead to overfitting with noisy or high-dimensional data.
- See: Constrained Linear Inversion Method, Discrete Fourier Transform, Ill-Conditioned Matrix, Lowpass Operator, Norm (mathematics), Phillips–Twomey Method, Reproducing Kernel Hilbert Space.
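The ridge-regression case mentioned in the Context above admits a simple closed-form computation. The following is a minimal sketch rather than a reference implementation; the synthetic data X, y and the regularization weight lam are illustrative assumptions, and the coefficients are obtained from the regularized normal equations with [math]\displaystyle{ \Gamma = \sqrt{\lambda}\, I }[/math].

```python
# Minimal sketch (illustrative, not from the cited sources): ridge regression
# as Tikhonov regularization with Gamma = sqrt(lam) * I, solved in closed form.
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Return w minimizing ||X w - y||^2 + lam * ||w||^2."""
    n_features = X.shape[1]
    # Regularized normal equations: (X^T X + lam * I) w = X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Toy usage: a larger lam shrinks the coefficients toward zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)
print(ridge_fit(X, y, lam=0.0))   # close to unregularized least squares
print(ridge_fit(X, y, lam=50.0))  # coefficients pulled toward zero
```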
References
2012
- http://en.wikipedia.org/wiki/Tikhonov_regularization
- Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of ill-posed problems. In statistics, the method is known as ridge regression, and, with multiple independent discoveries, it is also variously known as the Tikhonov–Miller method, the Phillips–Twomey method, the constrained linear inversion method, and the method of linear regularization. It is related to the Levenberg–Marquardt algorithm for non-linear least-squares problems.
When the following problem is not well posed (either because of non-existence or non-uniqueness of [math]\displaystyle{ x }[/math]) [math]\displaystyle{ A\mathbf{x}=\mathbf{b}, }[/math] then the standard approach is known as linear least squares and seeks to minimize the residual [math]\displaystyle{ \|A\mathbf{x}-\mathbf{b}\|^2 }[/math] where [math]\displaystyle{ \left \| \cdot \right \| }[/math] is the Euclidean norm. This may be due to the system being overdetermined or underdetermined ([math]\displaystyle{ A }[/math] may be ill-conditioned or singular). In the latter case this is no better than the original problem. In order to give preference to a particular solution with desirable properties, the regularization term is included in this minimization: [math]\displaystyle{ \|A\mathbf{x}-\mathbf{b}\|^2+ \|\Gamma \mathbf{x}\|^2 }[/math] for some suitably chosen Tikhonov matrix, [math]\displaystyle{ \Gamma }[/math]. In many cases, this matrix is chosen as the identity matrix [math]\displaystyle{ \Gamma= I }[/math], giving preference to solutions with smaller norms. In other cases, lowpass operators (e.g., a difference operator or a weighted Fourier operator) may be used to enforce smoothness if the underlying vector is believed to be mostly continuous.
This regularization improves the conditioning of the problem, thus enabling a numerical solution. An explicit solution, denoted by [math]\displaystyle{ \hat{x} }[/math], is given by: [math]\displaystyle{ \hat{x} = (A^{T}A+ \Gamma^{T} \Gamma )^{-1}A^{T}\mathbf{b} }[/math] The effect of regularization may be varied via the scale of matrix [math]\displaystyle{ \Gamma }[/math]. For [math]\displaystyle{ \Gamma = 0 }[/math] this reduces to the unregularized least squares solution provided that [math]\displaystyle{ (A^{T}A)^{-1} }[/math] exists.
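As a concrete illustration of the explicit solution above, the following minimal sketch (with assumed synthetic data, not taken from the quoted article) evaluates [math]\displaystyle{ \hat{x} = (A^{T}A+\Gamma^{T}\Gamma)^{-1}A^{T}\mathbf{b} }[/math] for a nearly singular [math]\displaystyle{ A }[/math] with [math]\displaystyle{ \Gamma = \alpha I }[/math], and compares it with the unregularized least squares solution.

```python
# Minimal sketch (assumed synthetic data): explicit Tikhonov solution
# x_hat = (A^T A + Gamma^T Gamma)^{-1} A^T b for an ill-conditioned A.
import numpy as np

def tikhonov_solve(A, b, Gamma):
    """Solve the regularized normal equations for x_hat."""
    return np.linalg.solve(A.T @ A + Gamma.T @ Gamma, A.T @ b)

# A nearly singular design: the two columns are almost collinear.
A = np.array([[1.0, 1.0000],
              [1.0, 1.0001],
              [1.0, 0.9999]])
b = np.array([2.0, 2.001, 1.999])

alpha = 0.1
Gamma = alpha * np.eye(2)                     # Gamma = alpha * I prefers small-norm solutions
print(tikhonov_solve(A, b, Gamma))            # stable, modest coefficients
print(np.linalg.lstsq(A, b, rcond=None)[0])   # unregularized least squares for comparison
```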
2004
- (Rifkin & Klautau, 2004) ⇒ Ryan Rifkin, and Aldebaro Klautau. (2004). “In Defense of One-Vs-All Classification.” In: The Journal of Machine Learning Research, 5.
- QUOTE: A broad class of classification (and regression) algorithms can be derived from the general approach of Tikhonov regularization. In our case, we consider Tikhonov regularization in a reproducing kernel Hilbert space:
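The quoted passage ends at the RKHS formulation. As a hedged illustration only (not the paper's code), Tikhonov regularization with the square loss in a reproducing kernel Hilbert space reduces, via the representer theorem, to kernel ridge regression; the Gaussian kernel, its width sigma, and the lam * n scaling of the identity term used below are assumptions of this sketch.

```python
# Minimal sketch: Tikhonov regularization in an RKHS with square loss,
# i.e. kernel ridge regression with expansion coefficients
# c = (K + lam * n * I)^{-1} y under one common scaling convention.
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kernel_ridge_fit(X, y, lam=1e-2, sigma=1.0):
    """Solve (K + lam * n * I) c = y for the expansion coefficients c."""
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def kernel_ridge_predict(X_train, c, X_new, sigma=1.0):
    """Evaluate f(x) = sum_i c_i K(x_i, x) at the new points."""
    return gaussian_kernel(X_new, X_train, sigma) @ c

# Toy usage on a 1-D regression problem.
X = np.linspace(0, 1, 30)[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * np.random.default_rng(0).normal(size=30)
c = kernel_ridge_fit(X, y)
print(kernel_ridge_predict(X, c, X[:5]))
```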
1998
- (Cowan, 1998) ⇒ Glen Cowan. (1998). “Statistical Data Analysis.” Oxford University Press, ISBN:0198501560
- QUOTE: A commonly used measure of smoothness is the mean value of the square of some derivative of the true distribution. This technique was suggested independently by Phillips [Phi62] and Tikhonov [Tik63, Tik77], and is usually called Tikhonov regularization. If we consider the p.d.f. [math]\displaystyle{ f_{true}(y) }[/math] before being discretized as a histogram, then the regularization function is :[math]\displaystyle{ S[f_{true}(y)] = - ... (11.41) }[/math] where the integration is over all allowed values of [math]\displaystyle{ y }[/math]. The minus sign comes from the convention taken here that we maximize [math]\displaystyle{ \phi }[/math] as defined by (11.40). That is, greater [math]\displaystyle{ S }[/math] corresponds to more smoothness. (Equivalently one can of course minimize a combination of regularization and log-likelihood functions with the opposite sign; this convention as well is often encountered in the literature.)
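Discretizing the derivative-based smoothness measure described above gives a difference-operator Tikhonov matrix. The sketch below is an assumed discretization (not taken from Cowan's book): [math]\displaystyle{ \Gamma }[/math] is a second-difference operator, so [math]\displaystyle{ \|\Gamma x\|^2 }[/math] approximates the integrated square of the second derivative of the binned distribution; the smearing matrix A and the weight tau are illustrative.

```python
# Minimal sketch (assumed discretization): Phillips–Tikhonov smoothness
# regularization with Gamma built from the second-difference stencil [1, -2, 1].
import numpy as np

def second_difference_operator(n):
    """(n-2) x n matrix applying the stencil [1, -2, 1] to adjacent bins."""
    Gamma = np.zeros((n - 2, n))
    for i in range(n - 2):
        Gamma[i, i:i + 3] = [1.0, -2.0, 1.0]
    return Gamma

def smooth_tikhonov_solve(A, b, tau):
    """x_hat = (A^T A + tau * Gamma^T Gamma)^{-1} A^T b with a smoothness penalty."""
    Gamma = second_difference_operator(A.shape[1])
    return np.linalg.solve(A.T @ A + tau * Gamma.T @ Gamma, A.T @ b)

# Toy usage: a mild "smearing" response matrix A acting on a smooth truth.
n = 20
A = np.eye(n) * 0.6 + np.eye(n, k=1) * 0.2 + np.eye(n, k=-1) * 0.2
x_true = np.sin(np.linspace(0, np.pi, n))
b = A @ x_true + 0.01 * np.random.default_rng(1).normal(size=n)
print(smooth_tikhonov_solve(A, b, tau=0.5)[:5])
```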