Log-Likelihood Function
A Log-Likelihood Function is the natural logarithm of a Likelihood Function.
- Context:
- It can range from being a Pseudo Log-Likelihood Function to being ...
- It can range from being an Abstract Log-Likelihood Function to being a Software-based Log-Likelihood Function.
- It can range from being a Discrete Log-Likelihood Function to being a Continuous Log-Likelihood Function.
- It can be simpler to compute and more Numerically Stable than the Likelihood Function.
- Example(s):
- …
- Counter-Example(s):
- See: Log-Likelihood Ratio Test.
References
2016
- https://github.com/ASIDataScience/training-neural-networks-notebook/blob/master/Training-Neural-Networks-Theano.ipynb
- QUOTE: In most machine learning applications, it is better to maximize the log-likelihood rather than the likelihood. This is done because the log-likelihood tends to be simpler to compute and more numerically stable than the likelihood.
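To make the numerical-stability point concrete, here is a minimal Python sketch (not taken from the referenced notebook): the product of 2,000 Bernoulli probabilities underflows to zero in double precision, while the corresponding sum of logarithms remains an ordinary floating-point number. The simulated coin flips and the parameter value p = 0.5 are illustrative assumptions.
```python
import numpy as np

# Illustrative only: 2,000 simulated fair-coin flips evaluated at p = 0.5.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=2000)

p = 0.5
per_obs_prob = np.where(x == 1, p, 1.0 - p)

likelihood = np.prod(per_obs_prob)             # 0.5**2000 underflows to 0.0 in float64
log_likelihood = np.sum(np.log(per_obs_prob))  # 2000 * log(0.5) ≈ -1386.29

print(likelihood)       # 0.0  (underflow)
print(log_likelihood)   # ≈ -1386.29
```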
2014
- http://en.wikipedia.org/wiki/Likelihood_function#Log-likelihood
- QUOTE: For many applications, the natural logarithm of the likelihood function, called the log-likelihood, is more convenient to work with. Because the logarithm is a monotonically increasing function, the logarithm of a function achieves its maximum value at the same points as the function itself, and hence the log-likelihood can be used in place of the likelihood in maximum likelihood estimation and related techniques. Finding the maximum of a function often involves taking the derivative of a function and solving for the parameter being maximized, and this is often easier when the function being maximized is a log-likelihood rather than the original likelihood function.
For example, some likelihood functions are for the parameters that explain a collection of statistically independent observations. In such a situation, the likelihood function factors into a product of individual likelihood functions. The logarithm of this product is a sum of individual logarithms, and the derivative of a sum of terms is often easier to compute than the derivative of a product. In addition, several common distributions have likelihood functions that contain products of factors involving exponentiation. The logarithm of such a function is a sum of products, again easier to differentiate than the original function.
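As a hedged illustration of this factorization argument (not part of the quoted article), the sketch below writes the log-likelihood of n i.i.d. Exponential(lam) observations as the sum n·log(lam) − lam·Σxᵢ and solves the resulting score equation symbolically; the symbol s is introduced here purely as a stand-in for Σxᵢ.
```python
import sympy as sp

# Symbols: lam is the exponential rate, n the sample size,
# and s stands for the observed sum x_1 + ... + x_n.
lam, n, s = sp.symbols('lam n s', positive=True)

# Likelihood of n i.i.d. Exponential(lam) observations: lam**n * exp(-lam * s).
# Its logarithm is the sum below, which is straightforward to differentiate.
log_likelihood = n * sp.log(lam) - lam * s

score = sp.diff(log_likelihood, lam)    # n/lam - s
mle = sp.solve(sp.Eq(score, 0), lam)    # [n/s], i.e. lambda_hat = n / sum(x_i)
print(score, mle)
```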
2013
- http://www.math.uah.edu/stat/point/Likelihood.html
- QUOTE: Suppose again that we have an observable random variable [math]\displaystyle{ X }[/math] for an experiment, that takes values in a set S. Suppose also that the distribution of [math]\displaystyle{ X }[/math] depends on an unknown parameter θ, taking values in a parameter space Θ. Specifically, we will denote the probability density function of [math]\displaystyle{ X }[/math] on S by [math]\displaystyle{ f_θ }[/math] for θ∈Θ. Of course, our data variable [math]\displaystyle{ X }[/math] will almost always be vector-valued. The parameter θ may also be vector-valued.
The likelihood function [math]\displaystyle{ L }[/math] is the function obtained by reversing the roles of x and θ in the probability density function; that is, we view θ as the variable and x as the given information (which is precisely the point of view in estimation): [math]\displaystyle{ L_x(θ) = f_θ(x); θ∈Θ, x∈S }[/math] In the method of maximum likelihood, we try to find a value [math]\displaystyle{ u(x) }[/math] of the parameter θ that maximizes [math]\displaystyle{ L_x(θ) }[/math] for each [math]\displaystyle{ x∈S }[/math]. If we can do this, then the statistic [math]\displaystyle{ u(X) }[/math] is called a maximum likelihood estimator of θ. The method is intuitively appealing — we try to find the values of the parameters that would have most likely produced the data we in fact observed.
Since the natural logarithm function is strictly increasing on [math]\displaystyle{ (0, \infty) }[/math], the maximum value of [math]\displaystyle{ L_x(\theta) }[/math], if it exists, will occur at the same points as the maximum value of [math]\displaystyle{ \ln[L_x(\theta)] }[/math]. This latter function is called the log likelihood function and in many cases is easier to work with than the likelihood function (typically because the probability density function [math]\displaystyle{ f_\theta(x) }[/math] has a product structure).
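A minimal sketch of the method of maximum likelihood described above, maximizing the log-likelihood numerically by minimizing its negative; the Normal model, the simulated sample, and the log(sigma) parameterization are illustrative assumptions rather than part of the quoted source.
```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated data; the true parameters are used only to generate the sample.
rng = np.random.default_rng(42)
x = rng.normal(loc=2.0, scale=1.5, size=500)

def neg_log_likelihood(params):
    mu, log_sigma = params              # optimize log(sigma) so that sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat = result.x[0]
sigma_hat = np.exp(result.x[1])
print(mu_hat, sigma_hat)   # close to the sample mean and the (1/n) sample standard deviation
```
Working with the summed log-densities here mirrors the quoted argument: the per-observation terms add, and the optimizer handles a smooth, well-scaled objective rather than a vanishing product.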