Log Likelihood Ratio (LLR) Statistic
A Log Likelihood Ratio (LLR) Statistic is a statistical measure defined as the natural logarithm of a ratio of likelihoods (or of posterior-to-prior odds), based on the log likelihood function.
- Context:
- It can serve as the test statistic in a Log-Likelihood Ratio Test.
- It can be calculated as [math]\displaystyle{ LLR = \ln \Bigl( \frac{P(I \vert D)/P(\sim I \vert D)}{P(I)/P(\sim I)} \Bigr) }[/math] where [math]\displaystyle{ P(I\vert D) }[/math] and [math]\displaystyle{ P(\sim I\vert D) }[/math] are the frequencies of interactions observed in the given dataset (D) between annotated genes sharing benchmark associations (I) and not sharing associations (~I), respectively, while P(I) and P(~I) represent the prior expectations (the total frequencies of all benchmark genes sharing the same associations and not sharing associations, respectively).
- Example(s):
- Counter-Example(s):
- See: Maximum-Likelihood Estimation, Expected Maximum Log Likelihood Estimation.
References
2014
- http://en.wikipedia.org/wiki/Likelihood_function#Log-likelihood
- For many applications, the natural logarithm of the likelihood function, called the log-likelihood, is more convenient to work with. Because the logarithm is a monotonically increasing function, the logarithm of a function achieves its maximum value at the same points as the function itself, and hence the log-likelihood can be used in place of the likelihood in maximum likelihood estimation and related techniques. Finding the maximum of a function often involves taking the derivative of a function and solving for the parameter being maximized, and this is often easier when the function being maximized is a log-likelihood rather than the original likelihood function.
For example, some likelihood functions are for the parameters that explain a collection of statistically independent observations. In such a situation, the likelihood function factors into a product of individual likelihood functions. The logarithm of this product is a sum of individual logarithms, and the derivative of a sum of terms is often easier to compute than the derivative of a product. In addition, several common distributions have likelihood functions that contain products of factors involving exponentiation. The logarithm of such a function is a sum of products, again easier to differentiate than the original function.
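The equivalence described above can be sketched with a small numerical example. The Bernoulli data below are hypothetical and purely illustrative: the likelihood is a product of per-observation terms, the log-likelihood is the corresponding sum, and both attain their maximum at the same parameter value.

```python
import math

# Hypothetical i.i.d. Bernoulli observations (illustrative only)
data = [1, 0, 1, 1, 0, 1, 1, 1]

def likelihood(p):
    # Product of individual likelihoods
    prod = 1.0
    for x in data:
        prod *= p if x == 1 else (1 - p)
    return prod

def log_likelihood(p):
    # Sum of individual log-likelihoods: same maximizer, easier to differentiate
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

# Both are maximized at the sample mean (the MLE), here 6/8 = 0.75
grid = [i / 1000 for i in range(1, 1000)]
p_hat_lik = max(grid, key=likelihood)
p_hat_log = max(grid, key=log_likelihood)
print(p_hat_lik, p_hat_log)
```

Because the logarithm is strictly increasing, searching the same grid with either objective returns the same maximizer.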
2011
- http://en.wikipedia.org/wiki/Likelihood-ratio_test
- … When the logarithm of the likelihood ratio is used, the statistic is known as a log-likelihood ratio statistic, and the probability distribution of this test statistic, assuming that the null model is true, can be approximated using Wilks' theorem. In the case of distinguishing between two models, each of which has no unknown parameters, use of the likelihood ratio test can be justified by the Neyman–Pearson lemma, which demonstrates that such a test has the highest power among all competitors.
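The Wilks' theorem approximation above can be illustrated with a minimal sketch. The coin-flip counts here are hypothetical: the restricted model fixes p = 0.5, the free model uses the MLE k/n, and twice the log-likelihood difference is compared against the chi-square critical value with one degree of freedom (about 3.841 at the 5% level).

```python
import math

# Hypothetical data: test H0: p = 0.5 against a freely estimated p
n, k = 100, 62  # 62 heads out of 100 flips (illustrative numbers)

def bernoulli_log_likelihood(p, n, k):
    # Log-likelihood of k successes in n Bernoulli trials
    return k * math.log(p) + (n - k) * math.log(1 - p)

ll_null = bernoulli_log_likelihood(0.5, n, k)    # restricted (null) model
ll_alt = bernoulli_log_likelihood(k / n, n, k)   # MLE under the free model
lr_stat = 2 * (ll_alt - ll_null)                 # Wilks: ~ chi-square(1) under H0

# Reject H0 at the 5% level when the statistic exceeds ~3.841
print(round(lr_stat, 3), lr_stat > 3.841)
```

The alternative's log-likelihood is never smaller than the null's (the null is nested), so the statistic is always nonnegative.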
2005
- (Ramani et al., 2005) ⇒ Arun K. Ramani, Razvan C. Bunescu, Raymond Mooney, and Edward M. Marcotte. (2005). “Consolidating the Set of Known Human Protein-Protein Interactions in Preparation for Large-Scale Mapping of the Human Interactome.” In: Genome Biology, 6(5). doi:10.1186/gb-2005-6-5-r40
- QUOTE: we calculate a log likelihood ratio (LLR) as: [math]\displaystyle{ LLR = ln (\frac{P(D \vert I)} {P(D \vert \sim I)}) }[/math] where [math]\displaystyle{ P(D\vert I) }[/math] and [math]\displaystyle{ P(D\vert \sim I) }[/math] are the probability of observing the data (D) conditioned on the genes sharing benchmark associations (I) and not sharing benchmark associations (~I). By Bayes theorem, this equation can be rewritten as: [math]\displaystyle{ LLR = ln (\frac{(P(I \vert D)/P(\sim I \vert D))}{P(I)/P(\sim I)}) }[/math] where [math]\displaystyle{ P(I\vert D) }[/math] and [math]\displaystyle{ P(\sim I\vert D) }[/math] are the frequencies of interactions observed in the given dataset (D) between annotated genes sharing benchmark associations (I) and not sharing associations (~I), respectively, while P(I) and P(~I) represent the prior expectations (the total frequencies of all benchmark genes sharing the same associations and not sharing associations, respectively). This latter version of the equation is simpler to compute. A score of zero indicates interaction partners in the dataset being tested are no more likely than random to belong to the same pathway or to interact; higher scores indicate a more accurate dataset.
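The "simpler to compute" form of the quoted equation can be sketched directly from contingency counts. The counts below are hypothetical (not from Ramani et al.); they stand in for the number of dataset pairs sharing or not sharing a benchmark association, and the corresponding totals over the whole benchmark.

```python
import math

# Illustrative counts (hypothetical, not from the cited paper):
pairs_in_D_sharing = 400        # dataset pairs with a shared benchmark association (I, D)
pairs_in_D_not_sharing = 600    # dataset pairs without one (~I, D)
benchmark_sharing = 10_000      # all benchmark pairs sharing an association (I)
benchmark_not_sharing = 990_000 # all benchmark pairs not sharing one (~I)

# P(I|D)/P(~I|D): odds of sharing an association among dataset pairs
posterior_odds = pairs_in_D_sharing / pairs_in_D_not_sharing
# P(I)/P(~I): prior odds over the whole benchmark
prior_odds = benchmark_sharing / benchmark_not_sharing

llr = math.log(posterior_odds / prior_odds)
print(round(llr, 3))  # > 0: dataset pairs share associations more often than chance
```

A score of zero means the dataset's interaction partners are no more likely than random to share an association; the higher the score, the more accurate the dataset, matching the interpretation in the quote.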
1992
- (Zeitouni et al., 1992) ⇒ Ofer Zeitouni, Jacob Ziv, and Neri Merhav. (1992). “When Is the Generalized Likelihood Ratio Test Optimal?” In: IEEE Transactions on Information Theory, 38(5). doi:10.1109/18.149515