Neyman–Pearson Lemma


A Neyman–Pearson Lemma is a statistics lemma which states that a critical region of size [math]\displaystyle{ \alpha }[/math] is a best (most powerful) critical region when the likelihood ratio of the null hypothesis to the alternative hypothesis is at most a constant [math]\displaystyle{ k }[/math] inside the region and at least [math]\displaystyle{ k }[/math] outside it.



References

2016

If [math]\displaystyle{ C }[/math] is a critical region of size [math]\displaystyle{ \alpha }[/math] and [math]\displaystyle{ k }[/math] is a constant such that
[math]\displaystyle{ \frac{L(\theta_0)}{L(\theta_a)}\leq k \text{ inside the critical region}\; C }[/math]

and
[math]\displaystyle{ \frac{L(\theta_0)}{L(\theta_a)}\geq k \text{ outside the critical region}\; C }[/math]
then C is the best, that is, most powerful, critical region for testing the simple null hypothesis [math]\displaystyle{ H_0: \theta = \theta_0 }[/math] against the simple alternative hypothesis [math]\displaystyle{ H_A: \theta = \theta_a }[/math].
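
A standard worked example (an illustration, not part of the quoted source) makes the criterion concrete. Let [math]\displaystyle{ X_1,\dots,X_n }[/math] be a random sample from [math]\displaystyle{ N(\theta,\sigma^2) }[/math] with known [math]\displaystyle{ \sigma }[/math], and test [math]\displaystyle{ H_0: \theta = \theta_0 }[/math] against [math]\displaystyle{ H_A: \theta = \theta_a }[/math] with [math]\displaystyle{ \theta_a \gt \theta_0 }[/math]. Then
[math]\displaystyle{ \frac{L(\theta_0)}{L(\theta_a)}=\exp\left(\frac{n(\theta_0-\theta_a)}{\sigma^2}\left(\bar{x}-\frac{\theta_0+\theta_a}{2}\right)\right), }[/math]
which is decreasing in [math]\displaystyle{ \bar{x} }[/math] because [math]\displaystyle{ \theta_0-\theta_a \lt 0 }[/math]. The condition [math]\displaystyle{ L(\theta_0)/L(\theta_a)\leq k }[/math] is therefore equivalent to [math]\displaystyle{ \bar{x}\geq c }[/math] for some constant [math]\displaystyle{ c }[/math], and choosing [math]\displaystyle{ c }[/math] so that [math]\displaystyle{ P(\bar{X}\geq c \mid \theta_0)=\alpha }[/math] yields the best critical region [math]\displaystyle{ C=\{\bar{x}\geq c\} }[/math].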

2015

  • (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Neyman–Pearson_lemma Retrieved:2015-9-8.
    • In statistics, the Neyman–Pearson lemma, named after Jerzy Neyman and Egon Pearson, states that when performing a hypothesis test between two simple hypotheses [math]\displaystyle{ H_0: \theta = \theta_0 }[/math] and [math]\displaystyle{ H_1: \theta = \theta_1 }[/math], the likelihood-ratio test which rejects [math]\displaystyle{ H_0 }[/math] in favour of [math]\displaystyle{ H_1 }[/math] when : [math]\displaystyle{ \Lambda(x)=\frac{ L( \theta _0 \mid x)}{ L (\theta _1 \mid x)} \leq \eta }[/math] where : [math]\displaystyle{ P(\Lambda(X)\leq \eta\mid H_0)=\alpha }[/math] is the most powerful test at significance level α for a threshold η. If the test is most powerful for all [math]\displaystyle{ \theta_1 \in \Theta_1 }[/math], it is said to be uniformly most powerful (UMP) for alternatives in the set [math]\displaystyle{ \Theta_1 }[/math].

      In practice, the likelihood ratio is often used directly to construct tests — see Likelihood-ratio test. However it can also be used to suggest particular test-statistics that might be of interest or to suggest simplified tests — for this, one considers algebraic manipulation of the ratio to see if there are key statistics in it related to the size of the ratio (i.e. whether a large statistic corresponds to a small ratio or to a large one).
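
To illustrate this reduction concretely, here is a minimal Python sketch (an illustration with assumed values, not code from the article) for a Gaussian mean shift, where the ratio is monotone in the sample mean [math]\displaystyle{ \bar{x} }[/math], so the most powerful test simply thresholds [math]\displaystyle{ \bar{x} }[/math]:

<syntaxhighlight lang="python">
# Illustrative sketch (not from the quoted article): H0: mu = 0 vs
# H1: mu = 1 for an i.i.d. N(mu, 1) sample. The concrete values
# (mu0, mu1, n, alpha) are assumptions made for this example.
import math
import random
from statistics import NormalDist

mu0, mu1, n, alpha = 0.0, 1.0, 25, 0.05

# Algebra on the ratio: L(mu1 | x) / L(mu0 | x)
#   = exp(n * (mu1 - mu0) * (xbar - (mu0 + mu1) / 2)),
# which is increasing in xbar, so "reject for large Lambda(x)"
# is the same rule as "reject for large xbar".
xbar_h0 = NormalDist(mu0, 1 / math.sqrt(n))   # distribution of xbar under H0
c = xbar_h0.inv_cdf(1 - alpha)                # critical value: P(xbar > c | H0) = alpha

def reject_h0(sample):
    """Most powerful size-alpha test: reject H0 iff the sample mean exceeds c."""
    return sum(sample) / len(sample) > c

# Power of the test under H1.
power = 1 - NormalDist(mu1, 1 / math.sqrt(n)).cdf(c)
print(f"critical value c = {c:.4f}, power = {power:.4f}")

# Monte Carlo check that the empirical size is close to alpha.
rng = random.Random(0)
hits = sum(reject_h0([rng.gauss(mu0, 1.0) for _ in range(n)]) for _ in range(10_000))
print(f"empirical size = {hits / 10_000:.4f} (target {alpha})")
</syntaxhighlight>

Because the ratio is monotone in [math]\displaystyle{ \bar{x} }[/math], the same threshold [math]\displaystyle{ c }[/math] works for every alternative mean above [math]\displaystyle{ \mu_0 }[/math], which is exactly the uniformly-most-powerful situation described above.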

2004

[math]\displaystyle{ \max\{P_D\}\;\text{such that}\; P_F \leq \alpha }[/math]
The maximization is over all decision rules (equivalently, over all decision regions [math]\displaystyle{ R_0, R_1 }[/math]). Using different terminology, the Neyman-Pearson criterion selects the most powerful test of size (not exceeding) [math]\displaystyle{ \alpha }[/math]. (...) Define [math]\displaystyle{ \Lambda(x)=f_1(x)/f_0(x) }[/math]. (...) Let [math]\displaystyle{ \varphi }[/math] be a function of the data [math]\displaystyle{ x }[/math] with [math]\displaystyle{ \varphi(x)\in [0,1] }[/math]. [math]\displaystyle{ \varphi }[/math] defines the decision rule "declare [math]\displaystyle{ H_1 }[/math] with probability [math]\displaystyle{ \varphi(x) }[/math]." In other words, upon observing [math]\displaystyle{ x }[/math], we flip a "[math]\displaystyle{ \varphi(x) }[/math] coin." If it turns up heads, we declare [math]\displaystyle{ H_1 }[/math]; otherwise we declare [math]\displaystyle{ H_0 }[/math]. Thus far, we have only considered rules with [math]\displaystyle{ \varphi(x)\;\in\;\{0,1\} }[/math].
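
Read as a procedure, [math]\displaystyle{ \varphi }[/math] is just a data-dependent coin flip; a minimal sketch (illustrative, not code from the quoted source):

<syntaxhighlight lang="python">
import random

def decide(x, phi):
    """Randomized decision rule: declare H1 with probability phi(x).

    phi maps the data x into [0, 1]; phi(x) in {0, 1} recovers an
    ordinary deterministic rule. Illustrative sketch only.
    """
    return 1 if random.random() < phi(x) else 0   # 1 = declare H1, 0 = declare H0
</syntaxhighlight>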
Neyman-Pearson Lemma: Consider the hypothesis testing problem:
[math]\displaystyle{ \mathcal{H}_0 :\; x\sim f_0(x) \quad \text{and} \quad \mathcal{H}_1 :\; x\sim f_1(x) }[/math]
where [math]\displaystyle{ f_i(x) }[/math] are both pdfs or both pmfs. Let [math]\displaystyle{ \alpha\; \in [0,1] }[/math] be the size (false-alarm probability) constraint. The decision rule
[math]\displaystyle{ \varphi(x) = \begin{cases} 1 & \text{if}\; \Lambda(x)\gt \eta\\ \rho & \text{if}\; \Lambda(x)=\eta\\ 0 & \text{if}\; \Lambda(x)\lt \eta \end{cases} }[/math]

is the most powerful test of size [math]\displaystyle{ \alpha }[/math], where [math]\displaystyle{ \eta }[/math] and [math]\displaystyle{ \rho }[/math] are uniquely determined by requiring [math]\displaystyle{ P_F=\alpha }[/math]. If [math]\displaystyle{ \alpha=0 }[/math], we take [math]\displaystyle{ \eta=\infty\; , \; \rho=0 }[/math]. This test is unique up to sets of probability zero under [math]\displaystyle{ H_0 }[/math] and [math]\displaystyle{ H_1 }[/math]. When [math]\displaystyle{ Pr[\Lambda(x)=\eta]\gt 0 }[/math] for certain [math]\displaystyle{ \eta }[/math], we choose [math]\displaystyle{ \eta }[/math] and [math]\displaystyle{ \rho }[/math] as follows:

[math]\displaystyle{ \begin{array}{l} \text{Write:}\quad P_F=Pr[\Lambda(x)\gt \eta]+\rho\,Pr[\Lambda(x)=\eta]\\ \text{Choose}\; \eta\; \text{such that:}\quad Pr[\Lambda(x)\gt \eta]\leq \alpha \leq Pr[\Lambda(x)\geq \eta]\\ \text{Then choose}\;\rho\; \text{such that:}\quad \rho\,Pr[\Lambda(x)=\eta]=\alpha - Pr[\Lambda(x)\gt \eta] \end{array} }[/math]
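
A minimal numeric sketch of this recipe (all concrete values are assumptions for illustration: a single observation from Binomial(10, p), testing [math]\displaystyle{ p=0.5 }[/math] against [math]\displaystyle{ p=0.8 }[/math] at [math]\displaystyle{ \alpha=0.05 }[/math], where randomization is genuinely needed because [math]\displaystyle{ \Lambda }[/math] is discrete):

<syntaxhighlight lang="python">
# Sketch: randomized Neyman-Pearson test for one Binomial(10, p) observation,
# H0: p = 0.5 vs H1: p = 0.8, alpha = 0.05. Values are illustrative
# assumptions, not taken from the quoted source.
from math import comb

n, p0, p1, alpha = 10, 0.5, 0.8, 0.05

def pmf(k, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Likelihood ratio Lambda(k) = f1(k) / f0(k); increasing in k since p1 > p0.
lam = {k: pmf(k, p1) / pmf(k, p0) for k in range(n + 1)}

# Choose eta so that Pr[Lambda > eta | H0] <= alpha <= Pr[Lambda >= eta | H0].
for eta in sorted(set(lam.values())):
    p_gt = sum(pmf(k, p0) for k in lam if lam[k] > eta)
    p_ge = sum(pmf(k, p0) for k in lam if lam[k] >= eta)
    if p_gt <= alpha <= p_ge:
        break

# Then choose rho so that P_F = Pr[Lambda > eta] + rho * Pr[Lambda = eta] = alpha.
p_eq = sum(pmf(k, p0) for k in lam if lam[k] == eta)
rho = (alpha - p_gt) / p_eq if p_eq > 0 else 0.0

# Size and power of the resulting randomized test.
size = p_gt + rho * p_eq
power = (sum(pmf(k, p1) for k in lam if lam[k] > eta)
         + rho * sum(pmf(k, p1) for k in lam if lam[k] == eta))
print(f"eta={eta:.3f} rho={rho:.3f} size={size:.3f} power={power:.3f}")
</syntaxhighlight>

For these assumed numbers the recipe lands at [math]\displaystyle{ \eta=\Lambda(8) }[/math] with [math]\displaystyle{ \rho\approx 0.893 }[/math]: the rule rejects outright on 9 or 10 successes and rejects with probability [math]\displaystyle{ \rho }[/math] on exactly 8, hitting size 0.05 exactly.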

1933

[math]\displaystyle{ \frac{\partial p}{\partial \alpha^{(i)}}=0,\quad i=1,2, \cdots,c\quad\quad (30) }[/math]
The values of the [math]\displaystyle{ \alpha }[/math]'s so obtained are then substituted into [math]\displaystyle{ p }[/math] to give [math]\displaystyle{ p(\Omega\,\text{max.}) }[/math]. Then the family of surfaces of constant likelihood, [math]\displaystyle{ \lambda }[/math], appropriate for testing a simple hypothesis [math]\displaystyle{ H_0 }[/math], is defined by
[math]\displaystyle{ p_0 = \lambda\, p(\Omega\,\text{max.}) \quad\quad(31) }[/math]
It will be seen that the members of this family are identical with the envelopes of the family
[math]\displaystyle{ p_0 =\kappa\;p_t \quad\quad(32) }[/math]
which bound the best critical regions. From this it follows that: (a) if for a given [math]\displaystyle{ \epsilon }[/math] a common best critical region exists with regard to the whole set of alternatives, it will correspond to its envelope with regard to these alternatives, and it will therefore be identical with a region bounded by a surface (31). Further, in this case, the region in which [math]\displaystyle{ \lambda\leq\lambda_0 = \text{const} }[/math] will correspond to the region in which [math]\displaystyle{ p_0\leq\lambda_0\, p(\Omega\,\text{max.}) }[/math]. The test based upon the principle of likelihood leads, in fact, to the use of best critical regions; (b) if there is not a common best critical region, the likelihood of [math]\displaystyle{ H_0 }[/math], with regard to a particular alternative [math]\displaystyle{ H_t }[/math], will equal the constant, [math]\displaystyle{ \kappa }[/math], of equation (32). It follows that the surface (31), upon which the likelihood of [math]\displaystyle{ H_0 }[/math] with regard to the whole set of alternatives is constant, will be the envelope of (32) for which [math]\displaystyle{ \lambda=\kappa }[/math].