Influential Observation

An Influential Observation is an observation whose removal or omission from the dataset would significantly alter the outcome of a parameter estimation task.

AKA: Influential Point.
Example(s)
- DFFITS.
See: Regression Analysis Task, Parameter Estimation, DFFITS.

References

2015

(Wikipedia, 2015) ⇒ http://www.wikiwand.com/en/Influential_observation Retrieved 2016-07-24
- In statistics, an influential observation is an observation for a statistical calculation whose deletion from the dataset would noticeably change the result of the calculation. In particular, in regression analysis an influential point is one whose deletion has a large effect on the parameter estimates.

Assessment

Various methods have been proposed for measuring influence. Assume an estimated regression [math]\displaystyle{ \mathbf{y} = \mathbf{X} \mathbf{b} + \mathbf{e} }[/math], where [math]\displaystyle{ \mathbf{y} }[/math] is an n×1 column vector for the response variable, [math]\displaystyle{ \mathbf{X} }[/math] is the n×k design matrix of explanatory variables (including a constant), [math]\displaystyle{ \mathbf{e} }[/math] is the n×1 residual vector, and [math]\displaystyle{ \mathbf{b} }[/math] is a k×1 vector of estimates of some population parameter [math]\displaystyle{ \mathbf{\beta} \in \mathbb{R}^{k} }[/math]. Also define [math]\displaystyle{ \mathbf{H} \equiv \mathbf{X} \left(\mathbf{X}^{\mathsf{T}} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathsf{T}} }[/math], the projection matrix of [math]\displaystyle{ \mathbf{X} }[/math]. Then we have the following measures of influence:

- [math]\displaystyle{ \text{DFBETA}_{i} \equiv \mathbf{b} - \mathbf{b}_{(-i)} = \frac{\left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)^{-1} \mathbf{x}_{i}^{\mathsf{T}} e_{i}}{1 - h_{i}} }[/math], where [math]\displaystyle{ \mathbf{b}_{(-i)} }[/math] denotes the coefficients estimated with the i-th row [math]\displaystyle{ \mathbf{x}_{i} }[/math] of [math]\displaystyle{ \mathbf{X} }[/math] deleted, and [math]\displaystyle{ h_{i} = \mathbf{x}_{i} \left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)^{-1} \mathbf{x}_{i}^{\mathsf{T}} }[/math] denotes the i-th diagonal element of [math]\displaystyle{ \mathbf{H} }[/math]. Thus DFBETA measures the difference in each parameter estimate with and without the influential point. There is a DFBETA for each variable and each observation (if there are n observations and k variables there are n·k DFBETAs).
- DFFITS
- Cook's D measures the effect of removing a data point on all the parameters combined.
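
The DFBETA identity above can be checked numerically. The following is a minimal sketch, assuming NumPy and synthetic data. The function name `influence_measures` and the planted high-leverage point are illustrative assumptions, not part of the quoted source. Cook's D is computed here in one common form, the squared length of DFBETA in the [math]\displaystyle{ \mathbf{X}^{\mathsf{T}} \mathbf{X} }[/math] metric divided by k times the residual variance estimate.

```python
import numpy as np


def influence_measures(X, y):
    """Return (b, h, dfbeta, cooks_d) for the OLS fit of y on X.

    X is the n-by-k design matrix (including the constant column),
    y is the length-n response vector. Hypothetical helper for illustration.
    """
    n, k = X.shape
    XtX = X.T @ X
    XtX_inv = np.linalg.inv(XtX)
    b = XtX_inv @ X.T @ y                  # OLS coefficient estimates
    e = y - X @ b                          # residual vector
    # h_i = x_i (X'X)^{-1} x_i', i.e. the diagonal of the projection matrix H
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # Row i holds DFBETA_i = (X'X)^{-1} x_i' e_i / (1 - h_i)
    dfbeta = (X @ XtX_inv) * (e / (1.0 - h))[:, None]
    # Cook's D_i = DFBETA_i' (X'X) DFBETA_i / (k * s^2), with s^2 = e'e / (n - k)
    s2 = (e @ e) / (n - k)
    cooks_d = np.einsum("ij,jk,ik->i", dfbeta, XtX, dfbeta) / (k * s2)
    return b, h, dfbeta, cooks_d


# Synthetic data (an assumption for this sketch): a line plus noise,
# with the last observation moved far from both the bulk and the line.
rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
x[-1], y[-1] = 6.0, 0.0
X = np.column_stack([np.ones(n), x])

b, h, dfbeta, cooks_d = influence_measures(X, y)

# Compare the closed form with an explicit refit that omits observation i.
i = n - 1
keep = np.arange(n) != i
b_minus_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
assert np.allclose(dfbeta[i], b - b_minus_i)

print("DFBETA for the planted point:", dfbeta[i])
print("Index of largest Cook's D:", int(np.argmax(cooks_d)))
```

The closed form avoids refitting the regression n times: one inverse of [math]\displaystyle{ \mathbf{X}^{\mathsf{T}} \mathbf{X} }[/math] yields all n·k DFBETA values, and the explicit refit is kept only as a sanity check.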
