Influential Observation
An Influential Observation is an observation whose removal or omission from the dataset would significantly alter the outcome of the parameter estimation task.
- AKA: DFBETA.
- Example(s)
- See: Regression Analysis Task, Parameter Estimation, DFFITS.
References
2015
- (Wikipedia, 2015) ⇒ http://www.wikiwand.com/en/Influential_observation Retrieved 2016-07-24
- In statistics, an influential observation is an observation for a statistical calculation whose deletion from the dataset would noticeably change the result of the calculation. In particular, in regression analysis an influential point is one whose deletion has a large effect on the parameter estimates.
- Assessment: Various methods have been proposed for measuring influence. Assume an estimated regression [math]\displaystyle{ \mathbf{y} = \mathbf{X} \mathbf{b} + \mathbf{e} }[/math], where [math]\displaystyle{ \mathbf{y} }[/math] is an n×1 column vector for the response variable, [math]\displaystyle{ \mathbf{X} }[/math] is the n×k design matrix of explanatory variables (including a constant), [math]\displaystyle{ \mathbf{e} }[/math] is the n×1 residual vector, and [math]\displaystyle{ \mathbf{b} }[/math] is a k×1 vector of estimates of some population parameter [math]\displaystyle{ \mathbf{\beta} \in \mathbb{R}^{k} }[/math]. Also define [math]\displaystyle{ \mathbf{H} \equiv \mathbf{X} \left(\mathbf{X}^{\mathsf{T}} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathsf{T}} }[/math], the projection matrix of [math]\displaystyle{ \mathbf{X} }[/math]. Then we have the following measures of influence:
- [math]\displaystyle{ \text{DFBETA}_{i} \equiv \mathbf{b} - \mathbf{b}_{(-i)} = \frac{\left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)^{-1} \mathbf{x}_{i}^{\mathsf{T}} e_{i}}{1 - h_{i}} }[/math], where [math]\displaystyle{ \mathbf{b}_{(-i)} }[/math] denotes the coefficients estimated with the i-th row [math]\displaystyle{ \mathbf{x}_{i} }[/math] of [math]\displaystyle{ \mathbf{X} }[/math] deleted, [math]\displaystyle{ e_{i} }[/math] is the i-th element of the residual vector [math]\displaystyle{ \mathbf{e} }[/math], and [math]\displaystyle{ h_{i} = \mathbf{x}_{i} \left( \mathbf{X}^{\mathsf{T}} \mathbf{X} \right)^{-1} \mathbf{x}_{i}^{\mathsf{T}} }[/math] denotes the i-th diagonal element of [math]\displaystyle{ \mathbf{H} }[/math] (the leverage of the i-th observation). Thus DFBETA measures the difference in each parameter estimate with and without the i-th observation. There is a DFBETA for each variable and each observation (if there are n observations and k variables there are n·k DFBETAs).
- DFFITS
- Cook's D measures the effect of removing a data point on all the parameters combined.
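The DFBETA formula above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small synthetic dataset. The function names `dfbeta` and `dfbeta_brute_force`, the simulated data, and the artificial outlier are illustrative assumptions and do not come from the quoted source. It computes the closed-form expression for every observation at once and compares it with the result of refitting the regression n times, each time with one row deleted.

```python
import numpy as np


def dfbeta(X, y):
    """Closed-form DFBETA: row i holds b - b_(-i) for observation i."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                 # OLS coefficients
    e = y - X @ b                         # residual vector
    # Leverages h_i = x_i (X^T X)^{-1} x_i^T, i.e. the diagonal of H.
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    # Row i of X @ XtX_inv equals ((X^T X)^{-1} x_i^T)^T because XtX_inv is symmetric.
    return (X @ XtX_inv) * (e / (1.0 - h))[:, None]


def dfbeta_brute_force(X, y):
    """Same quantity obtained by actually deleting each row and refitting."""
    n = X.shape[0]
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    out = np.empty(X.shape, dtype=float)
    for i in range(n):
        keep = np.arange(n) != i
        b_minus_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        out[i] = b - b_minus_i
    return out


# Synthetic data (assumed for illustration): intercept plus one regressor.
rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[0] += 10.0                              # make observation 0 an artificial outlier

closed = dfbeta(X, y)
brute = dfbeta_brute_force(X, y)
print(closed.shape)                       # one row per observation, one column per coefficient
print(np.allclose(closed, brute))         # closed form agrees with leave-one-out refits
```

The closed form needs only one fit, whereas the brute-force version refits the model n times. The explicit matrix inverse is used here only to mirror the notation in the formula; for ill-conditioned design matrices a QR-based or least-squares solver would be a more stable choice.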