Covariance Matrix Approximation Task
A Covariance Matrix Approximation Task is a Matrix Approximation Task whose output is a covariance matrix.
- AKA: Covariance Matrix Estimation.
- Context:
- It can be solved by a Covariance Matrix Approximation System (that implements a Covariance Matrix Approximation Algorithm).
- See: Probabilistic Matrix Factorization, Multivariate Random Variable, Joint Probability Distribution, Sample Covariance Matrix, Unbiased Estimator, Positive-Definite Matrix.
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/estimation_of_covariance_matrices Retrieved:2015-2-16.
- In statistics, sometimes the covariance matrix of a multivariate random variable is not known but has to be estimated. Estimation of covariance matrices then deals with the question of how to approximate the actual covariance matrix on the basis of a sample from the multivariate distribution. Simple cases, where observations are complete, can be dealt with by using the sample covariance matrix. The sample covariance matrix (SCM) is an unbiased and efficient estimator of the covariance matrix if the space of covariance matrices is viewed as an extrinsic convex cone in [math]\displaystyle{ \mathbf{R}^{p \times p} }[/math]; however, measured using the intrinsic geometry of positive-definite matrices, the SCM is a biased and inefficient estimator. In addition, if the random variable has a normal distribution, the sample covariance matrix has a Wishart distribution and a slightly differently scaled version of it is the maximum likelihood estimate. Cases involving missing data require deeper considerations. Another issue is the robustness to outliers: "Sample covariance matrices are extremely sensitive to outliers". [1] [2]
Statistical analyses of multivariate data often involve exploratory studies of the way in which the variables change in relation to one another and this may be followed up by explicit statistical models involving the covariance matrix of the variables. Thus the estimation of covariance matrices directly from observational data plays two roles:
- to provide initial estimates that can be used to study the inter-relationships;
- to provide sample estimates that can be used for model checking.
Estimates of covariance matrices are required at the initial stages of principal component analysis and factor analysis, and are also involved in versions of regression analysis that treat the dependent variables in a data set, jointly with the independent variable, as the outcome of a random sample.
- [1] Peter J. Huber, "Robust Statistics", Wiley, 1981 (republished in paperback, 2004)
- [2] William N. Venables, Brian D. Ripley, "Modern Applied Statistics with S", Springer, 2002, ISBN 0-387-95457-0, ISBN 978-0-387-95457-8, page 336
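The passage above distinguishes the unbiased sample covariance matrix from the slightly differently scaled maximum likelihood estimate, and notes the sensitivity to outliers. The following is a minimal sketch of both scalings in plain Python, assuming complete observations stored one per row. The function name `sample_covariance`, the `ddof` parameter, and the toy data are illustrative assumptions and do not come from the quoted sources.

```python
from typing import List, Sequence


def sample_covariance(rows: Sequence[Sequence[float]], ddof: int = 1) -> List[List[float]]:
    """Covariance matrix of observations given one per row.

    ddof=1 divides by n - 1 (the unbiased sample covariance matrix);
    ddof=0 divides by n (the maximum likelihood estimate under normality).
    """
    n = len(rows)
    p = len(rows[0])
    if n - ddof <= 0:
        raise ValueError("need more observations than ddof")
    # Column means, then centre every observation.
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Entry (i, j) is the scaled sum of products of centred coordinates.
    return [
        [sum(c[i] * c[j] for c in centered) / (n - ddof) for j in range(p)]
        for i in range(p)
    ]


# Hypothetical toy data: 4 complete observations of 2 variables.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 9.0]]
scm = sample_covariance(data, ddof=1)  # divides by n - 1
mle = sample_covariance(data, ddof=0)  # divides by n

# One gross outlier is enough to flip the sign of the estimated covariance.
contaminated = data + [[100.0, -100.0]]
scm_outlier = sample_covariance(contaminated, ddof=1)
```

With n observations the two estimates differ only by the factor (n - 1)/n, so the distinction matters mainly for small samples. The contaminated example illustrates why robust estimators are often preferred when outliers are suspected.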
2014
- http://scikit-learn.org/stable/modules/covariance.html
- Many statistical problems require at some point the estimation of a population’s covariance matrix, which can be seen as an estimation of data set scatter plot shape. Most of the time, such an estimation has to be done on a sample whose properties (size, structure, homogeneity) have a large influence on the estimation’s quality. The sklearn.covariance package aims at providing tools affording an accurate estimation of a population’s covariance matrix under various settings.
We assume that the observations are independent and identically distributed (i.i.d.).
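As a hedged illustration of the package mentioned above, the sketch below fits the `EmpiricalCovariance` estimator from `sklearn.covariance` to simulated i.i.d. Gaussian data. The random seed, sample size, and the matrix `true_cov` are assumptions chosen only for this example, and NumPy and scikit-learn are assumed to be installed.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance

# Assumed setup: 500 i.i.d. draws from a 2-dimensional Gaussian.
rng = np.random.default_rng(0)
true_cov = np.array([[2.0, 0.6],
                     [0.6, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=500)

# Fit the maximum likelihood (empirical) covariance estimator.
estimator = EmpiricalCovariance().fit(X)
estimated_cov = estimator.covariance_  # array of shape (2, 2)
```

The fitted `covariance_` attribute divides by the number of observations rather than by one less, so it corresponds to the maximum likelihood scaling rather than the unbiased one.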
1998
- (Johnson & Wichern, 1998) ⇒ Richard A. Johnson, and Dean W. Wichern. (1998). “Applied Multivariate Statistical Analysis, 4th ed.” Prentice Hall, 1992. ISBN:013834194X
- QUOTE: Factor analysis can be considered an extension of principal components analysis. Both can be viewed as attempts to approximate the covariance matrix [math]\displaystyle{ \Sigma }[/math]. However, the approximation based on the factor analysis model is more elaborate. The primary question in factor analysis is whether the data are consistent with a prescribed structure.
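The two approximations contrasted in the quote can be written out as follows. This is a sketch in standard textbook notation, not taken from the quoted passage: the symbols \lambda_i and e_i (eigenvalues and orthonormal eigenvectors of \Sigma), m (the number of retained components or factors, with m < p), L (a p \times m loading matrix), and \Psi (a diagonal matrix of specific variances) are assumptions introduced here.

```latex
% Principal components: keep the m leading terms of the spectral decomposition.
\Sigma = \sum_{i=1}^{p} \lambda_i e_i e_i^{\top}
       \approx \sum_{i=1}^{m} \lambda_i e_i e_i^{\top},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 .

% Orthogonal factor model: low-rank part plus a diagonal matrix.
\Sigma = L L^{\top} + \Psi,
\qquad L \in \mathbf{R}^{p \times m}, \quad
\Psi = \operatorname{diag}(\psi_1, \dots, \psi_p) .
```

Under these assumptions, the factor model is "more elaborate" in the sense that it adds the diagonal term \Psi to a low-rank part, and asking whether the data are consistent with the prescribed structure amounts to asking whether \Sigma can be written in this form for a small m.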