Principal Components Analysis (PCA) Algorithm
A Principal Components Analysis (PCA) Algorithm is a matrix decomposition algorithm that can be implemented by a PCA system to solve a PCA task (to return orthogonal principal components of a matrix).
- Context:
- It can (typically) calculate the eigenvectors of a covariance matrix with the highest eigenvalues, in order to retain a lower-dimensional representation of the dataset that explains the most variance.
- It can (typically) be a Matrix Dimensionality Compression Algorithm.
- ...
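The covariance-matrix procedure sketched in the context above (center the data, form the covariance matrix, eigendecompose it, keep the top eigenvectors) can be illustrated as follows. This is a minimal sketch using NumPy; the dataset is a made-up toy example, not from any referenced source:

```python
import numpy as np

# Hypothetical toy data: 6 samples, 3 features.
X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.9],
              [2.2, 2.9, 0.8],
              [1.9, 2.2, 1.1],
              [3.1, 3.0, 0.2],
              [2.3, 2.7, 0.9]])

# 1. Center each feature (PCA operates on mean-centered data).
Xc = X - X.mean(axis=0)

# 2. Covariance matrix of the features.
C = np.cov(Xc, rowvar=False)

# 3. Eigendecomposition; eigh is appropriate because C is symmetric.
eigenvalues, eigenvectors = np.linalg.eigh(C)

# 4. Sort by descending eigenvalue so the first component
#    explains the most variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 5. Project onto the top-k principal components (the PCA scores).
k = 2
scores = Xc @ eigenvectors[:, :k]
```

The eigenvector columns form an orthonormal basis, so the projection step is a simple matrix product.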
- Example(s):
- Karl Pearson's PCA Algorithm, (Pearson, 1901)
- a Covariance-method PCA Algorithm, which derives principal components from the covariance matrix.
- an EM-based PCA Algorithm (EM algorithm for PCA), which uses the expectation-maximization technique to find principal components.
- an SVD-based PCA Algorithm, which uses Singular Value Decomposition for PCA.
- ...
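To illustrate the SVD-based variant listed above, the following sketch (using NumPy, on randomly generated data as an assumed stand-in for a real dataset) computes principal directions from the SVD of the centered data matrix and cross-checks the explained variances against the covariance-matrix eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # hypothetical sample data
Xc = X - X.mean(axis=0)        # center the columns

# SVD of the centered data: Xc = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal directions; singular values relate
# to covariance eigenvalues by lambda_i = s_i**2 / (n - 1).
components = Vt
explained_variance = S**2 / (X.shape[0] - 1)

# Cross-check against the covariance-matrix eigenvalues
# (eigvalsh returns ascending order, so reverse it).
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
assert np.allclose(explained_variance, eigvals)
```

Working from the SVD avoids forming the covariance matrix explicitly, which is numerically better conditioned for wide or ill-scaled data.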
- Counter-Example(s):
- Singular Value Decomposition Algorithm, which is a more generalized matrix decomposition method.
- Projection Pursuit Algorithm, which seeks the most interesting projections in multidimensional data.
- Multidimensional Scaling Algorithm, which focuses on distances or dissimilarities among data points for dimensionality reduction.
- Karhunen-Loève Transform, which is used in signal processing.
- t-Distributed Stochastic Neighbor Embedding (t-SNE), a nonlinear dimensionality reduction method.
- See: Linear Combination, Covariance Matrix, Linear Model, Eigenvalue Decomposition, PCA Score, Dimensionality Reduction.
References
2024
- (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/Principal_component_analysis Retrieved: 2024-03-26.
- Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing.
The data is linearly transformed onto a new coordinate system such that the directions (principal components) capturing the largest variation in the data can be easily identified.
The principal components of a collection of points in a real coordinate space are a sequence of [math]\displaystyle{ p }[/math] unit vectors, where the [math]\displaystyle{ i }[/math] -th vector is the direction of a line that best fits the data while being orthogonal to the first [math]\displaystyle{ i-1 }[/math] vectors. Here, a best-fitting line is defined as one that minimizes the average squared perpendicular distance from the points to the line. These directions (i.e., principal components) constitute an orthonormal basis in which different individual dimensions of the data are linearly uncorrelated. Many studies use the first two principal components in order to plot the data in two dimensions and to visually identify clusters of closely related data points.
Principal component analysis has applications in many fields such as population genetics, microbiome studies, and atmospheric science.
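The two-dimensional plotting use mentioned above (projecting onto the first two principal components to visually identify clusters) can be sketched as follows; the two clusters here are synthetic, generated only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical dataset: two clusters in 5 dimensions,
# offset from each other along every feature.
A = rng.normal(loc=0.0, size=(30, 5))
B = rng.normal(loc=3.0, size=(30, 5))
X = np.vstack([A, B])

# Center and take the SVD of the data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Scores on the first two principal components -- the coordinates
# one would pass to a 2-D scatter plot to inspect cluster structure.
pc_scores = Xc @ Vt[:2].T   # shape (60, 2)
```

Because the between-cluster offset dominates the variance, the first principal component separates the two groups along the horizontal axis of such a plot.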