Dataset Dimensionality Reduction Algorithm
A Dataset Dimensionality Reduction Algorithm is a data transformation algorithm that maps a dataset to a representation with fewer dimensions (attributes) and that can be implemented by a dimensionality reduction system (to solve a dimensionality reduction task).
- Context:
- Input:
- optional: the number of required Dimensions.
- It can range from being an Attribute Selection Algorithm (such as a feature set reduction algorithm) to being a Numerical Attribute Space Compression Algorithm (such as a feature space compression algorithm).
- It can range from being a Supervised Dimensionality Reduction Algorithm to being an Unsupervised Dimensionality Reduction Algorithm.
- It can support a High-Dimensionality Clustering Algorithm.
- …
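The two ends of the selection-versus-compression range can be contrasted in a minimal pure-Python sketch. It assumes the optional input k (the number of required dimensions) is given. Attribute selection here keeps the k highest-variance original columns, which is just one simple unsupervised criterion chosen for illustration; compression projects the centered data onto its top-k principal directions, found by power iteration with deflation. All function names are hypothetical.

```python
import math
import random

Matrix = list[list[float]]


def column_means(X: Matrix) -> list[float]:
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]


def select_attributes(X: Matrix, k: int) -> tuple[list[int], Matrix]:
    """Attribute selection: keep the k original columns with the largest variance."""
    n, means = len(X), column_means(X)
    variances = [sum((row[j] - m) ** 2 for row in X) / n for j, m in enumerate(means)]
    ranked = sorted(range(len(variances)), key=lambda j: -variances[j])
    keep = sorted(ranked[:k])  # report kept columns in their original order
    return keep, [[row[j] for j in keep] for row in X]


def top_eigenvectors(C: Matrix, k: int, iters: int = 500) -> Matrix:
    """Leading k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    p = len(C)
    A = [r[:] for r in C]
    rng = random.Random(0)
    vectors = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(A[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining matrix is (numerically) zero
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * A[a][b] * v[b] for a in range(p) for b in range(p))
        # Deflate: remove the found component so the next pass finds the next one.
        A = [[A[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
        vectors.append(v)
    return vectors


def compress(X: Matrix, k: int) -> Matrix:
    """Feature space compression: project centered X onto its top-k principal directions."""
    n, p, means = len(X), len(X[0]), column_means(X)
    centered = [[row[j] - means[j] for j in range(p)] for row in X]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(p)] for a in range(p)]
    W = top_eigenvectors(cov, k)
    return [[sum(r[j] * w[j] for j in range(p)) for w in W] for r in centered]
```

Selection returns a subset of the original, still-interpretable attributes; compression returns new coordinates, each a linear combination of all original attributes. Both variants are unsupervised, since neither looks at a target variable.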
- Example(s):
- Projection Pursuit.
- Random Projections.
- Topological Continuous Map.
- a Linear Dimensionality Reduction Algorithm, such as: Principal Component Analysis, and Singular Value Decomposition.
- a Nonlinear Dimensionality Reduction Algorithm, such as: Multidimensional Scaling, and Locally Linear Embedding.
- Partial Least Squares Regression (PLS).
- Sammon Mapping.
- Multidimensional Scaling (MDS).
- …
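Random Projections, listed among the examples, can be sketched in a few lines of pure Python. This sketch assumes a dense Gaussian random matrix with entries scaled by 1/sqrt(k), so that squared Euclidean lengths are preserved in expectation; sparse or other random matrices are also used in practice. The function names and the fixed seed are hypothetical.

```python
import math
import random


def random_projection(X: list[list[float]], k: int, seed: int = 0) -> list[list[float]]:
    """Map n x p data to n x k via a Gaussian random matrix R with entries N(0, 1/k)."""
    p = len(X[0])
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)  # makes E[||x R||^2] = ||x||^2
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(p)]
    return [[sum(row[j] * R[j][c] for j in range(p)) for c in range(k)] for row in X]


def distance(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two rows."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Unlike Principal Component Analysis, the projection matrix ignores the data entirely, which makes it cheap; the Johnson-Lindenstrauss lemma is the usual justification for why pairwise distances (compare `distance` before and after projecting) are approximately preserved when k is large enough relative to the number of points.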
- Counter-Example(s):
- See: Feature Space.
References
2011
- http://en.wikipedia.org/wiki/Dimensionality_reduction
- QUOTE: In statistics, dimension reduction is the process of reducing the number of random variables under consideration, and can be divided into feature selection and feature extraction.
… Feature selection approaches try to find a subset of the original variables (also called features or attributes). Two strategies are filter (e.g. information gain) and wrapper (e.g. search guided by the accuracy) approaches. See also combinatorial optimization problems. In some cases, data analysis such as regression or classification can be done in the reduced space more accurately than in the original space.
… Feature extraction transforms the data in the high-dimensional space to a space of fewer dimensions. The data transformation may be linear, as in principal component analysis (PCA), but many nonlinear dimensionality reduction techniques also exist.
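The filter strategy mentioned in the quote can be sketched with information gain, assuming discrete-valued features and class labels. Each original attribute X_j is scored as H(Y) − H(Y | X_j) without training any model, and the k best-scoring attributes are kept. A wrapper approach would instead search over attribute subsets and score each by the accuracy of a trained model; that is not shown. Function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list) -> float:
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: list, labels: list) -> float:
    """H(Y) - H(Y | X) for one discrete feature X and class labels Y."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def filter_select(X: list[list], y: list, k: int) -> tuple[list[int], list[float]]:
    """Filter approach: score every column by information gain, keep the top k."""
    p = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(p)]
    keep = sorted(sorted(range(p), key=lambda j: -scores[j])[:k])
    return keep, scores
```

Because each attribute is scored in isolation, a filter like this is fast but can keep redundant attributes or miss ones that are only informative in combination, which is the trade-off wrappers address at higher computational cost.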
2002
- (Fodor, 2002) ⇒ Imola K. Fodor. (2002). “A Survey of Dimension Reduction Techniques.” LLNL technical report, UCRL ID-148494
- QUOTE: … We distinguish two major types of dimension reduction methods: linear and non-linear.
… Traditional statistical methods break down partly because of the increase in the number of observations, but mostly because of the increase in the number of variables associated with each observation. The dimension of the data is the number of variables that are measured on each observation.
High-dimensional datasets present many mathematical challenges as well as some opportunities, and are bound to give rise to new theoretical developments [11]. One of the problems with high-dimensional datasets is that, in many cases, not all the measured variables are "important" for understanding the underlying phenomena of interest. While certain computationally expensive novel methods [4] can construct predictive models with high accuracy from high-dimensional data, it is still of interest in many applications to reduce the dimension of the original data prior to any modeling of the data.
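One common way to formalize the linear versus non-linear distinction is sketched below; the notation is chosen here for illustration. Each observation is a vector of the p measured variables and is mapped to k < p new variables.

```latex
\begin{aligned}
&\mathbf{x} = (x_1, \dots, x_p)^\top \in \mathbb{R}^p
  \;\longmapsto\;
  \mathbf{s} = (s_1, \dots, s_k)^\top \in \mathbb{R}^k, \qquad k < p \\[4pt]
&\text{Linear:}\quad s_i = w_{i1} x_1 + \dots + w_{ip} x_p \quad (i = 1, \dots, k),
  \qquad \text{i.e. } \mathbf{s} = W\mathbf{x},\; W \in \mathbb{R}^{k \times p} \\[4pt]
&\text{Non-linear:}\quad \mathbf{s} = f(\mathbf{x}) \quad \text{for some non-linear } f : \mathbb{R}^p \to \mathbb{R}^k
\end{aligned}
```

For Principal Component Analysis, for example, the rows of W are the top-k eigenvectors of the covariance matrix of the (centered) data, while methods such as Locally Linear Embedding correspond to a non-linear f.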