Feature Selection Algorithm
A Feature Selection Algorithm is a dimensionality reduction algorithm that can be implemented by a feature selection system (to solve a feature selection task).
- AKA: Feature Subset Selection Method.
- Context:
- It can be applied by a Feature Selection System.
- It can try to locate Unpredictive Predictor Features.
- It can try to locate Correlated Predictor Features.
- It can range from being an Unsupervised Feature Selection Algorithm to being a Supervised Feature Selection Algorithm.
- …
- Example(s):
- Counter-Example(s):
- See: Feature Space.
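The following is a minimal sketch of a filter-style feature selection algorithm that illustrates the two Context bullets above: it drops unpredictive predictor features (low mutual information with the target) and correlated predictor features (redundant with a better-scoring feature). The thresholds and example dataset are illustrative assumptions, not prescriptions from any of the references below.

```python
# A minimal, filter-style feature selection sketch (assumed thresholds and data):
#   1. drop unpredictive predictor features (low mutual information with the target);
#   2. drop correlated predictor features (redundant with a better-scoring feature).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

def select_features(X, y, mi_threshold=0.01, corr_threshold=0.95):
    # Score the relevance of each feature to the target (supervised filter criterion).
    mi = mutual_info_classif(X, y, random_state=0)
    # Visit features in decreasing order of relevance, skipping unpredictive ones.
    candidates = [j for j in np.argsort(-mi) if mi[j] >= mi_threshold]
    selected = []
    for j in candidates:
        # Keep the feature only if it is not highly correlated with one already kept.
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < corr_threshold for s in selected):
            selected.append(j)
    return selected

X, y = load_breast_cancer(return_X_y=True)
print(select_features(X, y))
```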
References
2007
- (Zhao & Liu, 2007) ⇒ Zheng Zhao, and Huan Liu. (2007). “Spectral feature selection for supervised and unsupervised learning.” In: Proceedings of the 24th International Conference on Machine learning (ICML 2007).
- Feature selection aims to reduce dimensionality for building comprehensible learning models with good generalization performance. Feature selection algorithms are largely studied separately according to the type of learning: supervised or unsupervised. This work exploits intrinsic properties underlying supervised and unsupervised feature selection algorithms, and proposes a unified framework for feature selection based on spectral graph theory. The proposed framework is able to generate families of algorithms for both supervised and unsupervised feature selection. And we show that existing powerful algorithms such as ReliefF (supervised) and Laplacian Score (unsupervised) are special cases of the proposed framework. To the best of our knowledge, this work is the first attempt to unify supervised and unsupervised feature selection, and enable their joint study under a general framework. Experiments demonstrated the efficacy of the novel algorithms derived from the framework.
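The abstract names the Laplacian Score as an unsupervised special case of the framework. The following is a minimal sketch of the standard Laplacian Score computation (k-nearest-neighbor graph with RBF weights), not of the unified spectral framework itself; the neighborhood size and kernel width are illustrative assumptions.

```python
# A minimal sketch of the Laplacian Score (the unsupervised special case named above),
# not of the unified spectral framework; n_neighbors and gamma are illustrative
# assumptions. Lower scores indicate features that better preserve the local
# structure of the sample graph.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neighbors import kneighbors_graph

def laplacian_score(X, n_neighbors=5, gamma=1.0):
    # Symmetric k-nearest-neighbor affinity matrix with RBF (heat-kernel) weights.
    A = kneighbors_graph(X, n_neighbors, mode='connectivity').toarray()
    A = np.maximum(A, A.T) * rbf_kernel(X, gamma=gamma)
    d = A.sum(axis=1)            # vertex degrees
    L = np.diag(d) - A           # unnormalized graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f = f - (f @ d) / d.sum()           # center w.r.t. the degree distribution
        denom = f @ (d * f)                 # f^T D f
        scores[r] = (f @ L @ f) / denom if denom > 0 else np.inf
    return scores
```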
2005
- (Liu & Yu, 2005) ⇒ Huan Liu, and Lei Yu. “Toward Integrating Feature Selection Algorithms for Classification and Clustering.” In: IEEE Transactions on Knowledge and Data Engineering, 17(4). [doi:10.1109/TKDE.2005.66]
2005
- (Peng et al., 2005) ⇒ Hanchuan Peng, Fuhui Long, and Chris Ding. (2005). “Feature Selection based on Mutual Information Criteria of Max-dependency, Max-relevance, and Min-redundancy.” In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8). doi:10.1109/TPAMI.2005.159
2004
- (Dy & Brodley, 2004) ⇒ J. G. Dy, and C. E. Brodley. (2004). “Feature Selection for Unsupervised Learning.” In: Journal of Machine Learning Research, 5.
2003
- (Guyon & Elisseeff, 2003) ⇒ Isabelle M. Guyon, and André Elisseeff. (2003). “An Introduction to Variable and Feature Selection.” In: The Journal of Machine Learning Research, 3.
- … we summarize the steps that may be taken to solve a feature selection problem in a check list (We caution the reader that this check list is heuristic. The only recommendation that is almost surely valid is to try the simplest things first.)
- 1. Do you have domain knowledge? If yes, construct a better set of “ad hoc” features.
- 2. Are your features commensurate? If no, consider normalizing them.
- 3. Do you suspect interdependence of features? If yes, expand your feature set by constructing conjunctive features or products of features, as much as your computer resources allow you (see example of use in Section 4.4).
- 4. Do you need to prune the input variables (e.g. for cost, speed or data understanding reasons)? If no, construct disjunctive features or weighted sums of features (e.g. by clustering or matrix factorization, see Section 5).
- 5. Do you need to assess features individually (e.g. to understand their influence on the system or because their number is so large that you need to do a first filtering)? If yes, use a variable ranking method (Section 2 and Section 7.2); else, do it anyway to get baseline results.
- 6. Do you need a predictor? If no, stop.
- 7. Do you suspect your data is “dirty” (has a few meaningless input patterns and/or noisy outputs or wrong class labels)? If yes, detect the outlier examples using the top ranking variables obtained in step 5 as representation; check and/or discard them.
- 8. Do you know what to try first? If no, use a linear predictor. Use a forward selection method (Section 4.2) with the “probe” method as a stopping criterion (Section 6) or use the ℓ0-norm embedded method (Section 4.3). For comparison, following the ranking of step 5, construct a sequence of predictors of same nature using increasing subsets of features. Can you match or improve performance with a smaller subset? If yes, try a non-linear predictor with that subset.
- 9. Do you have new ideas, time, computational resources, and enough examples? If yes, compare several feature selection methods, including your new idea, correlation coefficients, backward selection and embedded methods (Section 4). Use linear and non-linear predictors. Select the best approach with model selection (Section 6).
- 10. Do you want a stable solution (to improve performance and/or understanding)? If yes, subsample your data and redo your analysis for several “bootstraps” (Section 7.1).
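A minimal sketch of steps 2, 5, and 8 of this checklist, under the assumption of a scikit-learn workflow: rank variables individually, then evaluate a normalized linear predictor on increasing subsets of the top-ranked features. The F-score ranking and logistic-regression predictor are illustrative choices, not ones prescribed by the paper.

```python
# A minimal sketch of checklist steps 2, 5, and 8: normalize features, rank variables
# individually, then compare linear predictors trained on increasing subsets of the
# top-ranked features. The F-score ranking and logistic regression are illustrative
# assumptions, not the estimators prescribed by the checklist.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Step 5: rank variables individually with a univariate criterion.
scores, _ = f_classif(X, y)
ranking = np.argsort(-scores)

# Step 2 + step 8: normalize the features, then evaluate predictors of the same
# nature on increasing subsets of the top-ranked variables.
for k in (1, 2, 5, 10, X.shape[1]):
    subset = ranking[:k]
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    acc = cross_val_score(clf, X[:, subset], y, cv=5).mean()
    print(f"top {k:2d} features: cv accuracy = {acc:.3f}")
```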
2003
- (Melli et al., 2003) ⇒ Gabor Melli, S. Amirrezvani, and F. Chen. (2003). “Column Reduction During Progressive Sampling.” In: Workshop on Data Mining for Actionable Knowledge (DMAK-2003).
2002
- (Mitra et al., 2002) ⇒ Pabitra Mitra, C. A. Murthy, and Sankar K. Pal. (2002). “Unsupervised Feature Selection Using Feature Similarity.” In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3). [doi:10.1109/34.990133]
1998
- (Liu & Motoda, 1998) ⇒ Huan Liu, and Hiroshi Motoda. (1998). “Feature Selection for Knowledge Discovery and Data Mining." Kluwer Academic.
- (Guyon & Elisseeff, 2003) ⇒ Isabelle M. Guyon, and André Elisseeff. (2003). “An Introduction to Variable and Feature Selection.” In: The Journal of Machine Learning Research, 3.
1998
- (Liu & Motoda, 1998) ⇒ Huan Liu, and Hiroshi Motoda. (1998). “Feature Selection for Knowledge Discovery and Data Mining." Kluwer Academic.
1997
- (Yang & Pedersen, 1997) ⇒ Yiming Yang, and Jan O. Pedersen. (1997). “A Comparative Study on Feature Selection in Text Categorization.” In: Proceedings of the Fourteenth International Conference on Machine Learning (ICML, 1997).
1997
- (Kohavi & John, 1997) ⇒ Ron Kohavi, and George John. “Wrappers for Feature Selection.” In: Artificial Intelligence, 97(1). doi:10.1016/S0004-3702(97)00043-X
1997
- (Blum & Langley, 1997) ⇒ Avrim L. Blum, and Pat Langley. (1998). “Selection of Relevant Features and Examples in Machine Learning.” In: Artificial Intelligence, 97(1-2). doi:10.1016/S0004-3702(97)00063-5
1996
- (Berger et al., 1996) ⇒ Adam L. Berger, Vincent J. Della Pietra, and Stephen A. Della Pietra. (1996). “A Maximum Entropy Approach to Natural Language Processing.” In: Computational Linguistics, 22(1).
1994
- (John et al., 1994) ⇒ George H. John, Ron Kohavi, and Karl Pfleger (1994). “Irrelevant Features and the Subset Selection Problem.” In: Proceedings of the Eleventh International Conference on Machine Learning (ICML 1994).
1977
- (Narendra & Fukunaga, 1977) ⇒ P. M. Narendra, and K. Fukunaga. (1977). “A Branch and Bound Algorithm for Feature Subset Selection.” In: IEEE Trans. Comput. 26(9). doi:10.1109/TC.1977.1674939