2009 FeatureExtractionThroughLocalLearning
- (Sun & Wu, 2009) ⇒ Yijun Sun, Dapeng Wu. (2009). “Feature Extraction Through Local Learning.” In: Statistical Analysis and Data Mining, 2(1). doi:10.1002/sam.10028
Subject Headings: Feature Weighting Algorithm, LFE Algorithm, RELIEF Algorithm.
Notes
Quotes
Author Keywords
- feature extraction, local learning, classification, microarray
Abstract
- RELIEF is considered one of the most successful algorithms for assessing the quality of features. It has been recently proved that RELIEF is an online learning algorithm that solves a convex optimization problem with a margin-based objective function. Starting from this mathematical interpretation, we propose a novel feature extraction algorithm, referred to as local feature extraction (LFE), as a natural generalization of RELIEF. LFE collects [[Discriminant Information|discriminant information]] through local learning and can be solved as an eigenvalue decomposition problem with a closed-form solution. A fast implementation of LFE is derived. Compared to principal component analysis, LFE also has a clear physical meaning and can be implemented easily with a comparable computational cost. Compared to other feature extraction algorithms, LFE has an explicit mechanism to remove irrelevant features. Experiments on synthetic and real-world data are presented. The results demonstrate the effectiveness of the proposed algorithm.
1. Introduction
- Feature extraction and selection are two fundamental problems in machine learning research. Not only can their proper design reduce system complexity and processing time, but they can also enhance system performance in many cases.
- Suppose we are given a training dataset [math]\displaystyle{ \mathcal{D} = \{(\mathbf{x}_n, y_n)\}_{n=1}^{N} \in \mathcal{X} \times \mathcal{Y} }[/math], where [math]\displaystyle{ \mathcal{X} \subseteq \mathbb{R}^I }[/math] is the pattern space, [math]\displaystyle{ I }[/math] is the feature dimensionality, and [math]\displaystyle{ \mathcal{Y} }[/math] is the label space. A commonly used method for feature extraction and selection is to pre-multiply pattern vectors by a projection matrix with the aim of optimizing a certain criterion function:
- [math]\displaystyle{ L(P) = \sum_{n=1}^{N} \ell(P^T \mathbf{x}_n, y_n), \quad (1) }[/math]
- where [math]\displaystyle{ \ell(\cdot) }[/math] is a criterion function, and [math]\displaystyle{ P }[/math] is a projection matrix. By convention, pattern vector [math]\displaystyle{ \mathbf{x}_n }[/math] is denoted as a column. For example, in feature selection, the off-diagonal elements of the projection matrix are all set to zero, and the diagonal elements are restricted to take values from the set {0, 1}. Based on the criterion functions used to search for informative features, feature selection algorithms are traditionally categorized as wrapper and filter methods [1]. In wrapper methods, a classification algorithm is employed to evaluate the goodness of a selected feature subset, whereas in filter methods criterion functions evaluate feature subsets by their information content, typically interclass distance (e.g., Fisher score) or statistical measures (e.g., p-value of t-test), instead of directly optimizing the performance of any specific learning algorithm. Hence, filter methods are computationally much more efficient, but usually do not perform as well as wrapper methods. In addition to a criterion function, a search strategy is also needed. An exhaustive search is optimum but quickly becomes computationally infeasible as the problem size increases. Therefore, some heuristic combinatorial searches, such as forward and/or backward selection [2], are usually employed. These algorithms have shown some (...)
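The diagonal-projection view of feature selection in Eq. (1) can be sketched as follows. This is a minimal NumPy illustration, not code from the paper; the toy data and the 0/1 mask are hypothetical:

```python
import numpy as np

def select_features(X, mask):
    """Feature selection as a diagonal 0/1 projection: P = diag(mask).

    Pre-multiplying each column pattern vector x_n by P^T (as in Eq. 1)
    simply zeroes out the features whose mask entry is 0. With samples
    stored as rows of X, this is X @ P.
    """
    P = np.diag(mask)   # off-diagonal entries zero, diagonal in {0, 1}
    return X @ P        # row n holds P^T x_n

# Hypothetical toy data: 4 samples, 3 features; keep features 0 and 2.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, 1.0]])
mask = np.array([1, 0, 1])
Xs = select_features(X, mask)
# Feature 1 is zeroed out in every sample; features 0 and 2 pass through.
```

Relaxing the diagonal of `mask` from {0, 1} to real values is exactly the feature-weighting setting discussed next.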
- The counterpart to feature selection is feature weighting, wherein the diagonal elements of the projection matrix are allowed to take real values instead of values from the limited set {0, 1}. This strategy has many advantages. For example, there is no need to pre-specify the number of relevant features. Also, standard optimization techniques (e.g., gradient descent) can be used, avoiding combinatorial search. Among the existing feature weighting algorithms, the RELIEF algorithm [5] is considered one of the most successful due to its simplicity and effectiveness [6]. RELIEF had long been regarded as a heuristic filter method until recently, when we mathematically proved that RELIEF is an online learning algorithm that solves a convex optimization problem aimed at maximizing a margin-based objective function [7]. The margin is defined based on the one-nearest-neighbor classifier. Compared with filter methods, RELIEF usually performs better due to the performance feedback of a nonlinear classifier when searching for useful features; compared with conventional wrapper methods, by solving a convex problem, RELIEF avoids any exhaustive or heuristic combinatorial search, and thus can be implemented efficiently. These two merits clearly explain the success of RELIEF in practical applications.
2. Mathematical Interpretation of Relief
3. Learning Distance Metric Matrix
4. Extension to Address Multiclass Problems
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year
---|---|---|---|---|---|---|---|---|---
Yijun Sun; Dapeng Wu | 2(1) | 2009 | Feature Extraction Through Local Learning | journal article | Statistical Analysis and Data Mining | http://plaza.ufl.edu/sunyijun/Paper/SDMJournal.pdf | 10.1002/sam.10028 | | 2009