2010 UsingFeatureConstructiontoAvoid
- (Mayfield & Penstein-Rosé, 2010) ⇒ Elijah Mayfield, and Carolyn Penstein-Rosé. (2010). “Using Feature Construction to Avoid Large Feature Spaces in Text Classification.” In: Proceedings of the 12th annual conference on Genetic and evolutionary computation. ISBN:978-1-4503-0072-8 doi:10.1145/1830483.1830714
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222010%22+Using+Feature+Construction+to+Avoid+Large+Feature+Spaces+in+Text+Classification
- http://dl.acm.org/citation.cfm?id=1830483.1830714&preflayout=flat#citedby
Quotes
Abstract
Feature space design is a critical part of machine learning. This is an especially difficult challenge in the field of text classification, where an arbitrary number of features of varying complexity can be extracted from documents as a preprocessing step. A challenge for researchers has consistently been to balance expressiveness of features with the size of the corresponding feature space, due to issues with data sparsity that arise as feature spaces grow larger. Drawing on past successes utilizing genetic programming in similar problems outside of text classification, we propose and implement a technique for constructing complex features from simpler features, and adding these more complex features into a combined feature space which can then be utilized by more sophisticated machine learning classifiers. Applying this technique to a sentiment analysis problem, we show encouraging improvement in classification accuracy, with a small and constant increase in feature space size. We also show that the features we generate carry far more predictive power than any of the simple features they contain.
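The idea in the abstract of building complex features out of simpler ones and adding them to the existing feature space can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which operators, fitness function, or classifier were used, so the Boolean combinators (`f_and`, `f_or`, `f_not`), the toy documents, and the hand-written feature `good_without_not` are all hypothetical, and no genetic programming search is performed here. The sketch only shows how one constructed feature adds a single column to a unigram feature space.

```python
from typing import Callable, List, Set

# A feature maps a tokenized document (a set of words) to 0 or 1.
Feature = Callable[[Set[str]], int]


def unigram(word: str) -> Feature:
    """Simple feature: 1 if the word occurs in the document, else 0."""
    return lambda doc: int(word in doc)


# Hypothetical combinators (assumed for illustration, not taken from the paper).
def f_and(a: Feature, b: Feature) -> Feature:
    return lambda doc: a(doc) & b(doc)


def f_or(a: Feature, b: Feature) -> Feature:
    return lambda doc: a(doc) | b(doc)


def f_not(a: Feature) -> Feature:
    return lambda doc: 1 - a(doc)


def featurize(docs: List[Set[str]], features: List[Feature]) -> List[List[int]]:
    """Build one row per document and one column per feature."""
    return [[f(doc) for f in features] for doc in docs]


# Toy documents, invented for this example.
docs = [
    set("not good at all".split()),
    set("really good movie".split()),
    set("bad plot".split()),
]

# Simple feature space: presence of three unigrams.
simple = [unigram(w) for w in ("good", "bad", "not")]

# One constructed feature: "good" is present and "not" is absent.
good_without_not = f_and(unigram("good"), f_not(unigram("not")))

base = featurize(docs, simple)
combined = featurize(docs, simple + [good_without_not])
```

In a genetic programming setting, expressions such as `good_without_not` would be candidates evolved and scored by a search procedure rather than written by hand. Each retained expression adds one column, so the growth in feature space size equals the number of constructed features kept.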
References
| | Author | title | doi | year |
|---|---|---|---|---|
| 2010 UsingFeatureConstructiontoAvoid | Elijah Mayfield, Carolyn Penstein-Rosé | Using Feature Construction to Avoid Large Feature Spaces in Text Classification | 10.1145/1830483.1830714 | 2010 |