2008 ScalingUpTextClassificationforL
- (Forman et al., 2008) ⇒ George Forman, and Shyamsundar Rajaram. (2008). “Scaling Up Text Classification for Large File Systems.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). doi:10.1145/1401890.1401923
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%22Scaling+up+text+classification+for+large+file+systems%22+2008
- http://portal.acm.org/citation.cfm?doid=1401890.1401923&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
We combine the speed and scalability of information retrieval with the generally superior classification accuracy offered by machine learning, yielding a two-phase text classifier that can scale to very large document corpora. We investigate the effect of different methods of formulating the query from the training set, as well as varying the query size. In empirical tests on the Reuters RCV1 corpus of 806,000 documents, we find that runtime was easily reduced by a factor of 27, with a somewhat surprising gain in F-measure compared with traditional text classification.
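The two-phase idea in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The whitespace tokenizer, the document-frequency-difference rule in `formulate_query`, and the smoothed log-odds scorer in `train_weights` are all hypothetical stand-ins; the paper compares several query-formulation methods, and any learner could fill the second phase. Phase 1 runs a cheap OR-query over an inverted index. Phase 2 applies the classifier only to the retrieved candidates, which is where the runtime saving comes from.

```python
import math
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, keep distinct terms only.
    return set(text.lower().split())


def build_index(docs):
    """Phase-1 structure: inverted index mapping term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def doc_freq(texts):
    """Number of documents in `texts` that contain each term."""
    df = defaultdict(int)
    for text in texts:
        for term in tokenize(text):
            df[term] += 1
    return df


def formulate_query(pos, neg, k):
    """Pick the k terms whose document frequency most favors the positive class.
    Hypothetical scoring rule; ties are broken alphabetically."""
    df_pos, df_neg = doc_freq(pos), doc_freq(neg)
    score = {t: df_pos[t] / len(pos) - df_neg.get(t, 0) / len(neg) for t in df_pos}
    return sorted(score, key=lambda t: (-score[t], t))[:k]


def train_weights(pos, neg):
    """Phase-2 model (stand-in): per-term log-odds with add-one smoothing."""
    df_pos, df_neg = doc_freq(pos), doc_freq(neg)
    vocab = set(df_pos) | set(df_neg)
    return {
        t: math.log((df_pos.get(t, 0) + 1) / (len(pos) + 2))
        - math.log((df_neg.get(t, 0) + 1) / (len(neg) + 2))
        for t in vocab
    }


def two_phase_classify(corpus, index, query, weights, bias=0.0):
    # Phase 1: only documents matching at least one query term survive.
    candidates = set().union(*(index.get(t, set()) for t in query))
    # Phase 2: score just the candidates with the linear model; positive score -> in class.
    positives = []
    for doc_id in sorted(candidates):
        s = bias + sum(weights.get(t, 0.0) for t in tokenize(corpus[doc_id]))
        if s > 0:
            positives.append(doc_id)
    return positives


# Toy usage (made-up documents, not RCV1 data).
train_pos = ["oil prices rise", "crude oil exports fall"]
train_neg = ["team wins match", "election results announced"]
corpus = ["oil output cut", "match postponed by rain", "crude prices steady", "new election date"]

index = build_index(corpus)
query = formulate_query(train_pos, train_neg, k=2)
weights = train_weights(train_pos, train_neg)
print(query, two_phase_classify(corpus, index, query, weights))  # ['oil', 'crude'] [0, 2]
```

In this toy run the classifier scores only 2 of the 4 corpus documents. The trade-off is recall: a positive document containing none of the query terms is never seen by phase 2. The query size `k` therefore balances how much work the classifier does against how many positives the retrieval phase can miss.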
References
Page | Author | title | doi
---|---|---|---
2008 ScalingUpTextClassificationforL | George Forman, Shyamsundar Rajaram | Scaling Up Text Classification for Large File Systems | 10.1145/1401890.1401923