2010 WhyLabelWhenYouCanSearchAlterna
- (Attenberg & Provost, 2010) ⇒ Josh Attenberg and Foster Provost. (2010). “Why Label When You Can Search?: Alternatives to Active Learning for Applying Human Resources to Build Classification Models under Extreme Class Imbalance.” In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2010). doi:10.1145/1835804.1835859
Subject Headings:
Notes
- Categories and Subject Descriptors: H.2.8 Database Management: Database Applications — data mining; I.2.6 Artificial Intelligence: Learning — induction; I.5.1 Pattern Recognition: Models — statistics.
- General Terms: Design, Performance, Human Factors
Cited By
- http://scholar.google.com/scholar?q=%22Why+label+when+you+can+search%3F%3A+alternatives+to+active+learning+for+applying+human+resources+to+build+classification+models+under+extreme+class+imbalance%22+2010
- http://portal.acm.org/citation.cfm?id=1835859&preflayout=flat#citedby
Quotes
Author Keywords
Active learning, machine learning, class imbalance, human resources, on-line advertising, micro-outsourcing.
Abstract
This paper analyses alternative techniques for deploying low-cost human resources for data acquisition for classifier induction in domains exhibiting extreme class imbalance -- where traditional labeling strategies, such as active learning, can be ineffective. Consider the problem of building classifiers to help brands control the content adjacent to their on-line advertisements. Although frequent enough to worry advertisers, objectionable categories are rare in the distribution of impressions encountered by most on-line advertisers -- so rare that traditional sampling techniques do not find enough positive examples to train effective models. An alternative way to deploy human resources for training-data acquisition is to have them “guide” the learning by searching explicitly for training examples of each class. We show that under extreme skew, even basic techniques for guided learning completely dominate smart (active) strategies for applying human resources to select cases for labeling. Therefore, it is critical to consider the relative cost of search versus labeling, and we demonstrate the tradeoffs for different relative costs. We show that in cost/skew settings where the choice between search and active labeling is equivocal, a hybrid strategy can combine the benefits.
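The abstract's argument about skew and relative cost can be illustrated with back-of-the-envelope arithmetic. The sketch below is not the paper's method or experimental setup. It only counts how many positive examples a fixed budget would be expected to buy. It makes two simplifying assumptions: labeling is modeled as uniform random sampling rather than active learning, and each search is assumed to return exactly one positive example. All function names, parameters, and numbers are hypothetical.

```python
def expected_positives_random(budget, label_cost, positive_rate):
    """Expected positives when the budget is spent labeling random examples.

    Assumes uniform random sampling from a pool whose base rate of
    positives is `positive_rate` (a hypothetical, illustrative model).
    """
    n_labels = budget // label_cost
    return n_labels * positive_rate


def positives_guided(budget, search_cost):
    """Positives obtained when each search yields exactly one positive.

    This is an idealization: real searches can fail or return duplicates.
    """
    return budget // search_cost


def break_even_search_cost(label_cost, positive_rate):
    """Search cost per positive at which both strategies tie in expectation."""
    return label_cost / positive_rate


if __name__ == "__main__":
    budget = 1000          # hypothetical budget, in arbitrary cost units
    label_cost = 1         # hypothetical cost of labeling one example
    positive_rate = 0.001  # hypothetical 1-in-1000 class skew

    print(expected_positives_random(budget, label_cost, positive_rate))
    print(positives_guided(budget, search_cost=10))
    print(break_even_search_cost(label_cost, positive_rate))
```

Under these assumed numbers, search stays ahead on positive counts until a single search costs about as much as a thousand labels. This is one way to read the abstract's point that the relative cost of search versus labeling matters. The sketch says nothing about classifier quality, which is what the paper actually evaluates.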
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2010 WhyLabelWhenYouCanSearchAlterna | Foster Provost, Josh Attenberg | | | Why Label When You Can Search?: Alternatives to Active Learning for Applying Human Resources to Build Classification Models under Extreme Class Imbalance | | KDD-2010 Proceedings | http://pages.stern.nyu.edu/~fprovost/Papers/guidedlearning-kdd2010.pdf | 10.1145/1835804.1835859 | | 2010 |