1999 EfficientProgressiveSampling
- (Provost et al., 1999) ⇒ Foster Provost, David Jensen, and Tim Oates. (1999). “Efficient Progressive Sampling.” In: Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-1999). doi:10.1145/312129.312188
Subject Headings: Progressive Sampling Algorithm, Large Training Set Learning Algorithm.
Notes
Cited By
Quotes
Abstract
Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size rarely is obvious. We analyze methods for progressive sampling - using progressively larger samples as long as model accuracy improves. We explore several notions of efficient progressive sampling. We analyze efficiency relative to induction with all instances; we show that a simple, geometric sampling schedule is asymptotically optimal, and we describe how best to take into account prior expectations of accuracy convergence. We then describe the issues involved in instantiating an efficient progressive sampler, including how to detect convergence. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling can be remarkably efficient.
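The loop the abstract describes (train on progressively larger samples, following a geometric schedule, until accuracy stops improving) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `geometric_schedule`, `progressive_sample`, `train_and_score`, `n0`, `a`, and `min_gain` are all hypothetical, and the stopping rule (halt when the accuracy gain falls below a fixed threshold) is a simple stand-in assumption for the convergence detection discussed in the paper.

```python
import random


def geometric_schedule(n0, a, n_total):
    """Yield sample sizes n0, a*n0, a^2*n0, ... and finally n_total itself."""
    n = n0
    while n < n_total:
        yield n
        # Guard against a stalled schedule when int(n * a) == n.
        n = max(n + 1, int(n * a))
    yield n_total


def progressive_sample(data, train_and_score, n0=100, a=2, min_gain=0.001, seed=0):
    """Train on growing samples until the accuracy gain drops below min_gain.

    train_and_score is a caller-supplied (hypothetical) function that takes a
    list of instances, fits a model, and returns its estimated accuracy.
    Returns the list of (sample_size, accuracy) pairs that were evaluated.
    """
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)  # each prefix of the shuffled list is a random sample

    history = []
    previous = None
    for n in geometric_schedule(n0, a, len(shuffled)):
        accuracy = train_and_score(shuffled[:n])
        history.append((n, accuracy))
        # Assumed stopping rule: stop once the improvement is negligible.
        if previous is not None and accuracy - previous < min_gain:
            break
        previous = accuracy
    return history
```

With `n0=100` and `a=2` on 1,000 instances, the schedule visits sample sizes 100, 200, 400, 800, and then the full 1,000. Because the sizes grow geometrically, the total number of instances processed across all runs stays within a constant factor of the final sample size, which is the intuition behind the efficiency argument in the abstract.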
References
| | Author | title | titleUrl | doi | year |
|---|---|---|---|---|---|
| 1999 EfficientProgressiveSampling | Foster Provost, David Jensen, Tim Oates | Efficient Progressive Sampling | http://dx.doi.org/10.1145/312129.312188 | 10.1145/312129.312188 | 1999 |