2009 IssuesinEvaluationofStreamLearn
- (Gama et al., 2009) ⇒ João Gama, Raquel Sebastião, and Pedro Pereira Rodrigues. (2009). “Issues in Evaluation of Stream Learning Algorithms.” In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2009). doi:10.1145/1557019.1557060
Subject Headings: Data Stream Analysis Algorithm.
Notes
- Categories and Subject Descriptors: H.2.8 Database Management: Database applications — data mining; I.2.6 Artificial Intelligence: Learning — classifiers design and evaluation.
- General Terms: Experimentation, Measurement, Performance
Cited By
- http://scholar.google.com/scholar?q=%22Issues+in+evaluation+of+stream+learning+algorithms%22+2009
- http://portal.acm.org/citation.cfm?doid=1557019.1557060&preflayout=flat#citedby
Quotes
Author Keywords
Data Streams, Evaluation Design, Concept Drift
Abstract
Learning from data streams is a research area of increasing importance. Nowadays, several stream learning algorithms have been developed. Most of them learn decision models that continuously evolve over time, run in resource-aware environments, and detect and react to changes in the environment generating the data. One important issue, not yet conveniently addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. There are no golden standards for assessing performance in non-stationary environments. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of Predictive Sequential methods for error estimation: the prequential error. The prequential error allows us to monitor the evolution of the performance of models that evolve over time. Nevertheless, it is known to be a pessimistic estimator in comparison to holdout estimates. To obtain more reliable estimators we need some forgetting mechanism. Two viable alternatives are: sliding windows and fading factors. We observe that the prequential error converges to a holdout estimator when estimated over a sliding window or using fading factors. We present illustrative examples of the use of prequential error estimators, using fading factors, for the tasks of: i. assessing performance of a learning algorithm; ii. comparing learning algorithms; iii. hypothesis testing using the McNemar test; and iv. change detection using the Page-Hinkley test.
In these tasks, the prequential error estimated using fading factors provides reliable estimators. In comparison to sliding windows, fading factors are faster and memory-less, a requirement for streaming applications. This paper is a contribution to the discussion of good practices in performance assessment when learning dynamic models that evolve over time.
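The two forgetting mechanisms named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the fading-factor recurrences (S_i = L_i + α·S_{i-1}, N_i = 1 + α·N_{i-1}, E_i = S_i / N_i) are the commonly used formulation and are assumed here, since the abstract does not state them. The class names, the default values of `alpha` and `window_size`, and the `predict`/`learn` model interface are all hypothetical.

```python
from collections import deque


class FadingPrequentialError:
    """Prequential error with a fading factor (assumed recurrences, see lead-in).

    Keeps only two numbers, so memory use does not depend on stream length.
    """

    def __init__(self, alpha=0.995):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.loss_sum = 0.0  # faded sum of losses, S_i
        self.count = 0.0     # faded number of examples, N_i

    def update(self, loss):
        self.loss_sum = loss + self.alpha * self.loss_sum
        self.count = 1.0 + self.alpha * self.count
        return self.loss_sum / self.count


class WindowPrequentialError:
    """Prequential error over the most recent `window_size` losses."""

    def __init__(self, window_size=1000):
        self.window = deque(maxlen=window_size)
        self.loss_sum = 0.0

    def update(self, loss):
        # Subtract the loss that is about to fall out of the window.
        if len(self.window) == self.window.maxlen:
            self.loss_sum -= self.window[0]
        self.window.append(loss)
        self.loss_sum += loss
        return self.loss_sum / len(self.window)


def prequential_errors(stream, model, estimator):
    """Yield the running error estimate for each (x, y) pair in `stream`.

    `model` is any object with predict(x) and learn(x, y); this interface
    is a hypothetical one chosen for the sketch.
    """
    for x, y in stream:
        loss = 0.0 if model.predict(x) == y else 1.0  # test first ...
        model.learn(x, y)                              # ... then train
        yield estimator.update(loss)
```

With `alpha=1.0` the fading estimator reduces to the plain prequential error over the whole stream; smaller values discount old losses more quickly. The sliding window must store its last `window_size` losses, whereas the fading factor keeps only two running values, which matches the abstract's remark that fading factors are memory-less.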
References
| Author | title | journal | doi | year |
---|---|---|---|---|---|
2009 IssuesinEvaluationofStreamLearn | João Gama, Raquel Sebastião, Pedro Pereira Rodrigues | Issues in Evaluation of Stream Learning Algorithms | KDD-2009 Proceedings | 10.1145/1557019.1557060 | 2009 |