2009 CoCoCodingCostforParameterFreeO
- (Böhm et al., 2009) ⇒ Christian Böhm, Katrin Haegler, Nikola S. Müller, and Claudia Plant. (2009). “CoCo: Coding Cost for Parameter-free Outlier Detection.” In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2009). doi:10.1145/1557019.1557042
Subject Headings:
Notes
- Categories and Subject Descriptors: H.2.8 Database applications: Data mining.
- General Terms: Algorithms, Design, Reliability
Cited By
- http://scholar.google.com/scholar?q=%22CoCo%3A+coding+cost+for+parameter-free+outlier+detection%22+2009
- http://portal.acm.org/citation.cfm?doid=1557019.1557042&preflayout=flat#citedby
Quotes
Author Keywords
Outlier Detection, Coding Costs, Minimum Description Length, Data Compression
Abstract
How can we automatically spot all outstanding observations in a data set? This question arises in a large variety of applications, e.g. in economics, biology and medicine. Existing approaches to outlier detection suffer from one or more of the following drawbacks: the results of many methods depend strongly on parameter settings that are very difficult to estimate without background knowledge of the data, e.g. the minimum cluster size or the number of desired outliers. Many methods implicitly assume Gaussian or uniformly distributed data, and/or their results are difficult to interpret. To cope with these problems, we propose CoCo, a technique for parameter-free outlier detection. The basic idea of our technique relates outlier detection to data compression: outliers are objects which cannot be effectively compressed given the data set. To avoid the assumption of a certain data distribution, CoCo relies on a very general data model combining the Exponential Power Distribution with Independent Components. We define an intuitive outlier factor based on the principle of the Minimum Description Length together with a novel algorithm for outlier detection. An extensive experimental evaluation on synthetic and real-world data demonstrates the benefits of our technique. Availability: The source code of CoCo and the data sets used in the experiments are available at: http://www.dbs.ifi.lmu.de/Forschung/KDD/Boehm/CoCo.
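The compression idea in the abstract (an outlier is an object that is expensive to encode) can be illustrated with a small sketch. This is not the authors' CoCo algorithm. It assumes the dimensions are already independent (the Independent Components step is skipped), fits one Exponential Power Distribution f(x) = β / (2α Γ(1/β)) · exp(−(|x − μ|/α)^β) per dimension over the whole data set, and scores each object by its coding cost −log2 f(x) in bits, summed over dimensions. The function names, the median as location estimate, and the candidate shape grid `betas` are hypothetical choices made for illustration.

```python
import math

def epd_params(xs, beta):
    """Fit location and scale of an Exponential Power Distribution for a fixed shape beta.
    Assumption: the location is the median (a robust stand-in; it is the exact MLE only for beta=1)."""
    s = sorted(xs)
    n = len(s)
    mu = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    # Closed-form MLE of the scale alpha given mu and beta: alpha^beta = (beta/n) * sum |x - mu|^beta.
    alpha = (beta / n * sum(abs(x - mu) ** beta for x in xs)) ** (1.0 / beta)
    return mu, max(alpha, 1e-12)  # guard against a degenerate constant column

def epd_cost_bits(x, mu, alpha, beta):
    """Coding cost -log2 f(x) of a value under the fitted EPD density.
    For continuous data this is defined up to a constant precision term, which does not affect ranking."""
    log_pdf = (math.log(beta) - math.log(2 * alpha) - math.lgamma(1.0 / beta)
               - (abs(x - mu) / alpha) ** beta)
    return -log_pdf / math.log(2)

def coding_costs(data, betas=(0.5, 1.0, 2.0, 4.0)):
    """Per-object coding cost in bits, summed over dimensions treated as independent.
    For each dimension, the shape beta is taken from a small hypothetical grid as the one
    that minimizes that dimension's total coding cost (an MDL-style choice, no user parameter)."""
    costs = [0.0] * len(data)
    for d in range(len(data[0])):
        col = [row[d] for row in data]
        best_total, best_costs = None, None
        for beta in betas:
            mu, alpha = epd_params(col, beta)
            c = [epd_cost_bits(x, mu, alpha, beta) for x in col]
            if best_total is None or sum(c) < best_total:
                best_total, best_costs = sum(c), c
        for i, ci in enumerate(best_costs):
            costs[i] += ci
    return costs
```

For example, `coding_costs([[0.1, -0.2], [0.0, 0.1], [-0.1, 0.0], [0.2, 0.1], [-0.2, -0.1], [0.05, 0.15], [5.0, 4.0]])` assigns the highest cost to the last point, which lies far from the others in both dimensions. With β = 2 the density reduces to a Gaussian (α = σ√2) and with β = 1 to a Laplace distribution, which is why a single family can cover both peaked and flat-tailed dimensions without assuming one of them in advance.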
References
page | Author | title | journal | doi | year
---|---|---|---|---|---
2009 CoCoCodingCostforParameterFreeO | Christian Böhm, Katrin Haegler, Nikola S. Müller, Claudia Plant | CoCo: Coding Cost for Parameter-free Outlier Detection | KDD-2009 Proceedings | 10.1145/1557019.1557042 | 2009