2009 CoCoCodingCostforParameterFreeO

(Böhm et al., 2009) ⇒ Christian Böhm, Katrin Haegler, Nikola S. Müller, and Claudia Plant. (2009). “CoCo: Coding Cost for Parameter-free Outlier Detection.” In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2009). doi:10.1145/1557019.1557042

Subject Headings:

Notes

Categories and Subject Descriptors: H.2.8 Database applications: Data mining.
General Terms: Algorithms, Design, Reliability

Cited By

Quotes

Author Keywords

Outlier Detection, Coding Costs, Minimum Description Length, Data Compression

Abstract

How can we automatically spot all outstanding observations in a data set? This question arises in a large variety of applications, e.g. in economy, biology and medicine. Existing approaches to outlier detection suffer from one or more of the following drawbacks: The results of many methods strongly depend on suitable parameter settings, which are very difficult to estimate without background knowledge on the data, e.g. the minimum cluster size or the number of desired outliers. Many methods implicitly assume Gaussian or uniformly distributed data, and/or their result is difficult to interpret. To cope with these problems, we propose CoCo, a technique for parameter-free outlier detection. The basic idea of our technique relates outlier detection to data compression: Outliers are objects which cannot be effectively compressed given the data set. To avoid the assumption of a certain data distribution, CoCo relies on a very general data model combining the Exponential Power Distribution with Independent Components. We define an intuitive outlier factor based on the principle of the Minimum Description Length together with a novel algorithm for outlier detection. An extensive experimental evaluation on synthetic and real world data demonstrates the benefits of our technique. Availability: The source code of CoCo and the data sets used in the experiments are available at: http://www.dbs.ifi.lmu.de/Forschung/KDD/Boehm/CoCo.
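The link between compression and outlierness can be illustrated with a small one-dimensional sketch. This is not the CoCo algorithm from the paper: it omits Independent Components, fixes the shape parameter `beta` by hand (CoCo itself is parameter-free), and uses a leave-one-out fit. All function names and the sample data are hypothetical. It only shows the idea that a value with low density under a fitted Exponential Power Distribution needs more bits to encode.

```python
import math
import statistics


def epd_log2_pdf(x, mu, alpha, beta):
    """log2 density of the exponential power distribution at x.

    Density: beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x - mu| / alpha) ** beta)
    """
    log_pdf = (math.log(beta) - math.log(2.0 * alpha)
               - math.lgamma(1.0 / beta)
               - (abs(x - mu) / alpha) ** beta)
    return log_pdf / math.log(2.0)


def fit_scale(values, mu, beta):
    """Maximum-likelihood scale alpha for a fixed location mu and shape beta."""
    total = sum(abs(v - mu) ** beta for v in values)
    return (beta * total / len(values)) ** (1.0 / beta)


def coding_costs(data, beta=2.0):
    """Cost in bits (negative log2 density) of each value.

    Assumption of this sketch: each value is coded with a model fitted to
    the remaining values (leave-one-out), with the median as location.
    """
    costs = []
    for i, x in enumerate(data):
        rest = data[:i] + data[i + 1:]
        mu = statistics.median(rest)
        alpha = fit_scale(rest, mu, beta)
        costs.append(-epd_log2_pdf(x, mu, alpha, beta))
    return costs


# Hypothetical sample: seven values near 10 and one far away.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.05, 9.95, 14.0]
costs = coding_costs(data)
most_expensive = costs.index(max(costs))  # index of the hardest value to compress
```

With `beta=2.0` the model reduces to a Gaussian; other values of `beta` give heavier or lighter tails, which is why the Exponential Power Distribution is a more general choice than assuming Gaussian data.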

References

	Author	title	journal	doi	year
2009 CoCoCodingCostforParameterFreeO	Christian Böhm, Katrin Haegler, Nikola S. Müller, Claudia Plant	CoCo: Coding Cost for Parameter-free Outlier Detection	KDD-2009 Proceedings	10.1145/1557019.1557042	2009
