Count-Min Sketch Data Structure

A Count-Min Sketch Data Structure is a sketch data structure that is parameterizedby two factors - ε and δ, where the error in answering the query is within a factor of ε with probability δ.

AKA: CM Sketch.
Context:
- It can be used to Summarize Large Amounts of Frequency Data.
- It can an Estimation Procedure that ...
- …
Counter-Example(s):
- Random Sampling-based Summarization DS.
See: Multiset, Blook Filter, Hash Featurization, Randomized Algorithm, Streaming Algorithm, Hash Table, Bloom Filter, Historical Feature.

References

(Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/count–min_sketch Retrieved:2015-7-20.
- In computing, the count–min sketch (CM sketch) is a probabilistic data structure that serves as a frequency table of events in a stream of data. It uses hash functions to map events to frequencies, but unlike a hash table uses only sub-linear space, at the expense of overcounting some events due to collisions. The count–min sketch was invented in 2003 by Graham Cormode and S. Muthu Muthukrishnan and described by them in a 2005 paper. Count–min sketches are somewhat similar to Bloom filters; the main distinction is that Bloom filters represent sets, while CM sketches represent multisets. Spectral Bloom filters with multi-set policy are conceptually isomorphicto the count–min sketch.

http://debasishg.blogspot.com/2014/01/count-min-sketch-data-structure-for.html
- QUOTE: One of the most popular forms of the sketch data structure is the Count-Min Sketch introduced by Muthukrishnan and Cormode in 2003. The idea is quite simple and the data structure is based on probabilistic algorithms to serve various types of queries on streaming data. The data structure is parameterizedby two factors - ε and δ, where the error in answering the query is within a factor of ε with probability δ. So you can tune these parameters based on the space that you can afford and accordingly amortize the accuracy of results that the data structure can serve you.

(Cormode & Muthukrishnan, 2005) ⇒ Graham Cormode, and S. Muthukrishnan. (2005). “An Improved Data Stream Summary: The Count-min Sketch and Its Applications.” In: Journal of Algorithms 55, no. 1
- QUOTE: We introduce a new sublinear space data structure — the count-min sketch — for summarizing data streams. Our sketch allows fundamental queries in data stream summarization such as point, range, and inner product queries to be approximately answered very quickly; in addition, it can be applied to solve several important problems in data streams such as finding quantiles, frequent items, etc. The time and space bounds we show for using the CM sketch to solve these problems significantly improve those previously known — typically from 1/ε2 to 1/ε in factor.