Sketch Data Structure
A Sketch Data Structure is a summarization data structure that ...
- Example(s):
- Counter-Example(s):
- See: Approximation Query, Linear Transformation-based Sketch.
References
2014
- http://debasishg.blogspot.com/2014/01/count-min-sketch-data-structure-for.html
- QUOTE: One widely used technique for storing a subset of data is through Random Sampling, where the data stored is selected through some stochastic mechanism. There are various ways to determine which data we select for storing and how we build the estimator for querying the data. There are pros and cons with this approach, but it's one of the simplest ways to do approximation based queries on streaming data.
There are a few other options like Histograms and Wavelet based synopses. But one of the most interesting data structures that have been developed in recent times is the Sketch, which uses summary based techniques for delivering approximation queries, gets around the typical problems that sampling techniques have and are highly parallelizable in practice.
An important class of sketch is one where the sketch vector (which is the summary information) is a linear transform of the input vector. So if we model the input as a vector we can multiply it by a sketch matrix to obtain the sketch vector that contains the synopses data that we can use for serving approximation queries. Here's a diagrammatic representation of the sketch as a linear transform of the input data.
- QUOTE: One widely used technique for storing a subset of data is through Random Sampling, where the data stored is selected through some stochastic mechanism. There are various ways to determine which data we select for storing and how we build the estimator for querying the data. There are pros and cons with this approach, but it's one of the simplest ways to do approximation based queries on streaming data.
- https://pypi.python.org/pypi/countminsketch/0.1
- QUOTE: This is a minimalistic `Count-min Sketch` for Python, featuring some cool things like:
- Being able to count anything that is hash-able by python (numbers, strings, tuples, inmutables, duck-typeds, etc.).
- QUOTE: This is a minimalistic `Count-min Sketch` for Python, featuring some cool things like: