Similarity Search Task
A similarity search task is a search task within some metric space.
- Context:
- Input:
- a metric space [math]\displaystyle{ M }[/math] (with point set [math]\displaystyle{ P }[/math] and distance function [math]\displaystyle{ d }[/math]).
- a similarity search type.
- optional: a query point [math]\displaystyle{ q \in P }[/math].
- optional: a distance value [math]\displaystyle{ r \in \mathbb{R} }[/math] (such as a radius)
- optional: a positive integer [math]\displaystyle{ k }[/math].
- output:
- a Point Set.
- optional: a Point Set Relation.
- It can (typically) assume a Metric Space.
- Input:
- Example(s):
- a Similarity Search-based Optimization Task, for optimization tasks that require a similarity search algorithm.
- a Similarity Search-based Constraint Satisfaction Task, for constraint satisfaction that require a similarity search algorithm.
- a Range Search Task.
- a 1-Nearest Neighbor Range Search Task.
- a Nearest Neighbor Search Task.
- a Reverse Nearest Neighbor Search Task.
- a Similarity Join-based Search Task.
- a Approximate String Matching, or Approximate Tuple Matching.
- …
- Counter-Example(s):
- any Exact Searching Task.
- See: Keyword Search.
References
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/similarity_search Retrieved:2017-8-30.
- Similarity search is the most general term used for a range of mechanisms which share the principle of searching (typically, very large) spaces of objects where the only available comparator is the similarity between any pair of objects. This is becoming increasingly important in an age of large information repositories where the objects contained do not possess any natural order, for example large collections of images, sounds and other sophisticated digital objects.
Nearest neighbor search and range queries are important subclasses of similarity search, and a number of solutions exist. Research in Similarity Search is dominated by the inherent problems of searching over complex objects. Such objects cause most known techniques to lose traction over large collections, due to a manifestation of the so-called Curse of dimensionality, and there are still many unsolved problems. Unfortunately, in many cases where similarity search is necessary, the objects are inherently complex.
The most general approach to similarity search relies upon the mathematical notion of metric space, which allows the construction of efficient index structures in order to achieve scalability in the search domain.
Similarity search evolved independently in a number different scientific and computing contexts, according to various needs. In 2008 a few leading researchers in the field felt strongly that the subject should be a research topic in its own right, to allow focus on the general issues applicable across the many diverse domains of its use. This resulted in the formation of the SISAP foundation, whose main activity is a series of annual international conferences on the generic topic.
- Similarity search is the most general term used for a range of mechanisms which share the principle of searching (typically, very large) spaces of objects where the only available comparator is the similarity between any pair of objects. This is becoming increasingly important in an age of large information repositories where the objects contained do not possess any natural order, for example large collections of images, sounds and other sophisticated digital objects.
2007
- (Zezula et al., 2007) ⇒ Pavel Zezula, Giuseppe Amato, and Vlastislav Dohnal. (2007). “Similarity Search - The Metric Space Approach." Tutorial at ACM SAC 2007.
2006
- (Zezula et al., 2006) ⇒ Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal, and Michal Batko. (2006). “Similarity Search: The Metric Space Approach. Springer, Advances in Database Systems.
2004
- (Dohnal, 2004) ⇒ Vlastislav Dohnal. (2004). “Indexing Structures for Searching in Metric Spaces." Ph.D. Thesis, Masaryk University in Brno.
2001
- (Navarro, 2001) ⇒ Gonzalo Navarro. (2001). “A Guided Tour to Approximate String Matching.” In: ACM Computing Surveys (CSUR), 33(1). doi:10.1145/375360.375365