2007 SimilaritySearchTheMetricSpaceApproach
- (Zezula et al., 2007) ⇒ Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal. (2007). “Similarity Search - The Metric Space Approach.” Tutorial at ACM SAC 2007.
Subject Headings: Similarity Search Task, Similarity Search Algorithm.
Notes
- A tutorial associated with the book: (Zezula et al., 2006)
Cited By
Quotes
Tutorial Overview
The objective of this tutorial is to explain concepts of the powerful metric space approach to similarity searching, demonstrate its capabilities through case studies, and outline its future perspectives through distributed architectures that can achieve constant scalability by exploiting parallelism.
Abstract
Similarity searching has become afundamental computational task in a variety of application areas, including multimedia information retrieval, data mining, pattern recognition, machine learning, computer vision, biomedical databases, data compression and statistical data analysis. In such environments, an exact match has little meaning, and proximity/distance (similarity/dissimilarity) concepts are typically much more fruitful for searching.
The notion of similarity has long been studied extensively in the field of social psychology, nicely characterized by the following quotation:
- “An ability to assess similarity lies close to the core of cognition. The sense of sameness is the very keel and backbone of our thinking. An understanding of problem solving, categorization, memory retrieval,inductive reasoning, and other cognitive processes require that we understand how humans assess similarity.” --Robert L. Goldstone, Similarity, 2001
With the increasing diversity of digital data types covering practically all forms of fact representation, computerized data processing must respect these natural principles and provide adequate tools for similarity searching.
In this tutorial, we review the state of the art in developing similarity search mechanisms that accept the metric space paradigm. We explain the high extensibility of the metric space approach and demonstrate its capability with examples of distance functions. After a survey of specialized partitioning and pruning concepts, we introduce the main indexing representatives and provide performance comparison. The efforts to further speed up retrieval are demonstrated by a class of approximated techniques and the very recent proposals of scalable and distributed structures based on the P2P communication paradigm.
…
References
,