2008 StructuredMetricLearningforHigh
- (Davis et al., 2008) ⇒ Jason V. Davis, and Inderjit S. Dhillon. (2008). “Structured Metric Learning for High Dimensional Problems.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). doi:10.1145/1401890.1401918
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%22Structured+metric+learning+for+high+dimensional+problems%22+2008
- http://portal.acm.org/citation.cfm?doid=1401890.1401918&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
The success of popular algorithms such as k-means clustering or nearest neighbor searches depend on the assumption that the underlying distance functions reflect domain-specific notions of similarity for the problem at hand. The distance metric learning problem seeks to optimize a distance function subject to constraints that arise from fully-supervised or semisupervised information. Several recent algorithms have been proposed to learn such distance functions in low dimensional settings. One major shortcoming of these methods is their failure to scale to high dimensional problems that are becoming increasingly ubiquitous in modern data mining applications. In this paper, we present metric learning algorithms that scale linearly with dimensionality, permitting efficient optimization, storage, and evaluation of the learned metric. This is achieved through our main technical contribution which provides a framework based on the log-determinant matrix divergence which enables efficient optimization of structured, low-parameter Mahalanobis distances. Experimentally, we evaluate our methods across a variety of high dimensional domains, including text, statistical software analysis, and collaborative filtering, showing that our methods scale to data sets with tens of thousands or more features. We show that our learned metric can achieve excellent quality with respect to various criteria. For example, in the context of metric learning for nearest neighbor classification, we show that our methods achieve 24% higher accuracy over the baseline distance. Additionally, our methods yield very good precision while providing recall measures up to 20% higher than other baseline methods such as latent semantic analysis.
References
,
Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year | |
---|---|---|---|---|---|---|---|---|---|---|
2008 StructuredMetricLearningforHigh | Inderjit S. Dhillon Jason V. Davis | Structured Metric Learning for High Dimensional Problems | 10.1145/1401890.1401918 |