2012 MaximumInnerProductSearchUsingC: Difference between revisions

Latest revision as of 00:27, 2 April 2024

(Ram & Gray, 2012) ⇒ Parikshit Ram, and Alexander G. Gray. (2012). “Maximum Inner-product Search Using Cone Trees.” In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2012). ISBN:978-1-4503-1462-6 doi:10.1145/2339530.2339677

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Algorithm design and analysis; cone trees; data mining; dual-tree branch-and-bound; metric trees; [[search process; trees

Abstract

The problem of efficiently finding the best match for a query in a given set with respect to the Euclidean distance or the cosine similarity has been extensively studied. However, the closely related problem of efficiently finding the best match with respect to the inner-product has never been explored in the general setting to the best of our knowledge. In this paper we consider this problem and contrast it with the previous problems considered. First, we propose a general branch-and-bound algorithm based on a (single) tree data structure. Subsequently, we present a dual-tree algorithm for the case where there are multiple queries. Our proposed branch-and-bound algorithms are based on novel inner-product bounds. Finally we present a new data structure, the cone tree, for increasing the efficiency of the dual-tree algorithm. We evaluate our proposed algorithms on a variety of data sets from various applications, and exhibit up to five orders of magnitude improvement in query time over the naive search technique in some cases.

References

;

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2012 MaximumInnerProductSearchUsingC	Parikshit Ram Alexander G. Gray			Maximum Inner-product Search Using Cone Trees				10.1145/2339530.2339677		2012

@@ Line 1: / Line 1: @@
-* ([[2012_MaximumInnerProductSearchUsingC|Ram & Gray, 2012]]) &rArr; [[author::Parikshit Ram]], and [[author::Alexander G. Gray]]. ([[year::2012]]). "Maximum Inner-product Search Using Cone Trees." In: [[proceedings::Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining]] ([[conference::KDD-2012]]). ISBN:978-1-4503-1462-6 [http://dx.doi.org/10.1145/2339530.2339677 doi:10.1145/2339530.2339677]
+* ([[2012_MaximumInnerProductSearchUsingC|Ram & Gray, 2012]]) ⇒ [[author::Parikshit Ram]], and [[author::Alexander G. Gray]]. ([[year::2012]]). “Maximum Inner-product Search Using Cone Trees.” In: [[proceedings::Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining]] ([[conference::KDD-2012]]). ISBN:978-1-4503-1462-6 [http://dx.doi.org/10.1145/2339530.2339677 doi:10.1145/2339530.2339677]
 <B>Subject Headings:</B>
-==Notes==
+== Notes ==
-==Cited By==
+== Cited By ==
 * http://scholar.google.com/scholar?q=%222012%22+Maximum+Inner-product+Search+Using+Cone+Trees
 * http://dl.acm.org/citation.cfm?id=2339530.2339677&preflayout=flat#citedby
-==Quotes==
+== Quotes ==
-===Author Keywords===
-* [[Algorithm design and analysi]]s; [[cone tree]]s; [[data mining]]; [[dual-tree branch-and-bound]]; [[metric tree]]s; [[search proces]]s; [[tree]]s
-===Abstract===
+=== Author Keywords ===
+* [[Algorithm design and analysis]]; [[cone tree]]s; [[data mining]]; [[dual-tree branch-and-bound]]; [[metric tree]]s; [[search process; [[tree]]s
-The [[problem]] of [[efficiently]] [[finding the best match]] for a [[query]] in a given [[set]] with respect to the [[Euclidean distance]] or the [[cosine similarity]] has been extensively [[studied]]. </s>
+=== Abstract ===
-However, the closely related [[problem]] of [[efficiently]] [[finding the best match]] with respect to the [[inner-product]] has never been explored in the [[general setting]] to the best of [[our knowledge]]. </s>
-In [[this paper]] we consider [[this problem]] and contrast it with the [[previous problem]]s considered. </s>
+The [[problem of efficiently]] [[finding the best match]] for a [[query]] in a given [[set]] with respect to the [[Euclidean distance]] or the [[cosine]] [[similarity]] has been extensively [[studied]]. </s>
-First, we propose a general [[branch-and-bound algorithm]] based on a (single) [[tree data structure]]. </s>
+However, the closely related [[problem of efficiently]] [[finding the best match]] with respect to the [[inner-product]] has never been explored in the [[general setting]] to the best of our knowledge. </s>
-Subsequently, we present a [[dual-tree algorithm]] for the case where there are [[multiple queri]]es. </s>
+[[In this paper we]] consider [[this problem]] and contrast it with the [[previous problem]]s considered. </s>
+First, [[we]] propose a general [[branch-and-bound algorithm]] based on a (single) [[tree data structure]]. </s>
+Subsequently, [[we]] present a [[dual-tree algorithm]] for the case where there are [[multiple queri]]es. </s>
 [[Our proposed branch-and-bound algorithm]]s are based on novel [[inner-product bound]]s. </s>
-Finally we present a new [[data structure]], the [[cone tree]], for increasing the [[efficiency]] of the [[dual-tree algorithm]]. </s>
+Finally [[we]] present a new [[data structure]], the [[cone tree]], for increasing the [[efficiency]] of the [[dual-tree algorithm]]. </s>
 [[We]] evaluate [[our proposed algorithm]]s on a variety of [[data set]]s from various [[application]]s, and exhibit up to five orders of [[magnitude improvement]] in [[query time]] over the [[naive search technique]] in some [[case]]s. </s>
-==References==
+== References ==
 {{#ifanon:|
-* 1. Roberto J. Bayardo, Yiming Ma, Ramakrishnan Srikant, Scaling Up all Pairs Similarity Search, Proceedings of the 16th International Conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada [http://doi.acm.org/10.1145/1242572.1242591 doi:10.1145/1242572.1242591]
+* 1. [[Roberto J. Bayardo]], Yiming Ma, Ramakrishnan Srikant, Scaling Up all Pairs Similarity Search, Proceedings of the 16th International Conference on World Wide Web, May 08-12, 2007, Banff, Alberta, Canada [http://doi.acm.org/10.1145/1242572.1242591 doi:10.1145/1242572.1242591]
 * 2. Robert M. Bell, Yehuda Koren, Lessons from the Netflix Prize Challenge, ACM SIGKDD Explorations Newsletter, v.9 n.2, December 2007 [http://doi.acm.org/10.1145/1345448.1345465 doi:10.1145/1345448.1345465]
 * 3. J. Bennett and S. Lanning. The Netflix Prize. In Proc. KDD Cup and Workshop, 2007.
-* 4. Alina Beygelzimer, Sham Kakade, John Langford, Cover Trees for Nearest Neighbor, Proceedings of the 23rd International Conference on Machine Learning, p.97-104, June 25-29, 2006, Pittsburgh, Pennsylvania [http://doi.acm.org/10.1145/1143844.1143857 doi:10.1145/1143844.1143857]
+* 4. Alina Beygelzimer, Sham Kakade, [[John Langford]], Cover Trees for Nearest Neighbor, Proceedings of the 23rd International Conference on Machine Learning, p.97-104, June 25-29, 2006, Pittsburgh, Pennsylvania [http://doi.acm.org/10.1145/1143844.1143857 doi:10.1145/1143844.1143857]
 * 5. C. L. Blake and C. J. Merz. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/, 1998.
 * 6. L. Cayton and S. Dasgupta. A Learning Framework for Nearest Neighbor Search. Advances in Neural Info. Proc. Systems 20, 2007.
@@ Line 39: / Line 40: @@
 * 12. G. Dror, N. Koenigstein, Y. Koren, and M. Weimer. The Yahoo! Music Dataset and KDD-Cup'11. Journal Of Machine Learning Research, 2011.
 * 13. Jerome H. Friedman, Jon Louis Bentley, Raphael Ari Finkel, An Algorithm for Finding Best Matches in Logarithmic Expected Time, ACM Transactions on Mathematical Software (TOMS), v.3 n.3, p.209-226, Sept. 1977 [http://doi.acm.org/10.1145/355744.355745 doi:10.1145/355744.355745]
-* 14. Aristides Gionis, Piotr Indyk, Rajeev Motwani, Similarity Search in High Dimensions via Hashing, Proceedings of the 25th International Conference on Very Large Data Bases, p.518-529, September 07-10, 1999
+* 14. [[Aristides Gionis]], Piotr Indyk, Rajeev Motwani, Similarity Search in High Dimensions via Hashing, Proceedings of the 25th International Conference on Very Large Data Bases, p.518-529, September 07-10, 1999
 * 15. A. G. Gray and A. W. Moore. 'N-Body' Problems in Statistical Learning. In Advances in Neural Info. Proc. Systems 13, 2000.
 * 16. A. G. Gray and A. W. Moore. Nonparametric Density Estimation: Toward Computational Tractability. In SIAM Data Mining, 2003.
@@ Line 49: / Line 50: @@
 * 22. Yehuda Koren, Robert Bell, Chris Volinsky, Matrix Factorization Techniques for Recommender Systems, Computer, v.42 n.8, p.30-37, August 2009 [http://dx.doi.org/10.1109/MC.2009.263 doi:10.1109/MC.2009.263]
 * 23. B. Kulis and K. Grauman. Kernelized Locality-sensitive Hashing for Scalable Image Search. In IEEE 12th Intl. Conf. on Computer Vision, 2009.
-* 24. Y. LeCun.textscMNist Dataset, 2000. http://yann.lecun.com/exdb/mnist/.
+* 24. [[Y. LeCun]].textscMNist Dataset, 2000. http://yann.lecun.com/exdb/mnist/.
-* 25. Z. Li, H. Ning, L. Cao, T. Zhang, Y. Gong, and T. S. Huang. Learning to Search Efficiently in High Dimensions. In Advances in Neural Info. Proc. Systems 24. 2011.
+* 25. Z. Li, H. Ning, L. Cao, [[T. Zhang]], Y. Gong, and T. S. Huang. Learning to Search Efficiently in High Dimensions. In Advances in Neural Info. Proc. Systems 24. 2011.
 * 26. T. Liu, A. W. Moore, A. G. Gray, and K. Yang. An Investigation of Practical Approximate Nearest Neighbor Algorithms. In Advances in Neural Info. Proc. Systems 17, 2005.
 * 27. R. Lupton, J. Gunn, Z. Ivezic, G. Knapp, S. Kent, and N. Yasuda. The SDSS Imaging Pipelines. Arxiv Preprint Astro-ph/0101420, 2001.