2015 0BitConsistentWeightedSampling

From GM-RKB

(Li, 2015) ⇒ Ping Li. (2015). "0-Bit Consistent Weighted Sampling". In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2015). ISBN:978-1-4503-3664-2 doi:10.1145/2783258.2783406

Subject Headings: Consistent Weighted Sampling (CWS), 0-Bit Consistent Weighted Sampling Algorithm.

Notes

Cited By

Google Scholar: ~ 52 Citations, Retrieved:2020-09-18.
ACM DL: ~ 19 Citations, Retrieved:2020-09-18.

Quotes

Author Keywords

Big data; hashing; learning

Abstract

We develop 0-bit consistent weighted sampling (CWS) for efficiently estimating the min-max kernel, which is a generalization of the resemblance kernel originally designed for binary data. Because the estimator of 0-bit CWS constitutes a positive definite kernel, this method can be naturally applied to large-scale data mining problems. Basically, if we feed the sampled data from 0-bit CWS to a highly efficient linear classifier (e.g., linear SVM), we effectively (and approximately) train a nonlinear classifier based on the min-max kernel. The accuracy improves as we increase the sample size.
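As an illustration of the kernel being estimated, the sketch below computes the min-max kernel in its standard form, K(u, v) = sum_i min(u_i, v_i) / sum_i max(u_i, v_i), and checks that it coincides with the resemblance (Jaccard similarity) on binary data. The function names `min_max_kernel` and `resemblance` are illustrative choices, not names taken from the paper.

```python
import numpy as np

def min_max_kernel(u, v):
    """Min-max kernel for nonnegative vectors:
    sum_i min(u_i, v_i) / sum_i max(u_i, v_i)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if np.any(u < 0) or np.any(v < 0):
        raise ValueError("the min-max kernel is defined for nonnegative data")
    denom = np.maximum(u, v).sum()
    if denom == 0:
        raise ValueError("both vectors are all-zero; the kernel is undefined")
    return np.minimum(u, v).sum() / denom

def resemblance(a, b):
    """Resemblance of two sets of nonzero locations: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Binary case: the two similarity measures agree.
u_bin = [1, 0, 1, 1]          # nonzero locations {0, 2, 3}
v_bin = [1, 1, 0, 1]          # nonzero locations {0, 1, 3}
k_bin = min_max_kernel(u_bin, v_bin)
r_bin = resemblance({0, 2, 3}, {0, 1, 3})

# Weighted (non-binary) case: only the min-max kernel applies.
k_weighted = min_max_kernel([2, 0, 1], [1, 3, 1])
```

On binary vectors the numerator counts shared nonzero locations and the denominator counts locations that are nonzero in either vector, which is why the two measures agree.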

In this paper, we first demonstrate, through an extensive classification study using kernel machines, that the min-max kernel often provides an effective measure of similarity for nonnegative data. This helps justify the use of the min-max kernel. However, as the min-max kernel is nonlinear and might be difficult to use in industrial applications with massive data, we propose to linearize the min-max kernel via 0-bit CWS, a simplification of the original CWS method.
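One common way to make such a linearization concrete is a one-hot expansion of the sampled indices, sketched below. It assumes each data point has already been reduced to k sampled indices in the range [0, dim); the helper name `one_hot_expand` and the example index arrays are hypothetical, and the paper may use a more compact encoding (for example, keeping only a few bits of each index).

```python
import numpy as np

def one_hot_expand(i_star, dim):
    """Map k sampled indices (each in [0, dim)) to a binary vector of
    length k * dim with exactly one 1 per sample block."""
    i_star = np.asarray(i_star)
    k = i_star.shape[0]
    features = np.zeros(k * dim)
    # Sample j sets position j * dim + i_star[j] inside its own block.
    features[np.arange(k) * dim + i_star] = 1.0
    return features

# Hypothetical sampled indices for two data points (k = 4, dim = 4).
samples_a = [3, 0, 2, 2]
samples_b = [3, 1, 2, 0]

phi_a = one_hot_expand(samples_a, dim=4)
phi_b = one_hot_expand(samples_b, dim=4)

# The inner product counts the samples on which the two points agree,
# so dividing by k gives the fraction of matching samples.
match_fraction = phi_a @ phi_b / len(samples_a)
```

Because the inner product of the expanded vectors equals the number of matching samples, a linear classifier trained on these features works with an approximation of the kernel. A real implementation would normally use a sparse representation, since each vector has only k nonzero entries.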

The previous remarkable work on consistent weighted sampling (CWS) produces samples in the form of (i^*, t^*), where i^* records the location information (and in fact also the weight information), analogous to the samples produced by classical minwise hashing on binary data. Because t^* is theoretically unbounded, it was not immediately clear how to effectively implement CWS for building large-scale linear classifiers. We provide a simple solution by discarding t^* (which we refer to as the "0-bit" scheme). Via an extensive empirical study, we show that this 0-bit scheme does not lose essential information. We then apply 0-bit CWS for building linear classifiers to approximate min-max kernel classifiers, as extensively validated on a wide range of public datasets.
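The sampling step can be sketched as follows. The abstract does not spell out the procedure, so this sketch assumes the commonly cited form of CWS (Gamma(2, 1) variables r and c and a Uniform(0, 1) variable beta per coordinate, shared across all data points). All function names are illustrative, and the code is a minimal sketch rather than the author's implementation.

```python
import numpy as np

def make_cws_params(dim, num_samples, seed=0):
    """Draw the random variables shared by every data point (assumed form:
    r, c ~ Gamma(2, 1) and beta ~ Uniform(0, 1) per sample and coordinate)."""
    rng = np.random.default_rng(seed)
    r = rng.gamma(shape=2.0, scale=1.0, size=(num_samples, dim))
    c = rng.gamma(shape=2.0, scale=1.0, size=(num_samples, dim))
    beta = rng.uniform(0.0, 1.0, size=(num_samples, dim))
    return r, c, beta

def cws_samples(x, params):
    """Return arrays (i_star, t_star), one entry per sample, for a
    nonnegative vector x with at least one positive entry."""
    r, c, beta = params
    x = np.asarray(x, dtype=float)
    nz = np.flatnonzero(x > 0)            # only positive coordinates take part
    if nz.size == 0:
        raise ValueError("x must have at least one positive entry")
    r, c, beta = r[:, nz], c[:, nz], beta[:, nz]
    t = np.floor(np.log(x[nz]) / r + beta)
    log_y = r * (t - beta)                # y = exp(r * (t - beta))
    log_a = np.log(c) - log_y - r         # a = c / (y * exp(r))
    j = np.argmin(log_a, axis=1)          # minimizing log(a) minimizes a
    rows = np.arange(j.size)
    return nz[j], t[rows, j].astype(np.int64)

def estimate_full(sa, sb):
    """Fraction of samples on which both i_star and t_star agree."""
    return float(np.mean((sa[0] == sb[0]) & (sa[1] == sb[1])))

def estimate_zero_bit(sa, sb):
    """0-bit scheme: discard t_star and compare i_star only."""
    return float(np.mean(sa[0] == sb[0]))
```

A usage sketch: draw the parameters once with `make_cws_params(dim, num_samples)`, call `cws_samples` on each vector with those same parameters, and compare the resulting pairs with `estimate_zero_bit`. Since agreement on (i^*, t^*) implies agreement on i^*, the 0-bit estimate is never smaller than the full estimate; the abstract's claim is that the difference is negligible in practice.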

We expect this work will generate interest among data mining practitioners who would like to efficiently utilize the nonlinear information of non-binary and nonnegative data.
