2012 SearchingandMiningTrillionsofTi
- (Rakthanmanon et al., 2012) ⇒ Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, and Eamonn Keogh. (2012). “Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping.” In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2012). ISBN:978-1-4503-1462-6 doi:10.1145/2339530.2339576
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222012%22+Searching+and+Mining+Trillions+of+Time+Series+Subsequences+under+Dynamic+Time+Warping
- http://dl.acm.org/citation.cfm?id=2339530.2339576&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower powered devices than are currently possible.
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2012 SearchingandMiningTrillionsofTi | Qiang Zhu, Eamonn Keogh, Abdullah Mueen, Thanawin Rakthanmanon, Bilson Campana, Gustavo Batista, Brandon Westover, Jesin Zakaria | | | Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping | | | | 10.1145/2339530.2339576 | | 2012 |