2012 EfficientEvaluationofLargeSeque
- (Kuksa & Pavlovic, 2012) ⇒ Pavel P. Kuksa, and Vladimir Pavlovic. (2012). “Efficient Evaluation of Large Sequence Kernels.” In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2012). ISBN:978-1-4503-1462-6 doi:10.1145/2339530.2339649
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%222012%22+Efficient+Evaluation+of+Large+Sequence+Kernels
- http://dl.acm.org/citation.cfm?id=2339530.2339649&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
Classification of sequences drawn from a finite alphabet using a family of string kernels with inexact matching (e.g., spectrum or mismatch) has shown great success in machine learning. However, selection of optimal mismatch kernels for a particular task is severely limited by inability to compute such kernels for long substrings (k-mers) with potentially many mismatches (m). In this work we introduce a new method that allows us to exactly evaluate kernels for large k, m and arbitrary alphabet size. The task can be accomplished by first solving the more tractable problem for small alphabets, and then trivially generalizing to any alphabet using a small linear system of equations. This makes it possible to explore a larger set of kernels with a wide range of kernel parameters, opening a possibility to better model selection and improved performance of the string kernels. To investigate the utility of large (k, m) string kernels, we consider several sequence classification problems, including protein remote homology detection, fold prediction, and music classification. Our results show that increased k-mer lengths with larger substitutions can improve classification performance.
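To make the (k, m) mismatch kernel named in the abstract concrete, the following is a minimal brute-force sketch based on the standard mismatch-kernel definition. It is not the method introduced in the paper. It assumes each k-mer in a sequence contributes a count of 1 to every k-mer over the alphabet within Hamming distance m, and that the kernel is the inner product of the resulting feature vectors. The function names (`kmers`, `hamming`, `mismatch_features`, `mismatch_kernel`) are hypothetical, chosen for illustration only.

```python
from collections import Counter
from itertools import product


def kmers(seq, k):
    """Return all contiguous substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_features(seq, k, m, alphabet):
    """Explicit (k, m) mismatch feature map (assumed standard definition).

    Enumerates every k-mer over the alphabet, so the cost grows as
    len(alphabet) ** k; usable only for small k and small alphabets.
    """
    counts = Counter(kmers(seq, k))
    feats = {}
    for beta in map("".join, product(alphabet, repeat=k)):
        # Count the sequence k-mers within Hamming distance m of beta.
        value = sum(c for alpha, c in counts.items() if hamming(alpha, beta) <= m)
        if value:
            feats[beta] = value
    return feats


def mismatch_kernel(x, y, k, m, alphabet):
    """Inner product of the two explicit mismatch feature vectors."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(v * fy.get(beta, 0) for beta, v in fx.items())
```

With m = 0 this reduces to the spectrum kernel, which counts shared exact k-mers. The enumeration over all k-mers of the alphabet is exponential in k, which illustrates why direct evaluation becomes impractical for the large k and m that the abstract targets.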
References
| | Author | title | doi | year |
|---|---|---|---|---|
| 2012 EfficientEvaluationofLargeSeque | Pavel P. Kuksa, Vladimir Pavlovic | Efficient Evaluation of Large Sequence Kernels | 10.1145/2339530.2339649 | 2012 |