Longest Common Subsequence Identification Task
Jump to navigation
Jump to search
A Longest Common Subsequence Identification Task is string search task that can find the longest subsequence common to two or more strings.
- AKA: Heaviest Common Subsequence.
- Context:
- output: longest common subsequence (which may not be contiguous, i.e. a longest common substring).
- It can be solved by a Longest Common Subsequence System (that implements a longest common subsequence algorithm).
- …
- Example(s):
LCS([a s b e t c h d]; [r t w a b g j c k t f]) ⇒ [a b c]
.- …
- Counter-Example(s):
- See: Approximate String Matching, File Comparison Task, Dynamic Programming, Memoization.
References
2014
- (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/longest_common_subsequence_problem Retrieved:2014-8-22.
- The longest common subsequence (LCS) problem is to find the longest subsequence common to all sequences in a set of sequences (often just two). (Note that a subsequence is different from a substring, for the terms of the former need not be consecutive terms of the original sequence.) It is a classic computer science problem, the basis of file comparison programs such as diff, and has applications in bioinformatics.
2010
- http://www.itl.nist.gov/div897/sqg/dads/HTML/longestCommonSubsequence.html
- The problem of finding a maximum length (or maximum weight) subsequence of two or more strings.
- Also known as heaviest common subsequence.
- NOTES: The longest common substring is contiguous, while the longest common subsequence need not be.
2009a
- (Rajaraman & Ullman, 2009a) ⇒ Anand Rajaraman, and Jeffrey D. Ullman. (2009). “Applications and Variants of LSH." Stanford CS345A, Winter 2009: Data Mining. 2/2
2009b
- (Rajaraman & Ullman, 2009b) ⇒ Anand Rajaraman, and Jeffrey D. Ullman. (2009). “Distance Measures, Generalizations of Minhashing and LSH." Stanford CS345A, Winter 2009: Data Mining. 2/2-2/4
2003
- (Greenberg, 2003) ⇒ Ronald I. Greenberg (2003). "Bounds on the Number of Longest Common Subsequences. v2.] [arxiv:0301030 .
- QUOTE: A common subsequence of A and B is a subsequence of both A and B. A longest common subsequence (LCS) is a common subsequence of greatest possible length. A pair of sequences may have many different LCSs. In addition, a single LCS may have many different embeddings, i.e., positions in the two strings to which the characters of the LCS correspond.
2002
- (Navarro & Raffinot, 2002) ⇒ Gonzalo Navarro, and Mathieu Raffinot. (2002). “Flexible Pattern Matching in Strings." Cambridge University Press.
- QUOTE: Sequence comparison is about determining similarities and correspondences between two or more strings. It is related to approximate searching (Chapter 6) and has many applications in computational biology, speech recognition, computer science, coding theory, chromatography, and so on. These applications look for similarities between sequences of symbols. The general goal is to perform basic operation over the strings until they become equal. …
… A concept of “distance” between two strings can be defined according to the minimum cost of making them equal. …
… In some cases it is useful to measure the degree of similarity rather than of dissimilarity (i.e., a distance). One example is the LCS, a heavily studied measure. Other examples are the shortest common supersequence (SCS), longest common substring (LCG, different from LCS because the common string has to be a contiguous substring of both sequences), and shortest common superstring (SCG), as well as their version or more than two strings.
- QUOTE: Sequence comparison is about determining similarities and correspondences between two or more strings. It is related to approximate searching (Chapter 6) and has many applications in computational biology, speech recognition, computer science, coding theory, chromatography, and so on. These applications look for similarities between sequences of symbols. The general goal is to perform basic operation over the strings until they become equal. …
2000
- (Bergroth et al., 2000) ⇒ L. Bergroth, H. Hakonen, and T. Raita (2000). "A Survey of Longest Common Subsequence Algorithms”. In: IEEE SPIRE 2000 doi:10.1109/SPIRE.2000.878178.
- ABSTRACT: The aim of this paper is to give a comprehensive comparison of well-known longest common subsequence algorithms (for two input strings) and study their behaviour in various application environments. The performance of the methods depends heavily on the properties of the problem instance as well as the supporting data structures used in the implementation. We want to make also a clear distinction between methods that determine the actual lcs and those calculating only its length, since the execution time and more importantly, the space demand depends crucially on the type of the task. To our knowledge, this is the first time this kind of survey has been done. Due to the page limits, the paper gives only a coarse overview of the performance of the algorithms; more detailed studies are reported elsewhere.
1978
- (Maier, 1978) ⇒ David Maier. (1978). “The Complexity of Some Problems on Subsequences and Supersequences.” In: Journal of ACM, 25(2). doi:10.1145/322063.322075.
1975
- (Hirshberg, 1975) ⇒ Dan S. Hirschberg. (1975). “A Linear Space Algorithm for Computing Maximal Common Subsequences." Communications of the ACM, 18(6). doi:10.1145/360825.360861.
1974
- (Wagner & Fischer, 1974) ⇒ Robert A. Wagner, and Michael J. Fischer. (1974). “The String to String Correction Problem.” In: Journal of the ACM (JACM), 21(1).