Smith-Waterman Sequence Alignment Algorithm
A Smith-Waterman Sequence Alignment Algorithm is a local sequence alignment algorithm that finds the segments in two sequences that have similarities.
- Example(s):
- R package implementation(s) of the Smith-Waterman algorithm:
text.alignment()
(developed by Wijffels, 2020);
- Python implementation(s) of the Smith-Waterman algorithm:
- …
- R package implementation(s) of the Smith-Waterman algorithm:
- Counter-Example(s):
- See: Approximate String Matching Algorithm, Sequence Homology, Longest Common Subsequence, Shortest Common Supersequence, Longest Common Substring, Shortest Common Superstring, Approximate String Matching, Phylogenetic Analysis Task, Alignment-free Sequence Analysis Task, Levenshtein Distance, Edit Distance, Alignment Distance, Sequential Pattern Mining Task.
References
2021
- (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Smith–Waterman_algorithm Retrieved:2021-2-21.
- The Smith–Waterman algorithm performs local sequence alignment; that is, for determining similar regions between two strings of nucleic acid sequences or protein sequences. Instead of looking at the entire sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure.
The algorithm was first proposed by Temple F. Smith and Michael S. Waterman in 1981.[1] Like the Needleman–Wunsch algorithm, of which it is a variation, Smith–Waterman is a dynamic programming algorithm. As such, it has the desirable property that it is guaranteed to find the optimal local alignment with respect to the scoring system being used (which includes the substitution matrix and the gap-scoring scheme). The main difference to the Needleman–Wunsch algorithm is that negative scoring matrix cells are set to zero, which renders the (thus positively scoring) local alignments visible. Traceback procedure starts at the highest scoring matrix cell and proceeds until a cell with score zero is encountered, yielding the highest scoring local alignment. Because of its quadratic complexity in time and space, it often cannot be practically applied to large-scale problems and is replaced in favor of less general but computationally more efficient alternatives such as (Gotoh, 1982),[2] (Altschul and Erickson, 1986),[3] and (Myers and Miller, 1988).[4]
- The Smith–Waterman algorithm performs local sequence alignment; that is, for determining similar regions between two strings of nucleic acid sequences or protein sequences. Instead of looking at the entire sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure.
- ↑ Smith, Temple F. & Waterman, Michael S. (1981). "Identification of Common Molecular Subsequences" (PDF). Journal of Molecular Biology. 147 (1): 195–197. CiteSeerX 10.1.1.63.2897. doi:10.1016/0022-2836(81)90087-5. PMID 7265238.
- ↑ Osamu Gotoh (1982). “An improved algorithm for matching biological sequences". Journal of Molecular Biology. 162 (3): 705–708. CiteSeerX 10.1.1.204.203. doi:10.1016/0022-2836(82)90398-9. PMID 7166760.
- ↑ Osamu Gotoh (1982). “An improved algorithm for matching biological sequences". Journal of Molecular Biology. 162 (3): 705–708. CiteSeerX 10.1.1.204.203. doi:10.1016/0022-2836(82)90398-9. PMID 7166760.
- ↑ Miller, Webb; Myers, Eugene (1988). “Optimal alignments in linear space". Bioinformatics. 4 (1): 11–17. CiteSeerX 10.1.1.107.6989. doi:10.1093/bioinformatics/4.1.11. PMID 3382986.
2020
- (Wijffels, 2020) ⇒ Jan Wijffels (2020). Package ‘text.alignment’: https://cran.r-project.org/web/packages/text.alignment/text.alignment.pdf
- QUOTE: Align text using the Smith-Waterman algorithm. The Smith-Waterman algorithm performs local sequence alignment. It finds similar regions between two strings. Similar regions are a sequence of either characters or words which are found by matching the characters or words of 2 sequences of strings.
If the word/letter is the same in each text, the alignment score is increased with the match score, while if they are not the same the local alignment score drops by the gap score. If one of the 2 texts contains extra words letters, the score drops by the mismatch score.
- QUOTE: Align text using the Smith-Waterman algorithm. The Smith-Waterman algorithm performs local sequence alignment. It finds similar regions between two strings. Similar regions are a sequence of either characters or words which are found by matching the characters or words of 2 sequences of strings.
1981
- (Smight & Waterman, 1981) ⇒ Temple F. Smith and Michael S. Waterman. (1981). “Identification of Common Molecular Subsequences.” In: Journal of Molecular Biology, 147. doi:10.1016/0022-2836(81)90087-5.