Pairwise Sequence Alignment Task
A Pairwise Sequence Alignment Task is a Sequence Alignment Task that aligns two sequences.
- Example(s):
- Counter-Example(s):
- See: Word Sequence Aligment, Needleman-Wunsch Algorithm, Sequence Homology, Longest Common Subsequence, Shortest Common Supersequence, Longest Common Substring, Shortest Common Superstring, Approximate String Matching, Phylogenetic Analysis Task, Alignment-free Sequence Analysis Task, Levenshtein Distance, Edit Distance, Alignment Distance, Sequential Pattern Mining Task.
References
2021b
- (Xie, 2021) ⇒ https://www.ics.uci.edu/~xhx/courses/python/tmp/alignment.pdf Retrieved:2021-2-21.
- QUOTE: An alignment of two sequences $S$ and $T$ is obtained by first inserting spaces ('
−
') either into, before or at the ends of $S$ and $T$ to obtain $S'$ and $T'$ such that $|S'|= |T'|$, and then placing $S'$ on top of $T'$ such that every character in $S'$ is uniquely aligned with a character in $T'$.
- QUOTE: An alignment of two sequences $S$ and $T$ is obtained by first inserting spaces ('
2020
- (Wijffels, 2020) ⇒ Jan Wijffels (2020). Package ‘text.alignment’: https://cran.r-project.org/web/packages/text.alignment/text.alignment.pdf
- QUOTE: Align text using the Smith-Waterman algorithm. The Smith-Waterman algorithm performs local sequence alignment. It finds similar regions between two strings. Similar regions are a sequence of either characters or words which are found by matching the characters or words of 2 sequences of strings.
If the word/letter is the same in each text, the alignment score is increased with the match score, while if they are not the same the local alignment score drops by the gap score. If one of the 2 texts contains extra words letters, the score drops by the mismatch score.
- QUOTE: Align text using the Smith-Waterman algorithm. The Smith-Waterman algorithm performs local sequence alignment. It finds similar regions between two strings. Similar regions are a sequence of either characters or words which are found by matching the characters or words of 2 sequences of strings.
2019
- (Mirabello & Wallner, 2019) ⇒ Claudio Mirabello, and Bjorn Wallner (2019). "rawMSA: End-to-end Deep Learning using raw Multiple Sequence Alignments". In: PLoS ONE 14(8):e0220182. DOI:10.1371/journal.pone.0220182.
2015
- (Eger, 2015) ⇒ Steffen Eger (2015, July). "Multiple Many-to-Many Sequence Alignment for Combining String-Valued Variables: A G2P Experiment". In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers).
2011
- (Will, 2011) ⇒ S. Will (2011). "Alignment". In: MIT Course 18.417, Fall 2011.
- QUOTE: Motivation: assess similarity of sequences and learn about their evolutionary relationship. (...)
For pairwise sequence comparison: define edit distance, define alignment distance, show equivalence of distances, define alignment problem and efficient algorithm gap penalties, local alignment.
- QUOTE: Motivation: assess similarity of sequences and learn about their evolutionary relationship.
2002
- (Navarro & Raffinot, 2002) ⇒ Gonzalo Navarro, and Mathieu Raffinot. (2002). “Flexible Pattern Matching in Strings." Cambridge University Press.
- QUOTE: Sequence comparison is about determining similarities and correspondences between two or more strings. It is related to approximate searching (Chapter 6) and has many applications in computational biology, speech recognition, computer science, coding theory, chromatography, and so on. These applications look for similarities between sequences of symbols. The general goal is to perform basic operation over the strings until they become equal.
... A concept of "distance" between two strings can be defined according to the minimum cost of making them equal.
... In some cases it is useful to measure the degree of similarity rather than of dissimilarity (i.e., a distance). One example is the LCS, a heavily studied measure. Other examples are the shortest common supersequence (SCS), longest common substring (LCG, different from LCS because the common string has to be a contiguous substring of both sequences), and shortest common superstring (SCG), as well as their version or more than two strings.
- QUOTE: Sequence comparison is about determining similarities and correspondences between two or more strings. It is related to approximate searching (Chapter 6) and has many applications in computational biology, speech recognition, computer science, coding theory, chromatography, and so on. These applications look for similarities between sequences of symbols. The general goal is to perform basic operation over the strings until they become equal.
1999
- (Tiedemann, 1999) ⇒ . Jorg Tiedemann (1999). "Word Alignment Step by Step". In: Proceedings of the 12th Nordic Conference of Computational Linguistics (NODALIDA 1999).
- QUOTE: Word alignment aims at the identification of translation equivalents between linguistic units below the sentence level within parallel text (Merkel 1999[1]), mainly bilingual text ibitext). Those units include single-word units (SlVUs) and multi-word units (MWUs) and will be referred to as link units further on. The basic terminology for describing parallel text and word alignment inthis paper follows Ahrenberg et al (1999)[2] and Ahrenberg et al (forthcoming)[3]. In particular, each word correspondence in the bitext describes a link instance, or simply a link. A pair of link units that is instantiated in the bitext will be referred to as link type. Word alignment systems usually assume segmented bitext (sentence aligned bitext). Common bitext segments are sentence fragments, sentences, and sequences of sentences that have corresponding units in the translation.
- ↑ Merkel, M. 1999. Understanding and enhancing translation by parallel text processing. Linköping Studies in Science and Technology. Dissertation No. 607. Linköping University. Dept, of Computer and Information Science.
- ↑ Ahrenberg, L., Merkel, M., Sagvall Hein, A., and Tiedemann, J. 1999. Evaluating LWA and UWA. PLUG deliverable 3A.1. Internal report.
- ↑ Ahrenberg, L., Merkel, M., Sagvall Hein, A., and Tiedemann, J. forthcoming. Evaluation of Word Alignment Systems. In: Proceedings of the International Conference on Language Resources and Evaluation, LREC-2000, Athens, Greece, 2000.
1981
- (Smith & Waterman, 1981) ⇒ Temple F. Smith, Michael S. Waterman. (1981). “Identification of Common Molecular Subsequences.” In: Journal of Molecular Biology, 147. doi:10.1016/0022-2836(81)90087-5.
1974
- (Wagner & Fischer, 1974) ⇒ Robert A. Wagner, and Michael J. Fischer. (1974). “The String to String Correction Problem.” In: Journal of the ACM (JACM), 21(1).
- QUOTE: The string-to-string correction problem is to determine the distance between two strings as measured by the minimum cost sequence of “edit operations” needed to change the one string into the other. The edit operations investigated allow changing one symbol of a string into another single symbol, deleting one symbol from a string, or inserting a single symbol into a string. An algorithm is presented which solves this problem in time proportional to the product of the lengths of the two strings. Possible applications are to the problems of automatic spelling correction and determining the longest subsequence of characters common to two strings.