Word Alignment Algorithm
Jump to navigation
Jump to search
A Word Alignment Algorithm is a Sequence Alignment Algorithm that aligns words by finding a similarity measure between them.
- Context:
- It is often based on the Smith-Waterman Algorithm.
- Example(s):
- the one described in Wijffels (2020),
- …
- Counter-Example(s):
- See: Sequence Homology, Longest Common Subsequence, Shortest Common Supersequence, Longest Common Substring, Shortest Common Superstring, Approximate String Matching, Phylogenetic Analysis Algorithm, Alignment-free Sequence Analysis Algorithm, Edit Distance, Alignment Distance, Sequential Pattern Mining Algorithm.
References
2021a
- (Kellis et al., 2021) ⇒ Manolis Kellis et al. (2021). "2.2: Aligning Sequences". In: "Book: Computational Biology - Genomes, Networks, and Evolution (Kellis et al.)", LibreTexts.
- QUOTE: Sequence alignment represents the method of comparing two or more genetic strands, such as DNA or RNA. These comparisons help with the discovery of genetic commonalities and with the (implicit) tracing of strand evolution. There are two main types of alignment:
- Global alignment: an attempt to align every element in a genetic strand, most useful when the genetic strands under consideration are of roughly equal size. Global alignment can also end in gaps.
- Local alignment: an attempt to align regions of sequences that contain similar sequence motifs within a larger context.
- QUOTE: Sequence alignment represents the method of comparing two or more genetic strands, such as DNA or RNA. These comparisons help with the discovery of genetic commonalities and with the (implicit) tracing of strand evolution. There are two main types of alignment:
2021b
- (Kellis et al., 2021) ⇒ Manolis Kellis et al. (2021). "3.2: Introduction". In: "Book: Computational Biology - Genomes, Networks, and Evolution (Kellis et al.)", LibreTexts.
- QUOTE: This chapter will address other forms of alignment algorithms to tackle such scenarios. It will first introduce the Smith-Waterman algorithm for local alignment for aligning subsequences as opposed to complete sequences, in contrast to the Needleman-Wunsch algorithm for global alignment. Later on, an overview will be given of hashing and semi-numerical methods like the Karp-Rabin algorithm for finding the longest (contiguous) common substring of nucleotides. These algorithms are implemented and extended for inexact matching in the BLAST program, one of the most highly cited and successful tools in computational biology. Finally, this chapter will go over BLAST for database searching as well as the probabilistic foundation of sequence alignment and how alignment scores can be interpreted as likelihood ratios.
2020
- (Wijffels, 2020) ⇒ Jan Wijffels (2020). Package ‘text.alignment’: https://cran.r-project.org/web/packages/text.alignment/text.alignment.pdf
- QUOTE: Align text using the Smith-Waterman algorithm. The Smith-Waterman algorithm performs local sequence alignment. It finds similar regions between two strings. Similar regions are a sequence of either characters or words which are found by matching the characters or words of 2 sequences of strings.
If the word/letter is the same in each text, the alignment score is increased with the match score, while if they are not the same the local alignment score drops by the gap score. If one of the 2 texts contains extra words letters, the score drops by the mismatch score.
- QUOTE: Align text using the Smith-Waterman algorithm. The Smith-Waterman algorithm performs local sequence alignment. It finds similar regions between two strings. Similar regions are a sequence of either characters or words which are found by matching the characters or words of 2 sequences of strings.
2019
- (Mirabello & Wallner, 2019) ⇒ Claudio Mirabello, and Bjorn Wallner (2019). "rawMSA: End-to-end Deep Learning using raw Multiple Sequence Alignments". In: PLoS ONE 14(8):e0220182. DOI:10.1371/journal.pone.0220182.