Word Alignment Task
A Word Alignment Task is a Text Sequence Alignment Task that determines the similarity between two words in a text item.
- Context:
- It can be solved by a Word Alignment System by implementing a Word Alignment Algorithm.
- ...
- Example(s):
- Counter-Example(s):
- See: Needleman-Wunsch Algorithm, Sequence Homology, Longest Common Subsequence, Shortest Common Supersequence, Longest Common Substring, Shortest Common Superstring, Approximate String Matching, Phylogenetic Analysis Task, Alignment-free Sequence Analysis Task, Levenshtein Distance, Edit Distance, Alignment Distance, Sequential Pattern Mining Task.
References
2021
- (Xie, 2021) ⇒ https://www.ics.uci.edu/~xhx/courses/python/tmp/alignment.pdf Retrieved:2021-2-21.
- QUOTE: An alignment of two sequences $S$ and $T$ is obtained by first inserting spaces ('−') either into, before or at the ends of $S$ and $T$ to obtain $S'$ and $T'$ such that $|S'|= |T'|$, and then placing $S'$ on top of $T'$ such that every character in $S'$ is uniquely aligned with a character in $T'$.
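The definition quoted above can be illustrated with a minimal global alignment sketch in the style of the Needleman-Wunsch Algorithm listed under "See". The function name `global_align` and the scoring values (`match=1`, `mismatch=-1`, `gap=-1`) are assumptions made for this example and are not taken from the cited source. It returns the padded strings $S'$ and $T'$, which have equal length.

```python
def global_align(s, t, match=1, mismatch=-1, gap=-1):
    """Return (s_aligned, t_aligned, score) for a global alignment of s and t.

    Scoring values are illustrative assumptions, not taken from the source.
    """
    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align s[i-1] with t[j-1]
                score[i - 1][j] + gap,      # align s[i-1] with a space
                score[i][j - 1] + gap,      # align t[j-1] with a space
            )

    # Trace back from the bottom-right cell to recover S' and T'.
    s_out, t_out = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            s_out.append(s[i - 1]); t_out.append(t[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            s_out.append(s[i - 1]); t_out.append("-")
            i -= 1
        else:
            s_out.append("-"); t_out.append(t[j - 1])
            j -= 1
    return "".join(reversed(s_out)), "".join(reversed(t_out)), score[n][m]


s_aligned, t_aligned, total = global_align("GATTACA", "GCATGCU")
print(s_aligned)
print(t_aligned)
```

Removing the `-` characters from either output string gives back the corresponding input, and no column pairs a space with a space.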
2020
- (Wijffels, 2020) ⇒ Jan Wijffels (2020). Package ‘text.alignment’: https://cran.r-project.org/web/packages/text.alignment/text.alignment.pdf
- QUOTE: Align text using the Smith-Waterman algorithm. The Smith-Waterman algorithm performs local sequence alignment. It finds similar regions between two strings. Similar regions are a sequence of either characters or words which are found by matching the characters or words of 2 sequences of strings.
If the word/letter is the same in each text, the alignment score is increased with the match score, while if they are not the same the local alignment score drops by the gap score. If one of the 2 texts contains extra words/letters, the score drops by the mismatch score.
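As a rough illustration of word-level local alignment, the following sketch applies the Smith-Waterman recurrence to whitespace-separated tokens. It is not the `text.alignment` package's implementation or API. The function name `smith_waterman_words` and the default scores are assumptions for this example, and the parameters follow the conventional usage, in which `mismatch` penalizes two differing words and `gap` penalizes an extra word in one text.

```python
def smith_waterman_words(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of two texts at word level.

    Returns (best_score, aligned_words_a, aligned_words_b).
    Scores are illustrative assumptions, not the package defaults.
    """
    x, y = a.split(), b.split()
    n, m = len(x), len(y)
    # H[i][j] = best score of a local alignment ending at x[i-1], y[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a new local alignment
                H[i - 1][j - 1] + sub,  # match or mismatch
                H[i - 1][j] + gap,      # extra word in a
                H[i][j - 1] + gap,      # extra word in b
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score reaches 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if x[i - 1] == y[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(x[i - 1]); out_b.append(y[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(x[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(y[j - 1])
            j -= 1
    return best, out_a[::-1], out_b[::-1]


score, words_a, words_b = smith_waterman_words(
    "the quick brown fox jumps", "a quick brown dog jumps"
)
print(score, words_a, words_b)
```

With these assumed scores, the similar region found for the two example sentences starts at "quick" and ends at "jumps", with "fox" aligned against "dog" as a mismatch.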
2015
- (Eger, 2015) ⇒ Steffen Eger (2015, July). "Multiple Many-to-Many Sequence Alignment for Combining String-Valued Variables: A G2P Experiment". In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers).
1999
- (Tiedemann, 1999) ⇒ Jörg Tiedemann (1999). "Word Alignment Step by Step". In: Proceedings of the 12th Nordic Conference of Computational Linguistics (NODALIDA 1999).
- QUOTE: Word alignment aims at the identification of translation equivalents between linguistic units below the sentence level within parallel text (Merkel 1999[1]), mainly bilingual text (bitext). Those units include single-word units (SWUs) and multi-word units (MWUs) and will be referred to as link units further on. The basic terminology for describing parallel text and word alignment in this paper follows Ahrenberg et al (1999)[2] and Ahrenberg et al (forthcoming)[3]. In particular, each word correspondence in the bitext describes a link instance, or simply a link. A pair of link units that is instantiated in the bitext will be referred to as link type. Word alignment systems usually assume segmented bitext (sentence aligned bitext). Common bitext segments are sentence fragments, sentences, and sequences of sentences that have corresponding units in the translation.
- ↑ Merkel, M. 1999. Understanding and enhancing translation by parallel text processing. Linköping Studies in Science and Technology. Dissertation No. 607. Linköping University. Dept. of Computer and Information Science.
- ↑ Ahrenberg, L., Merkel, M., Sagvall Hein, A., and Tiedemann, J. 1999. Evaluating LWA and UWA. PLUG deliverable 3A.1. Internal report.
- ↑ Ahrenberg, L., Merkel, M., Sagvall Hein, A., and Tiedemann, J. forthcoming. Evaluation of Word Alignment Systems. In: Proceedings of the International Conference on Language Resources and Evaluation, LREC-2000, Athens, Greece, 2000.
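The terminology in the quote (link unit, link instance, link type) can be made concrete with a small data-structure sketch. Everything here is an assumption for illustration: the representation of a link as a pair of index tuples, the function name `link_types`, and the toy English-German segments, none of which come from the cited paper.

```python
from collections import Counter


def link_types(segments):
    """Count link types over a segmented bitext.

    Each segment is (source_tokens, target_tokens, links), where a link
    (one link instance) is a pair (source_indices, target_indices).
    A one-index tuple is a single-word unit, a longer tuple a multi-word unit.
    """
    counts = Counter()
    for src, tgt, links in segments:
        for src_idx, tgt_idx in links:
            src_unit = " ".join(src[i] for i in src_idx)
            tgt_unit = " ".join(tgt[j] for j in tgt_idx)
            # The pair of link units is the link type; each occurrence
            # in the bitext is one link instance of that type.
            counts[(src_unit, tgt_unit)] += 1
    return counts


# Toy sentence-aligned bitext, invented for this example.
segments = [
    (["I", "give", "up"], ["ich", "gebe", "auf"],
     [((0,), (0,)), ((1, 2), (1, 2))]),
    (["I", "sleep"], ["ich", "schlafe"],
     [((0,), (0,)), ((1,), (1,))]),
]

counts = link_types(segments)
print(counts)
```

In this toy data the link type ("I", "ich") has two link instances, while the multi-word link type ("give up", "gebe auf") has one.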
1981
- (Smith & Waterman, 1981) ⇒ Temple F. Smith, Michael S. Waterman. (1981). “Identification of Common Molecular Subsequences.” In: Journal of Molecular Biology, 147. doi:10.1016/0022-2836(81)90087-5.
1974
- (Wagner & Fischer, 1974) ⇒ Robert A. Wagner, and Michael J. Fischer. (1974). “The String to String Correction Problem.” In: Journal of the ACM (JACM), 21(1).
- QUOTE: The string-to-string correction problem is to determine the distance between two strings as measured by the minimum cost sequence of “edit operations” needed to change the one string into the other. The edit operations investigated allow changing one symbol of a string into another single symbol, deleting one symbol from a string, or inserting a single symbol into a string. An algorithm is presented which solves this problem in time proportional to the product of the lengths of the two strings. Possible applications are to the problems of automatic spelling correction and determining the longest subsequence of characters common to two strings.
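The dynamic program described in the quote can be sketched as follows, assuming unit cost for each of the three edit operations (the paper allows general costs). The function name `edit_distance` is chosen for this example. The two nested loops make the running time proportional to the product of the string lengths.

```python
def edit_distance(a, b):
    """Minimum number of single-symbol changes, deletions and insertions
    needed to turn a into b (unit costs assumed)."""
    n, m = len(a), len(b)
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete all i symbols of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j  # insert all j symbols of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            change = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j - 1] + change,  # change one symbol (or keep it)
                d[i - 1][j] + 1,           # delete a[i-1]
                d[i][j - 1] + 1,           # insert b[j-1]
            )
    return d[n][m]


print(edit_distance("kitten", "sitting"))  # 3
```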