Levenshtein Edit Distance Measure
Jump to navigation
Jump to search
A Levenshtein Edit Distance Measure is a string edit distance function that is based on the minimum number of string edit operations required to transform one string into another string.
- Context:
- It can be:
- a General Levenshtein Distance Function, that associates different Costs to each separate String Edit Operation.
- a Damerau-Levenshtein Distance Function, that counts Transposition Operation as a single operation.
- It can be used by an Approximate String Matching System that applies an (Approximate String Matching Algorithm.
- It can be:
- Example(s):
- a Levenshtein("David L. Edwards","David L. Emerson")=??
- a Damerau-Levenshtein Distance.
- Counter-Example(s):
- See: Approximate String Matching Task.
References
- improved levenshtein http://www.perlmonks.org/?node_id=333616
- http://search.cpan.org/dist/Text-Levenshtein/
- http://www.merriampark.com/ldperl.htm
2011
- (Wikipedia, 2011) ⇒ http://en.wikipedia.org/wiki/Levenshtein_distance
- In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The term edit distance is often used to refer specifically to Levenshtein distance. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965.[1]
For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
- kitten → sitten (substitution of 's' for 'k')
- sitten → sittin (substitution of 'i' for 'e')
- sittin → sitting (insertion of 'g' at the end).
- Levenshtein distance is not the only popular notion of edit distance. Variations can be obtained by changing the set of allowable edit operations: for instance,
- length of the longest common subsequence is the metric obtained by allowing only addition and deletion, not substitution;
- the Damerau–Levenshtein distance allows addition, deletion, substitution, and the transposition of two adjacent characters;
- the Hamming distance only allows substitution (and hence, only applies to strings of the same length).
- Edit distance in general is usually defined as a parametrizable metric in which a repertoire of edit operations is available, and each operation is assigned a cost (possibly infinite). This is further generalized by DNA sequence alignment algorithms such as the Smith–Waterman algorithm, which make an operation's cost depend on where it is applied.
- In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (i.e. an edit distance). The term edit distance is often used to refer specifically to Levenshtein distance. The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965.[1]
- ↑ В.И. Левенштейн (1965). "Двоичные коды с исправлением выпадений, вставок и замещений символов". Доклады Академий Наук СCCP 163 (4): 845–8. Appeared in English as: Levenshtein VI (1966). "Binary codes capable of correcting deletions, insertions, and reversals". Soviet Physics Doklady 10: 707–10. http://www.scribd.com/doc/18654513/levenshtein?secret_password=1aycnw239qw4jqjtsm34#full.
2009
- http://www.algorithmic-solutions.info/leda_manual/distance.html
- An instance D of the data type distance is an object maintaining two strings and providing several string distance functions.
- The Levenshtein distance or edit distance between two strings is given by the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character.
- The General Levenshtein distance is given by the minimum sum of the costs over a sequence of operations needed to transform one string into the other, where operation is an insertion, deletion, or substitution and a cost is assigned to each of the operations.
- The Damerau-Levenshtein distance is an extension of Levenshtein distance that counts transposition as a single edit operation. Strictly speaking, the Damerau-Levenshtein distance is equal to the minimal number of insertions, deletions, substitutions and transpositions needed to transform one string into the other.
- An instance D of the data type distance is an object maintaining two strings and providing several string distance functions.
2003
- (Cohen et al., 2003) ⇒ William W. Cohen, Pradeep Ravikumar, and Stephen E. Fienberg. (2003). “A Comparison of String Distance Metrics for Name-Matching Tasks.” In: Workshop on Information Integration on the Web (IIWeb-03).