2002 FlexiblePatternMatchingInStrings

From GM-RKB
Jump to navigation Jump to search

Subject Headings: String Pattern Matching Task, String Pattern Matching Algorithm, Multiple String Matching, Extended String Matching, Regular Expression Matching, Approximate String Matching.

Notes

Cited By

Quotes

Abstract

  • String matching problems range from the relatively simple task of searching a single text for a string of characters to searching a database for approximate occurrences of a complex pattern. Recent years have witnessed a dramatic increase of interest in sophisticated string matching problems, especially in information retrieval and computational biology. This book presents a practical approach to string matching problems, focusing on the algorithms and implementations that perform best in practice. It covers searching for simple, multiple and extended strings, as well as regular expressions, and exact and approximate searching. It includes all the most significant new developments in complex pattern searching. The clear explanations, step-by-step examples, algorithm pseudocode, and implementation efficiency maps will enable researchers, professionals and students in bioinformatics, computer science, and software engineering to choose the most appropriate algorithms for their applications.

6 Approximate Matching

6.1 Basic Concepts

  • Approximate string matching, also called "string matching allowing errors," is the problem of finding a pattern [math]\displaystyle{ p }[/math] in a text [math]\displaystyle{ T }[/math] when a limited number [math]\displaystyle{ k }[/math] of differences is permitted between the pattern and its occurrences in the text.
  • We call α=k/m the "error level.” It gives a measure of the "fraction" of the pattern that can be altered.

7 Conclusion

7.4 Related Topics

7.4.6 Sequence Comparison

  • Sequence comparison is about determining similarities and correspondences between two or more strings. It is related to approximate searching (Chapter 6) and has many applications in computational biology, speech recognition, computer science, coding theory, chromatography, and so on. These applications look for similarities between sequences of symbols. The general goal is to perform basic operation over the strings until they become equal.
  • A concept of "distance" between two strings can be defined according to the minimum cost of making them equal.
  • In some cases it is useful to measure the degree of similarity rather than of dissimilarity (i.e., a distance). One example is the LCS, a heavily studeis measure. Other examples are the shortest common supersequence (SCS), longest common substring (LCG, different from LCS because the common string has to be a contiguous substring of both sequences), and shortest common superstring (SCG), as well as their version or more than two strings.

References


,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2002 FlexiblePatternMatchingInStringsMathieu Raffinot
Gonzalo Navarro
Flexible Pattern Matching in Stringshttp://www.cambridge.org/catalogue/catalogue.asp?isbn=05218130772002