String Edit Operation
A String Edit Operation is an edit operation that can convert a string item into another string item.
- Context:
- It can be instantiated in a String Structure Edit Operation.
- It can range from being a String Symbol Edit Operation to being ...
- It can range from being a Basic String Edit Operation to being a Complex String Edit Operation.
- Example(s):
- Counter-Example(s):
- See: String Matching Task.
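As an illustration of Basic String Edit Operations, the following Python sketch assumes the three standard character-level operations (insertion, deletion, substitution) and applies a sequence of them to convert one string item into another. The function names `insert_char`, `delete_char`, `substitute_char` and `apply_edits` are hypothetical, chosen only for this example.

```python
# Minimal sketch of three basic (character-level) string edit operations.
# All function names here are hypothetical, chosen for illustration.

def insert_char(s: str, i: int, c: str) -> str:
    """Insert character c before position i of s (0 <= i <= len(s))."""
    if not 0 <= i <= len(s):
        raise IndexError("insert position out of range")
    return s[:i] + c + s[i:]


def delete_char(s: str, i: int) -> str:
    """Delete the character at position i of s."""
    if not 0 <= i < len(s):
        raise IndexError("delete position out of range")
    return s[:i] + s[i + 1:]


def substitute_char(s: str, i: int, c: str) -> str:
    """Replace the character at position i of s with c."""
    if not 0 <= i < len(s):
        raise IndexError("substitute position out of range")
    return s[:i] + c + s[i + 1:]


def apply_edits(s: str, edits) -> str:
    """Apply a sequence of (operation, args) pairs from left to right."""
    for op, args in edits:
        s = op(s, *args)
    return s


edits = [
    (substitute_char, (0, "s")),  # kitten -> sitten
    (substitute_char, (4, "i")),  # sitten -> sittin
    (insert_char, (6, "g")),      # sittin -> sitting
]
print(apply_edits("kitten", edits))  # prints: sitting
```

Converting "kitten" into "sitting" this way takes three basic operations, which matches the Levenshtein edit distance between the two strings.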
References
2005
- (McCallum, Bellare & Pereira, 2005) ⇒ Andrew McCallum, Kedar Bellare, and Fernando Pereira. (2005). “A conditional random field for discriminatively-trained finite-state string edit distance.” In: Proceedings of the Conference on Uncertainty in AI (UAI 2005).
- QUOTE: Let [math]\displaystyle{ x = x_1 \ldots x_m }[/math] and [math]\displaystyle{ y = y_1 \ldots y_n }[/math] be two strings or symbol sequences. This pair of input strings is associated with an output label [math]\displaystyle{ z \in \{0, 1\} }[/math] indicating whether or not the strings should be considered a match (1) or a mismatch (0). [1] As we now explain, our model scores alignments between [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] as to whether they are a match or a mismatch. An alignment [math]\displaystyle{ \mathbf{a} }[/math] is a four-tuple consisting of a sequence of edit operations, two sequences of string positions, and a sequence of FSM states.
Let [math]\displaystyle{ \mathbf{a.e} = e_1 \ldots e_k }[/math] indicate the sequence of edit operations, such as delete-one-character-in-x, or substitute-one-character-in-x-for-one-character-in-y, or also delete-all-characters-in-x-up-to-its-next-nonalphabetic. Each edit operation consumes either some of [math]\displaystyle{ x }[/math] (deletion), some of [math]\displaystyle{ y }[/math] (insertion), or some of both (substitution).
In addition to the standard edit operations (insertion, deletion, substitution) we have also implemented more powerful edits that come naturally in this model, such as delete-until-end-of-word, delete-word-in-lexicon, and delete-word-appearing-in-other-string.
- [1] One could also straight-forwardly imagine a different regression-based scenario in which [math]\displaystyle{ z }[/math] is real-valued, or also a ranking-based criteria, in which two pairs are provided and [math]\displaystyle{ z }[/math] indicates which pair of strings should be considered closer.
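The alignment described in the quote can be sketched in Python under simplifying assumptions: each edit operation is modelled as a function that advances a position i in x, a position j in y, or both, and the list of (i, j) pairs visited plays the role of the alignment's two sequences of string positions. FSM states, edit weights, and the conditional random field scoring are omitted, and the operation names (`delete_one`, `insert_one`, `substitute_one`, `delete_until_end_of_word`, `edit_positions`) are hypothetical. This is not the authors' implementation.

```python
# Minimal sketch of an edit-operation sequence aligning x with y, assuming a
# simplified reading of McCallum, Bellare & Pereira (2005): each operation
# consumes some of x (deletion), some of y (insertion), or some of both
# (substitution). FSM states, edit weights and CRF scoring are omitted.
# All operation names are hypothetical.

def delete_one(x: str, y: str, i: int, j: int) -> tuple[int, int]:
    return i + 1, j  # consume one character of x


def insert_one(x: str, y: str, i: int, j: int) -> tuple[int, int]:
    return i, j + 1  # consume one character of y


def substitute_one(x: str, y: str, i: int, j: int) -> tuple[int, int]:
    return i + 1, j + 1  # consume one character of x and one of y


def delete_until_end_of_word(x: str, y: str, i: int, j: int) -> tuple[int, int]:
    # A "complex" edit: consume characters of x up to (not including)
    # the next non-alphabetic character.
    k = i
    while k < len(x) and x[k].isalpha():
        k += 1
    if k == i:
        raise ValueError("no word to delete at this position")
    return k, j


def edit_positions(x: str, y: str, ops) -> list[tuple[int, int]]:
    """Return the (i, j) positions after each edit, starting at (0, 0).

    Raises ValueError if an edit overruns x or y, or if the sequence
    does not consume both strings completely.
    """
    i, j = 0, 0
    positions = [(i, j)]
    for op in ops:
        i, j = op(x, y, i, j)
        if i > len(x) or j > len(y):
            raise ValueError(f"{op.__name__} overruns the input")
        positions.append((i, j))
    if (i, j) != (len(x), len(y)):
        raise ValueError("edit sequence does not consume both strings")
    return positions


x, y = "John Smith", "Smith"
ops = [delete_until_end_of_word, delete_one] + [substitute_one] * 5
print(edit_positions(x, y, ops))
# prints: [(0, 0), (4, 0), (5, 0), (6, 1), (7, 2), (8, 3), (9, 4), (10, 5)]
```

Splitting the returned pairs into their i and j components yields the two position sequences. A model in the spirit of the quoted paper would then score each such path (for example, weighting substitutions of identical characters differently from mismatched ones), which this sketch does not attempt.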