Text-Token Character-Pattern Predictor Feature

Example(s):
- GetPattern("G.M.") ⇒ "A-A-"
- GetPattern("Machine-223") ⇒ "Aaaaaaa-000"
- GetSummarizedPattern("Machine-223") ⇒ "Aa-0"
Counter-Example(s):
- a Text Token Length Feature.
- a Text Token hasCapitalLetter Feature, such as f(hasCapital("Markov")) ⇒ 1.
- a Text Token Dictionary Match Feature, such as f(equals("Markov", "Jordan")) ⇒ 0.
- a Character n-Gram Feature, such as f("rko", "Markov") ⇒ true.
- a Text Token Part-of-Speech Role Feature.
See: Text Token.

References

(Nadeau & Sekine, 2007) ⇒ David Nadeau, and Satoshi Sekine. (2007). “A Survey of Named Entity Recognition and Classification.” In: Lingvisticae Investigationes, 30(1).
- QUOTE: Pattern features were introduced by M. Collins (2002) and then used by others (W. Cohen & Sarawagi 2004 and B. Settles 2004). Their role is to map words onto a small set of patterns over character types. For instance, a pattern feature might map all uppercase letters to “A”, all lowercase letters to “a”, all digits to “0” and all punctuation to “-”: x = "G.M.": GetPattern(x) = "A-A-"; x = "Machine-223": GetPattern(x) = "Aaaaaaa-000"
  The summarized pattern feature is a condensed form of the above in which consecutive character types are not repeated in the mapped string. For instance, the preceding examples become: x = "G.M.": GetSummarizedPattern(x) = "A-A-"; x = "Machine-223": GetSummarizedPattern(x) = "Aa-0"
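The mapping described in the quote can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used by Collins, the survey, or any library. The function names `char_type`, `get_pattern`, and `get_summarized_pattern` are hypothetical. The sketch also assumes that any character other than an uppercase letter, lowercase letter, or digit counts as punctuation and maps to "-".

```python
from itertools import groupby


def char_type(ch: str) -> str:
    """Map one character to its character-type symbol."""
    if ch.isupper():
        return "A"  # uppercase letter
    if ch.islower():
        return "a"  # lowercase letter
    if ch.isdigit():
        return "0"  # digit
    # Assumption: every other character (punctuation, symbols, whitespace,
    # caseless letters) is treated as punctuation and mapped to "-".
    return "-"


def get_pattern(token: str) -> str:
    """Hypothetical GetPattern: replace each character with its type symbol."""
    return "".join(char_type(ch) for ch in token)


def get_summarized_pattern(token: str) -> str:
    """Hypothetical GetSummarizedPattern: collapse runs of the same type symbol."""
    return "".join(symbol for symbol, _run in groupby(get_pattern(token)))


if __name__ == "__main__":
    for token in ["G.M.", "Machine-223"]:
        print(token, get_pattern(token), get_summarized_pattern(token))
    # G.M. A-A- A-A-
    # Machine-223 Aaaaaaa-000 Aa-0
```

`itertools.groupby` groups consecutive equal symbols, so keeping one symbol per group turns "Aaaaaaa-000" into "Aa-0". "A-A-" is left unchanged because no two adjacent symbols are equal. Because the sketch relies on Python's `str.isupper`, `str.islower`, and `str.isdigit`, letters without case (e.g., CJK characters) fall through to "-". A real feature extractor might prefer a separate symbol for them.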