Text Normalization Task
A Text Normalization Task is a text transformation task that converts input text into a single canonical form.
- Example(s):
- ITN(“October twenty third twenty sixteen”) ⇒ “October 23, 2016”.
- TN(“October 23, 2016”) ⇒ “October twenty third twenty sixteen”.
- See: Inverse Text Normalization, Separation of Concerns, Writing, Canonical Form.
References
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/text_normalization Retrieved:2017-8-29.
- Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure.
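The idea of normalizing before storing or comparing can be illustrated with a minimal Python sketch. The function name `normalize_text` and the particular policy (Unicode NFKC, case folding, whitespace collapsing) are assumptions chosen for illustration; as the passage notes, there is no all-purpose procedure, and a real system would pick steps suited to its own text type and downstream processing.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Map superficially different strings to one canonical form.

    Hypothetical policy for illustration only, not a general-purpose normalizer.
    """
    # NFKC folds compatibility characters (e.g. full-width letters) into plain ones.
    text = unicodedata.normalize("NFKC", text)
    # casefold() is a more aggressive lower(), intended for caseless matching.
    text = text.casefold()
    # Collapse every run of whitespace to a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


# Three surface variants of the same phrase (the last uses full-width letters and a tab).
variants = ["Text  Normalization", "text normalization ", "ＴＥＸＴ\tnormalization"]
canonical = {normalize_text(v) for v in variants}
```

Because every variant collapses to the same string, later operations such as lookup or deduplication can assume consistent input, which is the separation of concerns described above.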
2017b
- http://machinelearning.apple.com/2017/08/02/inverse-text-normal.html
- QUOTE: Siri displays entities like dates, times, addresses and currency amounts in a nicely formatted way. This is the result of the application of a process called inverse text normalization (ITN) to the output of a core speech recognition component. To understand the important role ITN plays, consider that, without it, Siri would display “October twenty third twenty sixteen” instead of “October 23, 2016”. In this work, we show that ITN can be formulated as a labelling problem, allowing for the application of a statistical model that is relatively simple, compact, fast to train, and fast to apply. We demonstrate that this approach represents a practical path to a data-driven ITN system.
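To make the labelling formulation concrete, the following Python sketch applies one label per spoken token to produce the written form. The `Label` fields (`rewrite`, `space_before`) and the `apply_labels` function are hypothetical and are not the label set of the quoted work; the labels here are assigned by hand, whereas in a data-driven system they would be predicted by a trained statistical model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical per-token edit label (illustrative scheme only)."""
    rewrite: Optional[str] = None  # replacement text; None keeps the spoken token
    space_before: bool = True      # False glues the output to the previous piece


def apply_labels(tokens: List[str], labels: List[Label]) -> str:
    """Turn spoken-form tokens into written form by applying one label per token."""
    if len(tokens) != len(labels):
        raise ValueError("need exactly one label per token")
    out: List[str] = []
    for token, label in zip(tokens, labels):
        piece = token if label.rewrite is None else label.rewrite
        # Insert a separator unless this is the first piece or the label glues it.
        if out and label.space_before:
            out.append(" ")
        out.append(piece)
    return "".join(out)


tokens = "October twenty third twenty sixteen".split()
labels = [
    Label(),                                   # October -> October
    Label(rewrite="2"),                        # twenty  -> 2
    Label(rewrite="3,", space_before=False),   # third   -> 3,  (glued to "2")
    Label(rewrite="20"),                       # twenty  -> 20
    Label(rewrite="16", space_before=False),   # sixteen -> 16  (glued to "20")
]
written = apply_labels(tokens, labels)  # "October 23, 2016"
```

Under this framing the formatting logic stays trivial, and the difficulty moves into predicting the right label for each token, which is where a sequence labelling model would be applied.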