2007 AnalysisofMorphBasedSpeechRecog
- (Creutz et al., 2007) ⇒ Mathias Creutz, Teemu Hirsimaki, Mikko Kurimo, Antti Puurula, Janne Pylkkonen, Vesa Siivola, Matti Varjokallio, Ebru Arisoy, Murat Saraclar, and Andreas Stolcke. (2007). “Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages.” In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (ACL 2007).
Subject Headings: Subword-Level Language Model; OOV Word; In-Vocabulary Word; OOV Rate; OOV Detection; Morfessor; MAP Optimization Criterion; Grapheme-To-Phoneme Mapping; Morph Language Model; Word Language Model.
Notes
Cited By
- Google Scholar: ~ 36 Citations.
Quotes
Abstract
We analyze subword-based language models (LMs) in large-vocabulary continuous speech recognition across four "morphologically rich" languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. By estimating n-gram LMs over sequences of morphs instead of words, better vocabulary coverage and reduced data sparsity is obtained. Standard word LMs suffer from high out-of-vocabulary (OOV) rates, whereas the morph LMs can recognize previously unseen word forms by concatenating morphs. We show that the morph LMs generally outperform the word LMs and that they perform fairly well on OOVs without compromising the accuracy obtained for in-vocabulary words.
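The coverage argument in the abstract can be illustrated with a minimal sketch. It assumes a toy word vocabulary and a hand-picked morph inventory (neither comes from the paper, and the morphs are not Morfessor output); the helper names `oov_rate` and `can_compose` are hypothetical. It shows only the vocabulary-coverage effect, not n-gram estimation or recognition.

```python
def oov_rate(test_tokens, vocabulary):
    """Return the fraction of test tokens that are missing from the vocabulary."""
    if not test_tokens:
        return 0.0
    unseen = sum(1 for token in test_tokens if token not in vocabulary)
    return unseen / len(test_tokens)


def can_compose(word, morphs):
    """Return True if `word` is a concatenation of one or more units in `morphs`."""
    # reachable[i] is True when word[:i] can be built from morphs.
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        for start in range(end):
            if reachable[start] and word[start:end] in morphs:
                reachable[end] = True
                break
    return reachable[len(word)]


# Toy data (assumption, for illustration only): a few Finnish word forms.
word_vocab = {"talo", "talossa", "auto"}
morphs = {"talo", "auto", "ssa", "n"}
test_words = ["talo", "autossa", "talon", "auto"]

# Word-level view: "autossa" and "talon" were never seen as whole words.
word_oov = oov_rate(test_words, word_vocab)

# Morph-level view: a word is uncovered only if no morph sequence spells it.
morph_uncovered = sum(
    1 for w in test_words if not can_compose(w, morphs)
) / len(test_words)

print(word_oov)         # 0.5
print(morph_uncovered)  # 0.0
```

In this toy setting half of the test words are OOV for the word vocabulary, while every test word can be spelled as a morph sequence, which is the sense in which a morph LM can reach previously unseen word forms.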
References
BibTeX
@inproceedings{2007_AnalysisofMorphBasedSpeechRecog,
  author    = {Mathias Creutz and Teemu Hirsimaki and Mikko Kurimo and Antti Puurula and Janne Pylkkonen and Vesa Siivola and Matti Varjokallio and Ebru Arisoy and Murat Saraclar and Andreas Stolcke},
  editor    = {Candace L. Sidner and Tanja Schultz and Matthew Stone and ChengXiang Zhai},
  title     = {Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages},
  booktitle = {Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (ACL 2007)},
  pages     = {380--387},
  publisher = {The Association for Computational Linguistics},
  year      = {2007},
  url       = {https://www.aclweb.org/anthology/N07-1048/},
}
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2007 AnalysisofMorphBasedSpeechRecog | Andreas Stolcke, Mathias Creutz, Teemu Hirsimaki, Mikko Kurimo, Antti Puurula, Janne Pylkkonen, Vesa Siivola, Matti Varjokallio, Ebru Arisoy, Murat Saraclar | | | Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages | | | | | | 2007 |