2007 AnalysisofMorphBasedSpeechRecog


Subject Headings: Subword-Level Language Model; OOV Word; In-Vocabulary Word; OOV Rate; OOV Detection; Morfessor; MAP Optimization Criterion; Grapheme-To-Phoneme Mapping; Morph Language Model; Word Language Model.

Notes

Cited By

Quotes

Abstract

We analyze subword-based language models (LMs) in large-vocabulary continuous speech recognition across four "morphologically rich" languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. By estimating n-gram LMs over sequences of morphs instead of words, better vocabulary coverage and reduced data sparsity are obtained. Standard word LMs suffer from high out-of-vocabulary (OOV) rates, whereas the morph LMs can recognize previously unseen word forms by concatenating morphs. We show that the morph LMs generally outperform the word LMs and that they perform fairly well on OOVs without compromising the accuracy obtained for in-vocabulary words.
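
The coverage argument in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the word list, the hand-picked morph inventory (not the output of Morfessor), and the helper names `oov_rate`, `segment`, and `uncovered_rate`. It only compares vocabulary coverage; it does not estimate an n-gram LM or run a recognizer.

```python
# Toy comparison of word-level vs. morph-level vocabulary coverage.
# All data and function names are hypothetical, chosen for illustration.

def oov_rate(words, vocabulary):
    """Fraction of words that are not in the word vocabulary."""
    if not words:
        return 0.0
    return sum(1 for w in words if w not in vocabulary) / len(words)

def segment(word, morph_vocab):
    """Return one split of `word` into known morphs, or None if none exists."""
    if word == "":
        return []
    # Try longer prefixes first, backtracking if the remainder cannot be split.
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in morph_vocab:
            rest = segment(word[end:], morph_vocab)
            if rest is not None:
                return [prefix] + rest
    return None

def uncovered_rate(words, morph_vocab):
    """Fraction of words that cannot be built by concatenating known morphs."""
    if not words:
        return 0.0
    return sum(1 for w in words if segment(w, morph_vocab) is None) / len(words)

# Assumed training vocabulary (whole word forms) and a hand-picked morph set.
WORD_VOCAB = {"talo", "talossa", "auto"}
MORPH_VOCAB = {"talo", "auto", "ssa"}

# "autossa" was never seen as a whole word, but both of its morphs were.
TEST_WORDS = ["talossa", "autossa", "auto"]

word_oov = oov_rate(TEST_WORDS, WORD_VOCAB)
morph_oov = uncovered_rate(TEST_WORDS, MORPH_VOCAB)
unseen_split = segment("autossa", MORPH_VOCAB)
```

In this toy setting the unseen form "autossa" is OOV for the word vocabulary but is still reachable by concatenating the morphs "auto" and "ssa", which is the effect the abstract attributes to morph LMs.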

References

BibTeX

@inproceedings{2007_AnalysisofMorphBasedSpeechRecog,
  author    = {Mathias Creutz and
               Teemu Hirsimaki and
               Mikko Kurimo and
               Antti Puurula and
               Janne Pylkkonen and
               Vesa Siivola and
               Matti Varjokallio and
               Ebru Arisoy and
               Murat Saraclar and
               Andreas Stolcke},
  editor    = {Candace L. Sidner and
               Tanja Schultz and
               Matthew Stone and
               ChengXiang Zhai},
  title     = {Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary
               Words Across Languages},
  booktitle = {Proceedings of the Human Language Technology Conference of the North American Chapter
               of the Association for Computational Linguistics (NAACL-HLT 2007)},
  pages     = {380--387},
  publisher = {The Association for Computational Linguistics},
  year      = {2007},
  url       = {https://www.aclweb.org/anthology/N07-1048/},
}


| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2007 AnalysisofMorphBasedSpeechRecog | Andreas Stolcke, Mathias Creutz, Teemu Hirsimaki, Mikko Kurimo, Antti Puurula, Janne Pylkkonen, Vesa Siivola, Matti Varjokallio, Ebru Arisoy, Murat Saraclar | | | Analysis of Morph-Based Speech Recognition and the Modeling of Out-of-Vocabulary Words Across Languages | | | | | | 2007 |