2011 LearningSubWordUnitsforOpenVoca


Subject Headings: Out-Of-Vocabulary (OOV) Word Detection Task; Open Vocabulary Speech Recognition; LVCSR System; Subword Unit; OOV Word.

Notes

Cited By

Quotes

Abstract

Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of sub-word units. Previous work heuristically created the sub-word lexicon from phonetic representations of text using simple statistics to select common phone sequences. We propose a probabilistic model to learn the subword lexicon optimized for a given task. We consider the task of out of vocabulary (OOV) word detection, which relies on output from a hybrid model. A hybrid model with our learned sub-word lexicon reduces error by 6.3% and 7.6% (absolute) at a 5% false alarm rate on an English Broadcast News and MIT Lectures task respectively.
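The statement that new words "can then be represented by combinations of sub-word units" can be illustrated with a minimal sketch. It assumes a small, hand-written sub-word lexicon (the units in `SUBWORD_LEXICON` and the phone sequence are hypothetical, not taken from the paper) and uses a plain dynamic program to cover a phone sequence with the fewest units. This is not the probabilistic model the paper proposes for learning the lexicon; it only shows how a fixed lexicon could spell out an out-of-vocabulary word.

```python
from typing import List, Optional, Set, Tuple

Unit = Tuple[str, ...]

# Hypothetical toy sub-word lexicon: each unit is a tuple of phones.
# Illustrative only; these units are not taken from the paper.
SUBWORD_LEXICON: Set[Unit] = {
    ("p", "aa"), ("r", "aa"), ("d", "ah"),
    ("p",), ("aa",), ("r",), ("d",), ("ah",),
}


def segment(phones: List[str], lexicon: Set[Unit]) -> Optional[List[Unit]]:
    """Cover a phone sequence with the fewest lexicon units.

    Returns None when no combination of units spells the sequence.
    """
    n = len(phones)
    max_len = max(len(unit) for unit in lexicon)
    # best[i] holds the shortest unit sequence covering phones[:i], if any.
    best: List[Optional[List[Unit]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            unit = tuple(phones[start:end])
            if prefix is None or unit not in lexicon:
                continue
            candidate = prefix + [unit]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


# Hypothetical phone sequence for a word missing from the word vocabulary.
oov_phones = ["p", "aa", "r", "aa", "d", "ah"]
units = segment(oov_phones, SUBWORD_LEXICON)
print(units)
```

In a hybrid system, a run of such sub-word units in the recognizer output is the kind of evidence an OOV detector can use to flag a region as a likely out-of-vocabulary word.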

References

BibTeX

@inproceedings{2011_LearningSubWordUnitsforOpenVoca,
  author    = {Carolina Parada and
               Mark Dredze and
               Abhinav Sethy and
               Ariya Rastrow},
  editor    = {Dekang Lin and
               Yuji Matsumoto and
               Rada Mihalcea},
  title     = {Learning Sub-Word Units for Open Vocabulary Speech Recognition},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics:
               Human Language Technologies},
  pages     = {712--721},
  publisher = {The Association for Computational Linguistics},
  year      = {2011},
  url       = {https://www.aclweb.org/anthology/P11-1072/},
}


Author: Carolina Parada, Mark Dredze, Abhinav Sethy, Ariya Rastrow
Title: Learning Sub-Word Units for Open Vocabulary Speech Recognition
Year: 2011