2011 HybridLanguageModelsUsingMixedT
- (Ali Basha Shaik et al., 2011) ⇒ M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schluter, and Hermann Ney. (2011). “Hybrid Language Models Using Mixed Types of Sub-Lexical Units for Open Vocabulary German LVCSR.” In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011).
Subject Headings: OOV Word; OOV Word Detection System; Open Vocabulary; Word Error Rate (WER); OOV Rate; Large Vocabulary Continuous Speech Recognition (LVCSR) System; Neural Network Language Model; Morpheme-based Subword Unit; Syllable-based Subword Unit; Graphone-based Sub-Word Unit.
Notes
Cited By
- Google Scholar: ~ 49 Citations.
Quotes
Author Keywords
Abstract
German is a highly inflected language with a large number of words derived from the same root. It makes use of a high degree of word compounding leading to high Out-of-vocabulary (OOV) rates, and Language Model (LM) perplexities. For such languages the use of sub-lexical units for Large Vocabulary Continuous Speech Recognition (LVCSR) becomes a natural choice. In this paper, we investigate the use of mixed types of sub-lexical units in the same recognition lexicon. Namely, morphemic or syllabic units combined with pronunciations called graphones, normal graphemic morphemes or syllables along with full-words. This mixture of units is used for building hybrid LMs suitable for open vocabulary LVCSR where the system operates over an open, constantly changing vocabulary like in broadcast news, political debates, etc. A relative reduction of around 5.0% in Word Error Rate (WER) is obtained compared to a traditional full-words system. Moreover, around 40% of the OOVs are recognized.
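The two quantities the abstract reports, OOV rate and relative WER reduction, can be illustrated with a minimal sketch. The function names, the toy vocabulary and sentence, and the WER values below are hypothetical and are not taken from the paper; they only show how such figures are commonly computed.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of running tokens that are not in the recognition vocabulary."""
    if not tokens:
        return 0.0
    oov_count = sum(1 for token in tokens if token not in vocabulary)
    return oov_count / len(tokens)


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction of a new system with respect to a baseline."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical toy data: a small full-word vocabulary and one test sentence.
# German compounds such as "bundesregierung" are typical OOV candidates.
vocabulary = {"die", "regierung", "hat", "heute"}
tokens = ["die", "bundesregierung", "hat", "heute", "getagt"]

toy_oov_rate = oov_rate(tokens, vocabulary)      # 2 of 5 tokens are OOV
toy_reduction = relative_reduction(20.0, 19.0)   # illustrative WERs, not the paper's
```

With these toy inputs, two of the five tokens fall outside the vocabulary, and a drop from a WER of 20.0% to 19.0% corresponds to a 5% relative reduction.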
References
BibTeX
@inproceedings{2011_HybridLanguageModelsUsingMixedT,
  author    = {M. Ali Basha Shaik and Amr El-Desoky Mousa and Ralf Schluter and Hermann Ney},
  title     = {Hybrid Language Models Using Mixed Types of Sub-Lexical Units for Open Vocabulary German {LVCSR}},
  booktitle = {Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011)},
  pages     = {1441--1444},
  publisher = {ISCA},
  year      = {2011},
  url       = {http://www.isca-speech.org/archive/interspeech\_2011/i11\_1441.html},
}
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2011 HybridLanguageModelsUsingMixedT | M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schluter, Hermann Ney | | | Hybrid Language Models Using Mixed Types of Sub-Lexical Units for Open Vocabulary German LVCSR | | | | | | 2011 |