Parada-HLTCOE MaxEnt OOV Detection System

A Parada-HLTCOE MaxEnt OOV Detection System is a Maximum Entropy OOV Word Detection System (developed by Parada et al., 2010) that can detect out-of-vocabulary (OOV) words in the output of a large-vocabulary continuous speech recognition (LVCSR) system.

Context:
- It implements a Parada-HLTCOE MaxEnt OOV Detection Algorithm that can solve a Parada-HLTCOE MaxEnt OOV Detection Task.
- It is based on a Filler Model and Confusion Network Based OOV Detection System developed by Rastrow et al. (2009).
- It uses a Grapheme-to-Phoneme Conversion System (Chen, 2003) to obtain pronunciations for OOV words.
- It uses a Path-Based Graph Indexing Vocabulary-Independent Audio Search System (Siohan and Bacchiani, 2005) to select phone sequences.
Example(s):
- …
Counter-Example(s):
See: Word Embedding System, Text Generation System, Text Translation System, Natural Language Processing System, Confusion Network.

References

2010

(Parada et al., 2010) ⇒ Carolina Parada, Mark Dredze, Denis Filimonov, and Frederick Jelinek. (2010). “Contextual Information Improves OOV Detection in Speech.” In: Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2010).
- QUOTE: Our baseline system is the Maximum Entropy model with features from filler and confidence estimation models proposed by Rastrow et al. (2009a). Based on filler models, this approach models OOVs by constructing a hybrid system which combines words and sub-word units. Sub-word units, or fragments, are variable length phone sequences selected using statistical methods (Siohan and Bacchiani, 2005). The vocabulary contains a word and a fragment lexicon; fragments are used to represent OOVs in the language model text. Language model training text is obtained by replacing low frequency words (assumed OOVs) by their fragment representation. Pronunciations for OOVs are obtained using grapheme to phoneme models (Chen, 2003) (...).
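
The text-preparation step described in the quote above (replacing low-frequency words with their fragment representation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `to_hybrid_text`, the `min_count` threshold, and the `fragments` lookup (which in a real system would come from a grapheme-to-phoneme model plus fragment selection) are all assumptions made for illustration.

```python
from collections import Counter


def to_hybrid_text(sentences, fragments, min_count=2):
    """Replace low-frequency words (assumed OOVs) by fragment tokens.

    sentences: list of token lists (language model training text).
    fragments: hypothetical dict mapping a word to its fragment sequence.
    min_count: assumed frequency threshold; words seen fewer times are
               treated as OOVs when a fragment sequence is available.
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    hybrid = []
    for sentence in sentences:
        out = []
        for word in sentence:
            if counts[word] < min_count and word in fragments:
                out.extend(fragments[word])  # word -> fragment sequence
            else:
                out.append(word)  # frequent words stay as words
        hybrid.append(out)
    return hybrid


# Toy data; the fragment spellings are invented for the example.
sentences = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "zebra", "sat"],
]
fragments = {"zebra": ["z_iy", "b_r_ah"], "ran": ["r_ae_n"]}
hybrid_text = to_hybrid_text(sentences, fragments)
```

The resulting mixed word/fragment text is what a hybrid language model would be trained on under these assumptions.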

**Figure 1:** Example confusion network from the hybrid system with OOV regions and BIO encoding. Hypothesis are ordered by decreasing value of posterior probability. Best hypothesis is the concatenation of the top word/fragments in each bin. We omit posterior probabilities due to spacing.
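
As a rough illustration of the caption above, the sketch below reads the best hypothesis off a confusion network and encodes OOV regions with BIO tags. The data layout (a list of bins, each a list of `(token, posterior)` pairs), the function names, and the toy tokens are assumptions; in particular, the OOV flags are supplied by hand here, whereas in the system they would come from reference alignment (for training) or from the classifier (for prediction).

```python
def best_hypothesis(network):
    """Concatenate the top-posterior token of each bin."""
    return [max(bin_, key=lambda entry: entry[1])[0] for bin_ in network]


def bio_encode(oov_flags):
    """Map per-bin OOV flags to BIO tags.

    B = first bin of an OOV region, I = later bin of the same region,
    O = bin outside any OOV region.
    """
    tags = []
    previous_was_oov = False
    for flag in oov_flags:
        if not flag:
            tags.append("O")
        elif previous_was_oov:
            tags.append("I")
        else:
            tags.append("B")
        previous_was_oov = flag
    return tags


# Toy confusion network; fragment tokens use an invented "_" spelling.
network = [
    [("the", 0.9), ("a", 0.1)],
    [("s_ih", 0.6), ("sit", 0.4)],
    [("k_ow", 0.7), ("coat", 0.3)],
    [("is", 0.8), ("as", 0.2)],
]
hypothesis = best_hypothesis(network)
tags = bio_encode([False, True, True, False])
```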

2009

(Rastrow et al., 2009) ⇒ Ariya Rastrow, Abhinav Sethy, and Bhuvana Ramabhadran. (2009). “A New Method for OOV Detection Using Hybrid Word/Fragment System.” In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009).
- QUOTE: Many approaches have been proposed for OOV detection. They can be categorized into two broad groups:
  - 1. Filler Models: The first type of methods focuses on explicitly modeling OOVs using either filler or generic word models. (...)
  - 2. Confidence Scores: More recent approaches are focused on detecting OOVs based on some confidence measures such as acoustic scores, statistics derived from the language model and statistics derived from N-best lists (or lattices)...

(...)

We have presented a method for OOV detection using sub-word posterior probabilities and demonstrated how it outperforms other commonly used features in the literature. We have also proposed a new method for modeling confusions in the ASR output (in this case confusions from IV terms to fragments) and their subsequent use to significantly improve the performance of the proposed OOV detector.
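
To make the idea of sub-word posterior features concrete, here is a small Python sketch that computes two per-bin quantities from a confusion network bin: the total posterior mass on fragments and the entropy over the word entries. The exact feature definitions, the function names, and the convention that fragment tokens contain an underscore are assumptions for illustration; they are not quoted from the paper.

```python
import math


def is_fragment(token):
    # Hypothetical convention: fragment tokens join phones with "_".
    return "_" in token


def fragment_posterior(bin_):
    """Sum of the posteriors of all fragment entries in one bin."""
    return sum(p for token, p in bin_ if is_fragment(token))


def word_entropy(bin_):
    """Entropy contribution of the word (non-fragment) entries in one bin."""
    return -sum(
        p * math.log(p)
        for token, p in bin_
        if not is_fragment(token) and p > 0
    )


# One toy bin: two fragment entries and one word entry.
bin_ = [("s_ih", 0.6), ("sit", 0.3), ("k_ow", 0.1)]
features = {
    "fragment_posterior": fragment_posterior(bin_),
    "word_entropy": word_entropy(bin_),
}
```

A high fragment posterior suggests that the recognizer preferred sub-word units in that region, which is the kind of evidence a downstream classifier could use to flag an OOV.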

2005

(Siohan & Bacchiani, 2005) ⇒ Olivier Siohan and Michiel Bacchiani. (2005). “Fast Vocabulary-Independent Audio Search Using Path-Based Graph Indexing.” In: Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech 2005 - Eurospeech).
- QUOTE: Instead, we propose a fast vocabulary independent audio search approach that operates on phonetic lattices and is suitable for any query. However, indexing phonetic lattices so that any arbitrary phone sequence query can be processed efficiently is a challenge, as the choice of the indexing unit is unclear. We propose an inverted index structure on lattices that uses paths as indexing features. The approach is inspired by a general graph indexing method that defines an automatic procedure to select a small number of paths as indexing features, keeping the index size small while allowing fast retrieval of the lattices matching a given query.
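
The inverted-index idea in the quote above can be sketched in a few lines of Python. This is a simplification under stated assumptions, not the published method: it indexes every phone path of a fixed length `n` rather than automatically selecting a small set of paths, it represents a lattice as a plain adjacency dict, and all names (`paths_of_length`, `build_index`, `candidates`) are hypothetical. The lookup returns candidate lattices only; a real system would still verify that the full query occurs as one path.

```python
from collections import defaultdict


def paths_of_length(lattice, n):
    """Yield every phone sequence made of exactly n consecutive edges.

    lattice: dict mapping node -> list of (phone, next_node) edges.
    """
    def walk(node, prefix):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for phone, next_node in lattice.get(node, []):
            yield from walk(next_node, prefix + [phone])

    for start in lattice:
        yield from walk(start, [])


def build_index(lattices, n=2):
    """Inverted index: phone path of length n -> set of lattice ids."""
    index = defaultdict(set)
    for lattice_id, lattice in lattices.items():
        for path in paths_of_length(lattice, n):
            index[path].add(lattice_id)
    return index


def candidates(index, query, n=2):
    """Lattices containing every length-n sub-path of the phone query."""
    grams = [tuple(query[i:i + n]) for i in range(len(query) - n + 1)]
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Two toy phonetic lattices; utt1 has alternative phones between nodes 1 and 2.
lattices = {
    "utt1": {0: [("k", 1)], 1: [("ae", 2), ("ah", 2)], 2: [("t", 3)]},
    "utt2": {0: [("d", 1)], 1: [("ao", 2)], 2: [("g", 3)]},
}
index = build_index(lattices, n=2)
hits = candidates(index, ["k", "ae", "t"])
```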

2003

(Chen, 2003) ⇒ Stanley F. Chen. (2003). “Conditional and Joint Models for Grapheme-to-Phoneme Conversion.” In: Proceedings of the 8th European Conference on Speech Communication and Technology (Eurospeech 2003 - Interspeech 2003).
- QUOTE: In this work, we introduce several models for grapheme-to-phoneme conversion: a conditional maximum entropy model, a joint maximum entropy n-gram model, and a joint maximum entropy n-gram model with syllabification. We examine the relative merits of conditional and joint models for this task, and find that joint models have many advantages. We show that the performance of our best model, the joint n-gram model, compares favorably with the best results for English grapheme-to-phoneme conversion reported in the literature, sometimes by a wide margin.
