Out-Of-Vocabulary Word Error Rate (OOV-WER) Measure
An Out-Of-Vocabulary Word Error Rate (OOV-WER) Measure is a Word Error Rate Measure that measures the performance of an OOV Detection System.
- Example(s):
- …
- Counter-Example(s):
- See: OOV Detection System, Recall, Speech Recognition, Machine Translation, Levenshtein Distance, Phoneme.
References
2006
- (Schultz & Kirchhoff, 2006) ⇒ Tanja Schultz, and Katrin Kirchhoff (Eds.). (2006). "Multilingual Speech Processing". Elsevier.
- QUOTE: This is due to their morphologic structure and leads to rapid vocabulary growth and a high rate of Out-of-Vocabulary (OOV) words. OOV words are those that a speech processing system is expected to handle but have never been observed in training data (e.g., unseen inflections of words that are important for the application). Words that do not occur in training data are usually not included in the search vocabulary of the speech recognition system. Thus, they cannot be recognized by the system. Consequently, each OOV word leads to at least one recognition error. As a word error can reduce the effective reach of the language model, we usually observe 1.2-1.5 word errors per OOV word.
Table 4.2 gives the size of vocabulary and resulting OOV rates for 10 GlobalPhone languages. The OOV rates differ significantly between these languages (...)
Language | Vocabulary | OOV Rate |
---|---|---|
Korean | 64K | 34.0% |
Turkish | 64K | 13.5% |
German | 61K | 4.4% |
Portuguese | 60K | 4.3% |
English | 54K | 0.3% |
Korean (segmented) | 64K | 0.2% |
Chinese (segmented) | 60K | 0% |
Croatian | 31K | 13.6% |
Spanish | 30K | 5.2% |
French | 30K | 4.7% |
Japanese (segmented) | 22K | 3.0% |
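
The quantities in the quotation and table above can be illustrated with a minimal Python sketch. It assumes the OOV rate is computed over running test tokens (the quoted source does not say whether tokens or types are counted), and the function names `build_vocabulary`, `oov_rate`, and `expected_oov_errors` are hypothetical, chosen only for this illustration. The last function applies the quoted rule of thumb of 1.2-1.5 word errors per OOV word.

```python
from collections import Counter


def build_vocabulary(train_tokens, max_size=None):
    """Return the set of the most frequent training words.

    max_size=None keeps every word seen in training (assumption:
    the search vocabulary is chosen by training frequency).
    """
    counts = Counter(train_tokens)
    return {word for word, _ in counts.most_common(max_size)}


def oov_rate(test_tokens, vocabulary):
    """Fraction of running test tokens that are not in the vocabulary."""
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    num_oov = sum(1 for token in test_tokens if token not in vocabulary)
    return num_oov / len(test_tokens)


def expected_oov_errors(num_oov_tokens, errors_per_oov=(1.2, 1.5)):
    """Rough range of word errors attributable to OOV tokens.

    The default factors are the 1.2-1.5 errors per OOV word quoted
    from Schultz & Kirchhoff (2006); they are a rule of thumb.
    """
    low, high = errors_per_oov
    return num_oov_tokens * low, num_oov_tokens * high


if __name__ == "__main__":
    # Toy data, invented for this example only.
    train = "the cat sat on the mat".split()
    test = "the dog sat on the rug".split()

    vocab = build_vocabulary(train)
    rate = oov_rate(test, vocab)
    low, high = expected_oov_errors(rate * len(test))
    print(f"OOV rate: {rate:.1%}")
    print(f"Expected OOV-related word errors: {low:.1f} to {high:.1f}")
```

Under the same rule of thumb, an OOV rate of 13.5% (the Turkish row) would by itself account for roughly 16 to 20 percentage points of word error rate, which is one reason the segmented variants in the table are of interest.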