Automatic Speech Recognition (ASR) System

Example(s):
- a Connectionist Temporal Classification (CTC)-based System,
- a Large Vocabulary Continuous Speech Recognition (LVCSR) System,
- a Latent Sequence Decompositions (LSD) System,
- a Phonetic-based Indexing System,
- an OpenAI Whisper Model-based System,
- …
Counter-Example(s):
- a Text Processing System,
- a Text Error Detection System,
- a Word Segmentation System.
See: Audio Mining, Natural Language Processing System, Language Translation System, Word Error Rate.
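Word Error Rate, listed above, is the measure most often used to score an ASR system's transcript against a reference transcript. The following is a minimal sketch of that calculation, assuming whitespace tokenization and no text normalization (casing and punctuation are compared as-is); the function name word_error_rate is illustrative and does not come from any particular library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace and compared exactly,
    with no lowercasing or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the")
    # against six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.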

References

Ian Beaver. (2023) "Is AI at human parity yet? A case study on speech recognition." In: AAAI Magazine, DOI:10.1002/aaai.12071
- ABSTRACT: Claims have been made that speech recognition has achieved human parity, yet this does not appear to be the case in the real-world applications that rely on it, especially for non-native speakers. This then begs the questions: What does it even mean for an AI system to reach human parity? How is progress towards that goal being measured? This article focuses on the current state of speech recognition and the recent developments in benchmarking and measuring performance of AI models built for speech processing. Through the shift away from single metric benchmarks and specialized models and towards evaluating collections of diverse challenging tasks and generalized models, the ultimate goal of true human parity in commercial speech processing applications is hopefully on the near horizon.