Automatic Speech Recognition (ASR) System
An Automatic Speech Recognition (ASR) System is a Speech Recognition System that implements an ASR algorithm to solve an ASR task.
- Example(s):
- Counter-Example(s):
- See: Audio Mining, Natural Language Processing System, Language Translation System, Word Error Rate.
References
2023
- Ian Beaver. (2023). “Is AI at human parity yet? A case study on speech recognition.” In: AAAI Magazine, DOI:10.1002/aaai.12071.
- ABSTRACT: Claims have been made that speech recognition has achieved human parity, yet this does not appear to be the case in the real-world applications that rely on it, especially for non-native speakers. This then begs the questions: What does it even mean for an AI system to reach human parity? How is progress towards that goal being measured? This article focuses on the current state of speech recognition and the recent developments in benchmarking and measuring performance of AI models built for speech processing. Through the shift away from single metric benchmarks and specialized models and towards evaluating collections of diverse challenging tasks and generalized models, the ultimate goal of true human parity in commercial speech processing applications is hopefully on the near horizon.
2020
- (Szymanski et al., 2020) ⇒ Piotr Szymanski, Piotr Zelasko, Mikolaj Morzy, Adrian Szymczak, Marzena Zyla-Hoppe, Joanna Banaszczak, Lukasz Augustyniak, Jan Mizgajski, and Yishay Carmiel. (2020). “WER We Are and WER We Think We Are.” In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings (EMNLP 2020) Online Event.
- ABSTRACT: Natural language processing of conversational speech requires the availability of high-quality transcripts. In this paper, we express our skepticism towards the recent reports of very low Word Error Rates (WERs) achieved by modern Automatic Speech Recognition (ASR) systems on benchmark datasets. We outline several problems with popular benchmarks and compare three state-of-the-art commercial ASR systems on an internal dataset of real-life spontaneous human conversations and HUB'05 public benchmark. We show that WERs are significantly higher than the best reported results. We formulate a set of guidelines which may aid in the creation of real-life, multi-domain datasets with high quality annotations for training and testing of robust ASR systems.
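The Word Error Rate referred to above is conventionally defined as the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words: WER = (S + D + I) / N. The following is a minimal sketch of that computation, not the scoring procedure used in the cited papers. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; real evaluations typically also normalize case, punctuation, and numerals before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level Levenshtein distance / reference length.

    Assumes plain whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (excluding the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.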