2020 WERWeAreandWERWeThinkWeAre
- (Szymanski et al., 2020) ⇒ Piotr Szymanski, Piotr Zelasko, Mikolaj Morzy, Adrian Szymczak, Marzena Zyla-Hoppe, Joanna Banaszczak, Lukasz Augustyniak, Jan Mizgajski, and Yishay Carmiel. (2020). “WER We Are and WER We Think We Are.” In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings (EMNLP 2020), Online Event.
Subject Headings: Word Error Rate; Automatic Speech Recognition (ASR) System.
Notes
Cited By
Quotes
Abstract
Natural language processing of conversational speech requires the availability of high-quality transcripts. In this paper, we express our skepticism towards the recent reports of very low Word Error Rates (WERs) achieved by modern Automatic Speech Recognition (ASR) systems on benchmark datasets. We outline several problems with popular benchmarks and compare three state-of-the-art commercial ASR systems on an internal dataset of real-life spontaneous human conversations and HUB'05 public benchmark. We show that WERs are significantly higher than the best reported results. We formulate a set of guidelines which may aid in the creation of real-life, multi-domain datasets with high quality annotations for training and testing of robust ASR systems.
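As background for the metric the paper scrutinizes, below is a minimal sketch of how WER is conventionally computed: word-level Levenshtein alignment between reference and hypothesis, with substitutions, deletions, and insertions normalized by reference length. The function name and example strings are illustrative only and are not taken from the paper; production ASR scoring pipelines typically add text normalization steps that this sketch omits.

```python
# Minimal WER sketch: word-level edit distance between a reference transcript
# and an ASR hypothesis, divided by the number of reference words.
# Illustrative only; does not reproduce any particular toolkit's scoring rules.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("wer we are", "where we are"))  # 1 substitution / 3 words ≈ 0.33
```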
References
BibTeX
@inproceedings{2020_WERWeAreandWERWeThinkWeAre,
  author    = {Piotr Szymanski and Piotr Zelasko and Mikolaj Morzy and Adrian Szymczak and Marzena Zyla-Hoppe and Joanna Banaszczak and Lukasz Augustyniak and Jan Mizgajski and Yishay Carmiel},
  editor    = {Trevor Cohn and Yulan He and [[Yang Liu]]},
  title     = {WER we are and WER we think we are},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings (EMNLP 2020) Online Event},
  series    = {Findings of ACL},
  volume    = {EMNLP 2020},
  pages     = {3290--3295},
  publisher = {Association for Computational Linguistics},
  year      = {2020},
  url       = {https://doi.org/10.18653/v1/2020.findings-emnlp.295},
  doi       = {10.18653/v1/2020.findings-emnlp.295},
}