Automated Speech-to-Text Transcription Task

An Automated Speech-to-Text Transcription Task is a speech-to-text transcription task that is an automated transcription task which requires the conversion of spoken utterances into a machine-processable artifact.

AKA: Automatic Speech Recognition (ASR).
Context:
- It can be solved by an ASR System that applies an ASR Algorithm.
- It can involve Phoneme Recognition, Isolated Word Recognition, and Speaker Adaptation.
- …
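Isolated Word Recognition, listed above, was classically approached by template matching. Each vocabulary word is stored as a reference sequence of acoustic feature frames, and an unknown utterance is assigned to the word whose template aligns with it at the lowest cost. The sketch below uses dynamic time warping (DTW) for that alignment. It is illustrative only: it assumes feature frames (e.g., MFCC vectors) have already been extracted, the function names and the toy one-dimensional "features" are hypothetical, and current ASR Systems typically rely on statistical or neural acoustic models rather than DTW templates.

```python
import math
from typing import Dict, Sequence

Frame = Sequence[float]  # one acoustic feature vector (assumed precomputed, e.g. MFCCs)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: Sequence[Frame], template: Sequence[Frame]) -> float:
    """Minimum cumulative frame distance over all monotonic alignments (DTW)."""
    n, m = len(query), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            # Extend the cheapest of: stretch query, stretch template, or step both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_word(query: Sequence[Frame], templates: Dict[str, Sequence[Frame]]) -> str:
    """Return the vocabulary word whose template has the lowest DTW cost (hypothetical helper)."""
    return min(templates, key=lambda word: dtw_distance(query, templates[word]))


if __name__ == "__main__":
    # Toy 1-D "features"; the query is a time-stretched version of the "yes" template.
    templates = {"yes": [[0.0], [1.0], [2.0]], "no": [[2.0], [1.0], [0.0]]}
    query = [[0.0], [0.0], [1.0], [2.0], [2.0]]
    print(recognize_word(query, templates))
```

Because DTW allows one template frame to match several query frames, a slower utterance of "yes" still aligns with zero cost to the "yes" template.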
Example(s):
- English ASR, Japanese ASR, ...
- a Conversational Speech Recognition Task.
- an Automatic Transcription Task.
- …
Counter-Example(s):
See: Speech Segmentation, Acoustic Model, Computer Speech Recognition, Voice Recognition.

References

http://www.nist.gov/speech

2016

(Xiong et al., 2016) ⇒ Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Mike Seltzer, Andreas Stolcke, Dong Yu, and Geoffrey Zweig. (2016). “Achieving Human Parity in Conversational Speech Recognition.” In: arXiv Journal, 1610.05256.
- QUOTE: Conversational speech recognition has served as a flagship speech recognition task since the release of the DARPA Switchboard corpus in the 1990s. In this paper, we measure the human error rate on the widely used NIST 2000 test set, and find that our latest automated system has reached human parity. …
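The "error rate" compared in such human-parity studies is, for conversational speech recognition, conventionally the word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the system's hypothesis into the reference transcript, divided by the number of reference words. A minimal sketch via word-level edit distance follows; it assumes whitespace tokenization and omits the text normalization that official scoring tools such as NIST's apply, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of eight reference words.
    print(word_error_rate("i would like to make a collect call",
                          "i would like make a collect call"))
```

Since insertions are counted, WER can exceed 1.0 when a hypothesis is much longer than its reference.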

2013

http://en.wikipedia.org/wiki/Speech_recognition
- In computer science, speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition", "ASR", "computer speech recognition", "speech to text", or just "STT". Some SR systems use "training" where an individual speaker reads sections of text into the SR system. These systems analyze the person's specific voice and use it to fine tune the recognition of that person's speech, resulting in more accurate transcription. Systems that do not use training are called "Speaker Independent" systems. Systems that use training are called "Speaker Dependent" systems.
  Speech recognition applications include voice user interfaces such as voice dialing (e.g. “Call home”), call routing (e.g. “I would like to make a collect call”), domotic appliance control, search (e.g. find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g. a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input).
  The term voice recognition[1][2] refers to finding the identity of "who" is speaking, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on specific person's voices or it can be used to authenticate or verify the identity of a speaker as part of a security process.

[1] "voice recognition, definition of". WebFinance, Inc. http://www.businessdictionary.com/definition/voice-recognition.html. Retrieved February 21, 2012.
[2] http://linuxgazette.net/114/lg_mail.html#mailbag.3

2009

(Lai et al., 2009) ⇒ Jennifer Lai, Clare-Marie Karat, and Nicole Yankelovich. (2009). “Conversational Speech Interfaces and Technologies.” In: Human-Computer Interaction: Design Issues, Solutions, and Applications, 53.

2005

(Jiang, 2005) ⇒ Hui Jiang. (2005). “Confidence Measures for Speech Recognition: A Survey.” In: Speech Communication, 45(4). doi:10.1016/j.specom.2004.12.004
- QUOTE: In speech recognition, confidence measures (CM) are used to evaluate reliability of recognition results. A good confidence measure can largely benefit speech recognition systems in many practical applications.
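One family of confidence measures uses the posterior probability of a recognition result. A simplified way to approximate it is to normalize the scores of an N-best list of hypotheses into posteriors and take the top hypothesis's posterior as its confidence. The sketch below does this with a softmax over log scores. It is a minimal illustration under stated assumptions, not the method of any particular system: the N-best input format, the score `scale` factor, the acceptance `threshold`, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

NBest = List[Tuple[str, float]]  # (hypothesis text, total log score); format assumed


def nbest_posteriors(log_scores: List[float], scale: float = 1.0) -> List[float]:
    """Softmax over scaled log scores, giving one posterior per hypothesis."""
    scaled = [scale * s for s in log_scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_hypothesis_confidence(nbest: NBest, scale: float = 1.0) -> Tuple[str, float]:
    """Return the most probable hypothesis and its posterior as a confidence score."""
    posts = nbest_posteriors([score for _, score in nbest], scale)
    best = max(range(len(nbest)), key=lambda i: posts[i])
    return nbest[best][0], posts[best]


def accept(nbest: NBest, threshold: float = 0.5, scale: float = 1.0) -> bool:
    """Accept the top hypothesis only if its confidence reaches the (hypothetical) threshold."""
    _, confidence = top_hypothesis_confidence(nbest, scale)
    return confidence >= threshold


if __name__ == "__main__":
    nbest = [("call home", math.log(3.0)), ("call phone", 0.0)]
    print(top_hypothesis_confidence(nbest))  # top hypothesis has posterior 3/4
    print(accept(nbest, threshold=0.6))
```

An application could, for example, ask the user to confirm or repeat an utterance whose confidence falls below the threshold instead of acting on it directly.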
