{{short description|Automatic conversion of spoken language into text}}
{{for|the human linguistic concept|Speech perception}}
{{Use dmy dates|date=February 2017}}
'''Speech recognition''' is an [[interdisciplinary]] subfield of [[computer science]] and [[computational linguistics]] that develops [[Methodology|methodologies]] and technologies that enable the recognition and [[translation]] of spoken language into text by computers. It is also known as '''automatic speech recognition''' ('''ASR'''), '''computer speech recognition''' or '''speech-to-text''' ('''STT'''). It draws on knowledge and research from [[computer science]], [[linguistics]] and [[computer engineering]]. The reverse process is [[speech synthesis]].

Some speech recognition systems require "training" (also called "enrollment"), in which an individual speaker reads text or isolated [[vocabulary]] into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker-independent"<ref>{{Cite web |title=Speaker Independent Connected Speech Recognition- Fifth Generation Computer Corporation |url=http://www.fifthgen.com/speaker-independent-connected-s-r.htm |url-status=live |archive-url=https://web.archive.org/web/20131111101228/http://www.fifthgen.com/speaker-independent-connected-s-r.htm |archive-date=11 November 2013 |access-date=15 June 2013 |publisher=Fifthgen.com |df=dmy-all}}</ref> systems. Systems that use training are called "speaker-dependent".

Speech recognition applications include [[voice user interface]]s such as voice dialing (e.g. "call home"), call routing (e.g. "I would like to make a collect call"), [[domotic]] appliance control, keyword search (e.g. finding a podcast where particular words were spoken), simple data entry (e.g. entering a credit card number), preparation of structured documents (e.g.
a radiology report), determining speaker characteristics,<ref>{{Cite book |last=P. Nguyen |title=International Conference on Communications and Electronics 2010 |date=2010 |isbn=978-1-4244-7055-6 |pages=147–152 |chapter=Automatic classification of speaker characteristics |doi=10.1109/ICCE.2010.5670700 |s2cid=13482115}}</ref> speech-to-text processing (e.g. [[word processor]]s or [[email]]s), and in [[aircraft]] (usually termed [[direct voice input]]). Automatic [[pronunciation assessment]] is used in education, for example in spoken language learning.

{{anchor|vs_voice_rec}}The term ''voice recognition''<ref name="Macmillan Brit. def of voice recognition">{{Cite web |title=British English definition of voice recognition |url=http://www.macmillandictionary.com/dictionary/british/voice-recognition |url-status=live |archive-url=https://web.archive.org/web/20110916050430/http://www.macmillandictionary.com/dictionary/british/voice-recognition |archive-date=16 September 2011 |access-date=21 February 2012 |publisher=Macmillan Publishers Limited. 
|df=dmy-all}}</ref><ref name="Voice rec, definition">{{Cite web |title=voice recognition, definition of |url=http://www.businessdictionary.com/definition/voice-recognition.html |url-status=live |archive-url=https://web.archive.org/web/20111203144647/http://www.businessdictionary.com/definition/voice-recognition.html |archive-date=3 December 2011 |access-date=21 February 2012 |publisher=WebFinance, Inc |df=dmy-all}}</ref><ref name="mail bag, gazette">{{Cite web |title=The Mailbag LG #114 |url=http://linuxgazette.net/114/lg_mail.html#mailbag.3 |url-status=live |archive-url=https://web.archive.org/web/20130219032501/http://linuxgazette.net/114/lg_mail.html#mailbag.3 |archive-date=19 February 2013 |access-date=15 June 2013 |publisher=Linuxgazette.net |df=dmy-all}}</ref> or ''[[Speaker recognition|speaker identification]]''<ref>{{Cite journal |last1=Sarangi |first1=Susanta |last2=Sahidullah, Md |last3=Saha, Goutam |date=September 2020 |title=Optimization of data-driven filterbank for automatic speaker verification |journal=Digital Signal Processing |volume=104 |page=102795 |arxiv=2007.10729 |bibcode=2020DSP...10402795S |doi=10.1016/j.dsp.2020.102795 |s2cid=220665533}}</ref><ref>{{Cite journal |last1=Reynolds |first1=Douglas |last2=Rose |first2=Richard |date=January 1995 |title=Robust text-independent speaker identification using Gaussian mixture speaker models |url=http://www.cs.toronto.edu/~frank/csc401/readings/ReynoldsRose.pdf |url-status=live |journal=IEEE Transactions on Speech and Audio Processing |volume=3 |issue=1 |pages=72–83 |doi=10.1109/89.365379 |issn=1063-6676 |oclc=26108901 |s2cid=7319345 |archive-url=https://web.archive.org/web/20140308001101/http://www.cs.toronto.edu/~frank/csc401/readings/ReynoldsRose.pdf |archive-date=8 March 2014 |access-date=21 February 2014 |df=dmy-all}}</ref><ref>{{Cite web |title=Speaker Identification (WhisperID) |url=http://research.microsoft.com/en-us/projects/whisperid/ |url-status=live 
|archive-url=https://web.archive.org/web/20140225190956/http://research.microsoft.com/en-us/projects/whisperid/ |archive-date=25 February 2014 |access-date=21 February 2014 |website=Microsoft Research |publisher=Microsoft |quote=When you speak to someone, they don't just recognize what you say: they recognize who you are. WhisperID will let computers do that, too, figuring out who you are by the way you sound. |df=dmy-all}}</ref> refers to identifying the speaker, rather than what they are saying. [[Speaker recognition|Recognizing the speaker]] can simplify the task of [[speech translation|translating speech]] in systems that have been trained on a specific person's voice, or it can be used to [[Authentication|authenticate]] or verify the identity of a speaker as part of a security process.

From a technology perspective, speech recognition has a long history with several waves of major innovations. Most recently, the field has benefited from advances in [[deep learning]] and [[big data]]. These advances are evidenced not only by the surge of academic papers published in the field, but more importantly by the worldwide industry adoption of a variety of deep learning methods in designing and deploying speech recognition systems.