Speech recognition
===Pre-1970===
* '''1952''' – Three Bell Labs researchers, Stephen Balashek,<ref>{{Cite news |date=22 July 2012 |title=Obituaries: Stephen Balashek |url=https://obits.nj.com/obituaries/starledger/obituary.aspx?page=lifestory&pid=158702138 |work=The Star-Ledger |access-date=9 September 2024 |archive-date=4 April 2019 |archive-url=https://web.archive.org/web/20190404231352/https://obits.nj.com/obituaries/starledger/obituary.aspx?page=lifestory&pid=158702138 |url-status=live }}</ref> R. Biddulph, and K. H. Davis, built a system called "Audrey"<ref>{{Cite web |title=IBM-Shoebox-front.jpg |url=https://cdn57.androidauthority.net/wp-content/uploads/2012/04/IBM-Shoebox-front.jpg |access-date=4 April 2019 |publisher=androidauthority.net |archive-date=9 August 2018 |archive-url=https://web.archive.org/web/20180809153221/https://cdn57.androidauthority.net/wp-content/uploads/2012/04/IBM-Shoebox-front.jpg |url-status=live }}</ref> for single-speaker digit recognition. Their system located the [[formants]] in the power spectrum of each utterance.<ref>{{Cite web |last1=Juang |first1=B. H. |last2=Rabiner |first2=Lawrence R. |title=Automatic speech recognition – a brief history of the technology development |url=http://www.ece.ucsb.edu/faculty/Rabiner/ece259/Reprints/354_LALI-ASRHistory-final-10-8.pdf |url-status=live |archive-url=https://web.archive.org/web/20140817193243/http://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/354_LALI-ASRHistory-final-10-8.pdf |archive-date=17 August 2014 |access-date=17 January 2015 |page=6}}</ref>
* '''1960''' – [[Gunnar Fant]] developed and published the [[source-filter model of speech production]].
* '''1962''' – [[IBM]] demonstrated its 16-word "Shoebox" machine's speech recognition capability at the [[1962 World's Fair]].<ref name="PCW.Siri">{{Cite magazine |last=Melanie Pinola |date=2 November 2011 |title=Speech Recognition Through the Decades: How We Ended Up With Siri |url=https://www.pcworld.com/article/243060/speech_recognition_through_the_decades_how_we_ended_up_with_siri.html |access-date=22 October 2018 |magazine=PC World |archive-date=3 November 2018 |archive-url=https://web.archive.org/web/20181103105727/https://www.pcworld.com/article/243060/speech_recognition_through_the_decades_how_we_ended_up_with_siri.html |url-status=live }}</ref>
* '''1966''' – [[Linear predictive coding]] (LPC), a [[speech coding]] method, was first proposed by [[Fumitada Itakura]] of [[Nagoya University]] and Shuzo Saito of [[Nippon Telegraph and Telephone]] (NTT), while working on speech recognition.<ref name="Gray">{{Cite journal |last=Gray |first=Robert M. |date=2010 |title=A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol |url=https://ee.stanford.edu/~gray/lpcip.pdf |journal=Found. Trends Signal Process. |volume=3 |issue=4 |pages=203–303 |doi=10.1561/2000000036 |issn=1932-8346 |doi-access=free |access-date=9 September 2024 |archive-date=9 October 2022 |archive-url=https://ghostarchive.org/archive/20221009/https://ee.stanford.edu/~gray/lpcip.pdf |url-status=live }}</ref>
* '''1969''' – Funding at [[Bell Labs]] dried up for several years after the influential [[John R. Pierce|John Pierce]] wrote an open letter criticizing speech recognition research and cut off its funding.<ref name="jasapierce">{{Cite journal |last=John R. Pierce |author-link=John R. Pierce |date=1969 |title=Whither speech recognition? |journal=Journal of the Acoustical Society of America |volume=46 |issue=48 |pages=1049–1051 |bibcode=1969ASAJ...46.1049P |doi=10.1121/1.1911801}}</ref> This defunding lasted until Pierce retired and [[James L. Flanagan]] took over. [[Raj Reddy]] was the first person to take on continuous speech recognition, as a graduate student at [[Stanford University]] in the late 1960s. Previous systems required users to pause after each word. Reddy's system accepted spoken commands for playing [[chess]]. Around this time, Soviet researchers invented the [[dynamic time warping]] (DTW) algorithm and used it to create a recognizer capable of operating on a 200-word vocabulary.<ref>{{Cite book |last1=Benesty |first1=Jacob |title=Springer Handbook of Speech Processing |last2=Sondhi |first2=M. M. |last3=Huang |first3=Yiteng |date=2008 |publisher=Springer Science & Business Media |isbn=978-3540491255}}</ref> DTW processed speech by dividing it into short frames, e.g. 10 ms segments, and processing each frame as a single unit. Although DTW would be superseded by later algorithms, the frame-based approach carried on. Speaker independence remained unsolved during this period.
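The frame-based DTW idea described above can be illustrated with a minimal sketch. It is not a reconstruction of any historical recognizer: the function names (<code>frame_signal</code>, <code>frame_energy</code>, <code>dtw_distance</code>, <code>recognize</code>), the use of per-frame energy as the only feature, and the absolute-difference local cost are all assumptions chosen for brevity. Real systems of the period used richer spectral features.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=10):
    """Split samples into non-overlapping frames of frame_ms milliseconds.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy(frame):
    """Mean squared amplitude of one frame (an illustrative scalar feature)."""
    return sum(x * x for x in frame) / len(frame)


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of scalar features.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step
    may advance in a, in b, or in both, which lets one sequence stretch
    or compress in time relative to the other.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances
                                     cost[i][j - 1],      # b advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(features, templates):
    """Return the label of the stored template closest to features under DTW.

    templates is a hypothetical dict mapping a word label to its feature
    sequence, standing in for a recognizer's vocabulary.
    """
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

In this sketch an utterance would be converted to features with <code>[frame_energy(f) for f in frame_signal(samples, sample_rate)]</code> and then passed to <code>recognize</code> together with one stored feature sequence per vocabulary word. Because the alignment may repeat frames, a word spoken more slowly than its template can still match at low cost.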