== History ==
Long before the invention of [[electronics|electronic]] [[signal processing]], some people tried to build machines to emulate human speech.{{citation needed|date=November 2024}} There were also legends of the existence of "[[Brazen Head]]s", such as those involving Pope [[Silvester II]] (d. 1003 AD), [[Albertus Magnus]] (1198–1280), and [[Roger Bacon]] (1214–1294).

In 1779, the [[Germany|German]]-[[Denmark|Danish]] scientist [[Christian Gottlieb Kratzenstein]] won first prize in a competition announced by the Russian [[Russian Academy of Sciences|Imperial Academy of Sciences and Arts]] for models he built of the human [[vocal tract]] that could produce the five long [[vowel]] sounds (in [[International Phonetic Alphabet]] notation: {{IPA|[aː]}}, {{IPA|[eː]}}, {{IPA|[iː]}}, {{IPA|[oː]}} and {{IPA|[uː]}}).<ref name="Helsinki">[http://www.acoustics.hut.fi/publications/files/theses/lemmetty_mst/chap2.html History and Development of Speech Synthesis], Helsinki University of Technology. Retrieved November 4, 2006.</ref> There followed the [[bellows]]-operated "[[Wolfgang von Kempelen's Speaking Machine|acoustic-mechanical speech machine]]" of [[Wolfgang von Kempelen]] of [[Pressburg]], Hungary, described in a 1791 paper.<ref>''Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden Maschine'' ("Mechanism of human speech, with a description of its speaking machine"), J. B. Degen, Wien. {{in lang|de}}</ref> This machine added models of the tongue and lips, enabling it to produce consonants as well as vowels. In 1837, [[Charles Wheatstone]] produced a "speaking machine" based on von Kempelen's design, and in 1846, Joseph Faber exhibited the "[[Euphonia (device)|Euphonia]]". In 1923, Paget resurrected Wheatstone's design.<ref>{{Cite journal|last=Mattingly|first=Ignatius G.|year=1974|editor1-last=Sebeok|editor1-first=Thomas A.|title=Speech synthesis for phonetic and phonological models|url=http://www.haskins.yale.edu/Reprints/HL0173.pdf|url-status=dead|journal=Current Trends in Linguistics|location=Mouton, The Hague|volume=12|pages=2451–2487|archive-url=https://web.archive.org/web/20130512085755/http://www.haskins.yale.edu/Reprints/HL0173.pdf|archive-date=2013-05-12|access-date=2011-12-13}}</ref>

In the 1930s, [[Bell Labs]] developed the [[vocoder]], which automatically analyzed speech into its fundamental tones and resonances. From his work on the vocoder, [[Homer Dudley]] developed a keyboard-operated voice synthesizer called [[The Voder]] (Voice Demonstrator), which he exhibited at the [[1939 New York World's Fair]].

[[Franklin S. Cooper|Dr. Franklin S. Cooper]] and his colleagues at [[Haskins Laboratories]] built the [[Pattern playback]] in the late 1940s and completed it in 1950. There were several different versions of this hardware device; only one currently survives. The machine converted pictures of the acoustic patterns of speech, in the form of a spectrogram, back into sound. Using this device, [[Alvin Liberman]] and colleagues discovered acoustic cues for the perception of [[phonetic]] segments (consonants and vowels).

=== Electronic devices ===
[[File:Computer and speech synthesiser housing, 19 (9663804888).jpg|thumb|Computer and speech synthesizer housing used by [[Stephen Hawking]] in 1999]]
The first computer-based speech-synthesis systems originated in the late 1950s.
Noriko Umeda ''et al.'' developed the first general English text-to-speech system in 1968, at the [[Electrotechnical Laboratory]] in Japan.<ref>{{cite journal | last1 = Klatt | first1 = D | year = 1987 | title = Review of text-to-speech conversion for English | journal = Journal of the Acoustical Society of America | volume = 82 | issue = 3| pages = 737–93 | doi= 10.1121/1.395275| pmid = 2958525 | bibcode = 1987ASAJ...82..737K }}</ref>

In 1961, physicist [[John Larry Kelly, Jr]] and his colleague [[Louis Gerstman]]<ref>{{cite news|last=Lambert|first=Bruce|date=March 21, 1992|title=Louis Gerstman, 61, a Specialist In Speech Disorders and Processes|work=The New York Times|url=https://www.nytimes.com/1992/03/21/nyregion/louis-gerstman-61-a-specialist-in-speech-disorders-and-processes.html}}</ref> used an [[IBM 704]] computer to synthesize speech, in one of the most prominent events in the history of [[Bell Labs]].{{citation needed|date=April 2016}} Kelly's voice recorder synthesizer ([[vocoder]]) recreated the song "[[Daisy Bell]]", with musical accompaniment from [[Max Mathews]]. Coincidentally, [[Arthur C. Clarke]] was visiting his friend and colleague John Pierce at the Bell Labs Murray Hill facility. Clarke was so impressed by the demonstration that he used it in the climactic scene of the screenplay for his novel ''[[2001: A Space Odyssey (novel)|2001: A Space Odyssey]]'',<ref>{{cite web|url=http://www.lsi.usp.br/~rbianchi/clarke/ACC.Biography.html |title=Arthur C. Clarke Biography |access-date=5 December 2017 |url-status=dead |archive-url=https://web.archive.org/web/19971211154551/http://www.lsi.usp.br/~rbianchi/clarke/ACC.Biography.html |archive-date=December 11, 1997 }}</ref> in which the [[HAL 9000]] computer sings the same song as astronaut [[David Bowman (Space Odyssey)|Dave Bowman]] puts it to sleep.<ref>{{cite web|url=http://www.bell-labs.com/news/1997/march/5/2.html |title=Where "HAL" First Spoke (Bell Labs Speech Synthesis website) |publisher=Bell Labs |access-date=2010-02-17 |url-status=dead |archive-url=https://web.archive.org/web/20000407081031/http://www.bell-labs.com/news/1997/march/5/2.html |archive-date=2000-04-07 }}</ref> Despite the success of purely electronic speech synthesis, research into mechanical speech synthesizers continues.<ref>[http://www.takanishi.mech.waseda.ac.jp/top/research/voice/index.htm Anthropomorphic Talking Robot Waseda-Talker Series] {{webarchive|url=https://web.archive.org/web/20160304034116/http://www.takanishi.mech.waseda.ac.jp/top/research/voice/index.htm |date=2016-03-04 }}</ref>{{Third-party inline|date=July 2019}}

[[Linear predictive coding]] (LPC), a form of [[speech coding]], began development with the work of [[Fumitada Itakura]] of [[Nagoya University]] and Shuzo Saito of [[Nippon Telegraph and Telephone]] (NTT) in 1966. Further developments in LPC technology were made by [[Bishnu S. Atal]] and [[Manfred R. Schroeder]] at [[Bell Labs]] during the 1970s.<ref>{{cite journal |last1=Gray |first1=Robert M. |title=A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol |journal=Found. Trends Signal Process. |date=2010 |volume=3 |issue=4 |pages=203–303 |doi=10.1561/2000000036 |url=https://ee.stanford.edu/~gray/lpcip.pdf |archive-url=https://ghostarchive.org/archive/20221009/https://ee.stanford.edu/~gray/lpcip.pdf |archive-date=2022-10-09 |url-status=live |issn=1932-8346 |doi-access=free}}</ref> LPC was later the basis for early speech-synthesizer chips, such as the [[Texas Instruments LPC Speech Chips]] used in the [[Speak & Spell (toy)|Speak & Spell]] toys from 1978.

In 1975, while at NTT, Itakura developed the [[line spectral pairs]] (LSP) method for high-compression speech coding.<ref>{{cite journal |last1=Zheng |first1=F. |last2=Song |first2=Z. |last3=Li |first3=L. |last4=Yu |first4=W. |title=The Distance Measure for Line Spectrum Pairs Applied to Speech Recognition |journal=Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP'98) |date=1998 |issue=3 |pages=1123–6 |url=http://www.work.caltech.edu/~ling/pub/icslp98lsp.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://www.work.caltech.edu/~ling/pub/icslp98lsp.pdf |archive-date=2022-10-09 |url-status=live}}</ref><ref name="ieee">{{cite web |title=List of IEEE Milestones |url=https://ethw.org/Milestones:List_of_IEEE_Milestones |publisher=[[IEEE]] |access-date=15 July 2019}}</ref><ref name=ItakuraHistory>{{cite web|url=https://ethw.org/Oral-History:Fumitada_Itakura|title=Fumitada Itakura Oral History|publisher=IEEE Global History Network|date=20 May 2009|access-date=2009-07-21}}</ref> From 1975 to 1981, he studied problems in speech analysis and synthesis based on the LSP method.<ref name=ItakuraHistory/> In 1980, his team developed an LSP-based speech-synthesizer chip. LSP is an important technology for speech synthesis and coding; in the 1990s it was adopted by almost all international speech coding standards as an essential component, contributing to the enhancement of digital speech communication over mobile channels and the internet.<ref name="ieee"/>

[[MUSA (MUltichannel Speaking Automaton)|MUSA]], released in 1975, was one of the first speech synthesis systems. It consisted of stand-alone computer hardware and specialized software that enabled it to read Italian. A second version, released in 1978, could also sing in Italian in an "[[a cappella]]" style.<ref>{{cite journal |last1=Billi |first1=Roberto |last2=Canavesio |first2=Franco |last3=Ciaramella |first3=Alberto |author-link3=Alberto Ciaramella |last4=Nebbia |first4=Luciano |title=Interactive voice technology at work: The CSELT experience |journal=Speech Communication |date=1 November 1995 |volume=17 |issue=3 |pages=263–271 |doi=10.1016/0167-6393(95)00030-R}}</ref>

[[File:DECtalk demo.flac|thumb|DECtalk demo recording using the Perfect Paul and Uppity Ursula voices]]
Dominant systems in the 1980s and 1990s were the [[DECtalk]] system, based largely on the work of [[Dennis H. Klatt|Dennis Klatt]] at MIT, and the Bell Labs system;<ref>{{Cite book |first1=Richard W. |last1=Sproat |title=Multilingual Text-to-Speech Synthesis: The Bell Labs Approach |publisher=Springer |year=1997 |isbn=978-0-7923-8027-6}}</ref> the latter was one of the first multilingual language-independent systems, making extensive use of [[natural language processing]] methods.
[[File:Fidelity Chess Challenger Voice.jpg|thumb|Fidelity Voice Chess Challenger (1979), the first talking chess computer]]
[[File:Fidelity Chess Challenger Voice speech output.flac|thumb|Speech output from Fidelity Voice Chess Challenger]]
[[Handheld]] electronics featuring speech synthesis began emerging in the 1970s. One of the first was the [[Telesensory Systems|Telesensory Systems Inc.]] (TSI) ''Speech+'' portable calculator for the blind in 1976.<ref>TSI Speech+ & other speaking calculators</ref><ref>Gevaryahu, Jonathan, "TSI S14001A Speech Synthesizer LSI Integrated Circuit Guide"{{dead link|date=December 2011}}</ref> Other devices had primarily educational purposes, such as the [[Speak & Spell (toy)|Speak & Spell]] toy produced by [[Texas Instruments]] in 1978.<ref>Breslow, et al. {{patent|US|4326710|title=Talking electronic game}}: "Talking electronic game", April 27, 1982</ref> Fidelity released a speaking version of its electronic chess computer in 1979.<ref>[http://www.ismenio.com/chess_fidelity_vcc.html Voice Chess Challenger]</ref>

The first [[video game]] to feature speech synthesis was the 1980 [[shoot 'em up]] [[arcade game]] ''[[Stratovox]]'' (known in Japan as ''Speak & Rescue''), from [[Sunsoft|Sun Electronics]].<ref>[http://www.gamesradar.com/f/gamings-most-important-evolutions/a-20101008102331322035/p-2 Gaming's most important evolutions] {{webarchive|url=https://web.archive.org/web/20110615221800/http://www.gamesradar.com/f/gamings-most-important-evolutions/a-20101008102331322035/p-2 |date=2011-06-15 }}, [[GamesRadar]]</ref><ref>{{cite magazine |last=Adlum |first=Eddie |title=The Replay Years: Reflections from Eddie Adlum |magazine=RePlay |date=November 1985 |volume=11 |issue=2 |pages=134–175 (160–3) |url=https://archive.org/details/re-play-volume-11-issue-no.-2-november-1985-600DPI/RePlay%20-%20Volume%2011%2C%20Issue%20No.%202%20-%20November%201985/page/162/mode/2up}}</ref> The first [[personal computer game]] with speech synthesis was ''[[Stealth game#History|Manbiki Shoujo]]'' (''Shoplifting Girl''), released in 1980 for the [[PET 2001]], for which the game's developer, Hiroshi Suzuki, developed a "''zero cross''" programming technique to produce a synthesized speech waveform.<ref>{{cite book |last=Szczepaniak |first=John |year=2014 |title=The Untold History of Japanese Game Developers |publisher=SMG Szczepaniak |volume=1 |pages=544–615 |isbn=978-0992926007}}</ref> Another early example, the arcade version of ''[[Berzerk (video game)|Berzerk]]'', also dates from 1980. The [[Milton Bradley Company]] produced the first multi-player [[electronic game]] using voice synthesis, ''[[Milton (game)|Milton]]'', in the same year.

In 1976, Computalker Consultants released their CT-1 Speech Synthesizer. Designed by D. Lloyd Rice and Jim Cooper, it was an analog synthesizer built to work with microcomputers using the S-100 bus standard.<ref>{{Cite news |title=A Short History of Computalker |url=https://amhistory.si.edu/archives/speechsynthesis/ss_rice.htm |website=Smithsonian Speech Synthesis History Project}}</ref>

Early electronic speech synthesizers sounded robotic and were often barely intelligible. The quality of synthesized speech has steadily improved, but {{as of|2016|lc=on}} output from contemporary speech-synthesis systems remains clearly distinguishable from actual human speech.
Synthesized voices typically sounded male until 1990, when [[Ann Syrdal]], at [[AT&T Bell Laboratories]], created a female voice.<ref name=NewYorkTimes>{{cite news|url=https://www.nytimes.com/2020/08/20/technology/ann-syrdal-who-helped-give-computers-a-female-voice-dies-at-74.html|title=Ann Syrdal, Who Helped Give Computers a Female Voice, Dies at 74|work=The New York Times|date=2020-08-20|author=Cade Metz|access-date=2020-08-23}}</ref> In 2005, Kurzweil predicted that as the [[cost-performance ratio]] made speech synthesizers cheaper and more accessible, more people would benefit from the use of text-to-speech programs.<ref>{{cite book |last=Kurzweil |first=Raymond |author-link=Raymond Kurzweil |title=The Singularity is Near |publisher=[[Penguin Books]] |year=2005 |isbn=978-0-14-303788-0}}</ref>