=== Background ===
{{Further|Linear predictive coding|Modified discrete cosine transform}}

The MP3 [[lossy compression]] algorithm takes advantage of a perceptual limitation of human hearing called [[auditory masking]]. In 1894, the American physicist [[Alfred M. Mayer]] reported that a tone could be rendered inaudible by another tone of lower frequency.<ref name="Mayer1894" /> In 1959, Richard Ehmer described a complete set of auditory curves regarding this phenomenon.<ref name="Ehmer1959" /> Between 1967 and 1974, [[Eberhard Zwicker]] worked on the tuning and masking of critical frequency bands,<ref name="Zwicker" /><ref name="Eberhard" /> which in turn built on the fundamental research in the area by [[Harvey Fletcher]] and his collaborators at [[Bell Labs]].<ref name="Fletcher" />

Perceptual coding was first used for [[speech coding]] compression with [[linear predictive coding]] (LPC),<ref name="Schroeder2014">{{cite book |last1= Schroeder |first1= Manfred R. |title= Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder |date= 2014 |publisher= Springer |isbn= 978-3-319-05660-9 |chapter= Bell Laboratories |page= 388 |chapter-url= https://books.google.com/books?id=d9IkBAAAQBAJ&pg=PA388}}</ref> which has origins in the work of [[Fumitada Itakura]] ([[Nagoya University]]) and Shuzo Saito ([[Nippon Telegraph and Telephone]]) in 1966.<ref>{{cite journal |last1= Gray |first1= Robert M. |title= A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol |journal= Found. Trends Signal Process. |date= 2010 |volume= 3 |issue= 4 |pages= 203–303 |doi= 10.1561/2000000036 |url= https://ee.stanford.edu/~gray/lpcip.pdf |issn= 1932-8346 |doi-access= free |access-date= 14 July 2019 |archive-date= 9 October 2022 |archive-url= https://ghostarchive.org/archive/20221009/https://ee.stanford.edu/~gray/lpcip.pdf |url-status= live }}</ref> In 1978, [[Bishnu S. Atal]] and [[Manfred R. Schroeder]] at Bell Labs proposed an LPC speech [[codec]], called [[adaptive predictive coding]], that used a [[psychoacoustic]] coding algorithm exploiting the masking properties of the human ear.<ref name="Schroeder2014"/><ref>{{cite book |last1= Atal |first1= B. |last2= Schroeder |first2= M. |title= ICASSP '78. IEEE International Conference on Acoustics, Speech, and Signal Processing |chapter= Predictive coding of speech signals and subjective error criteria |date= 1978 |volume= 3 |pages= 573–576 |doi= 10.1109/ICASSP.1978.1170564}}</ref> Further optimization by Schroeder and Atal with J. L. Hall was reported in a 1979 paper.<ref name="Schroeder1979"/> That same year, M. A. Krasner also proposed a psychoacoustic masking codec;<ref name="Krasner" /> he published his results and produced hardware for speech (not usable for music bit-compression), but because the results appeared in a relatively obscure [[Lincoln Laboratory]] Technical Report,<ref>{{cite web|last1= Krasner|first1= M. A.|title= Digital Encoding of Speech Based on the Perceptual Requirement of the Auditory System (Technical Report 535)|url= https://apps.dtic.mil/dtic/tr/fulltext/u2/a077355.pdf|ref= Lincoln Laboratory, MIT|date= 18 June 1979|url-status= live|archive-url= https://web.archive.org/web/20170903070321/https://www.dtic.mil/dtic/tr/fulltext/u2/a077355.pdf|archive-date= 3 September 2017}}</ref> they did not immediately influence the mainstream of psychoacoustic codec development.

The [[discrete cosine transform]] (DCT), a type of [[transform coding]] for lossy compression, was proposed by [[N. Ahmed|Nasir Ahmed]] in 1972 and developed by Ahmed with T. Natarajan and [[K. R. Rao]] in 1973; they published their results in 1974.<ref>{{cite journal |last= Ahmed |first= Nasir |author-link= N. Ahmed |title= How I Came Up With the Discrete Cosine Transform |journal= [[Digital Signal Processing (journal)|Digital Signal Processing]] |date= January 1991 |volume= 1 |issue= 1 |pages= 4–5 |doi= 10.1016/1051-2004(91)90086-Z |bibcode= 1991DSP.....1....4A |url= https://www.scribd.com/doc/52879771/DCT-History-How-I-Came-Up-with-the-Discrete-Cosine-Transform |access-date= 19 November 2019 |archive-date= 10 June 2016 |archive-url= https://web.archive.org/web/20160610013109/https://www.scribd.com/doc/52879771/DCT-History-How-I-Came-Up-with-the-Discrete-Cosine-Transform |url-status= live }}</ref><ref>{{Citation |first1= Nasir |last1= Ahmed |author1-link= N. Ahmed |first2= T. |last2= Natarajan |first3= K. R. |last3= Rao |title= Discrete Cosine Transform |journal= IEEE Transactions on Computers |date= January 1974 |volume= C-23 |issue= 1 |pages= 90–93 |doi= 10.1109/T-C.1974.223784|s2cid= 149806273 }}</ref><ref>{{Citation |last1= Rao |first1= K. R. |author-link1= K. R. Rao |last2= Yip |first2= P. |title= Discrete Cosine Transform: Algorithms, Advantages, Applications |publisher= Academic Press |location= Boston |year= 1990 |isbn= 978-0-12-580203-1}}</ref> This led to the development of the [[modified discrete cosine transform]] (MDCT), proposed by J. P. Princen, A. W. Johnson and A. B. Bradley in 1987,<ref>J. P. Princen, A. W. Johnson and A. B. Bradley: ''Subband/transform coding using filter bank designs based on time domain aliasing cancellation'', IEEE Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2161–2164, 1987</ref> following earlier work by Princen and Bradley in 1986.<ref>John P. Princen, Alan B. Bradley: ''Analysis/synthesis filter bank design based on time domain aliasing cancellation'', IEEE Trans. Acoust. Speech Signal Processing, ''ASSP-34'' (5), 1153–1161, 1986</ref> The MDCT later became a core part of the MP3 algorithm.<ref name="Guckert">{{cite web |last1= Guckert |first1= John |title= The Use of FFT and MDCT in MP3 Audio Compression |url= http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf |website= [[University of Utah]] |date= Spring 2012 |access-date= 14 July 2019 |archive-date= 12 February 2021 |archive-url= https://web.archive.org/web/20210212022237/http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf |url-status= live }}</ref>

In 1982, Ernst Terhardt and collaborators constructed an algorithm describing auditory masking with high accuracy.<ref name="Terhardt1982" /> This work added to a variety of reports from authors dating back to Fletcher, and to the work that initially determined critical ratios and critical bandwidths.

In 1985, Atal and Schroeder presented [[code-excited linear prediction]] (CELP), an LPC-based perceptual speech-coding algorithm with auditory masking that achieved a significant [[data compression ratio]] for its time.<ref name="Schroeder2014"/>

In February 1988, [[IEEE]]'s refereed ''Journal on Selected Areas in Communications'' published its "Voice Coding for Communications" edition, which reported on a wide range of established, working (mostly perceptual) audio bit compression technologies,<ref name="Voice Coding for Communications" /> some of them using auditory masking as part of their fundamental design, and several showing real-time hardware implementations.