=== Background ===
{{Further|Linear predictive coding|Modified discrete cosine transform}}

The MP3 [[lossy compression]] algorithm takes advantage of a perceptual limitation of human hearing called [[auditory masking]]. In 1894, the American physicist [[Alfred M. Mayer]] reported that a tone could be rendered inaudible by another tone of lower frequency.<ref name="Mayer1894" /> In 1959, Richard Ehmer described a complete set of auditory curves for this phenomenon.<ref name="Ehmer1959" /> Between 1967 and 1974, [[Eberhard Zwicker]] worked on the tuning and masking of critical frequency bands,<ref name="Zwicker" /><ref name="Eberhard" /> building on fundamental research in the area by [[Harvey Fletcher]] and his collaborators at [[Bell Labs]].<ref name="Fletcher" />

Perceptual coding was first used for [[speech coding]] with [[linear predictive coding]] (LPC),<ref name="Schroeder2014">{{cite book |last1= Schroeder |first1= Manfred R. |title= Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder |date= 2014 |publisher= Springer |isbn= 978-3-319-05660-9 |chapter= Bell Laboratories |page= 388 |chapter-url= https://books.google.com/books?id=d9IkBAAAQBAJ&pg=PA388}}</ref> which has its origins in the work of [[Fumitada Itakura]] ([[Nagoya University]]) and Shuzo Saito ([[Nippon Telegraph and Telephone]]) in 1966.<ref>{{cite journal |last1= Gray |first1= Robert M. |title= A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol |journal= Found. Trends Signal Process. |date= 2010 |volume= 3 |issue= 4 |pages= 203–303 |doi= 10.1561/2000000036 |url= https://ee.stanford.edu/~gray/lpcip.pdf |issn= 1932-8346 |doi-access= free |access-date= 14 July 2019 |archive-date= 9 October 2022 |archive-url= https://ghostarchive.org/archive/20221009/https://ee.stanford.edu/~gray/lpcip.pdf |url-status= live }}</ref> In 1978, [[Bishnu S. Atal]] and [[Manfred R. Schroeder]] at Bell Labs proposed an LPC speech [[codec]], called [[adaptive predictive coding]], that used a [[psychoacoustic]] coding algorithm exploiting the masking properties of the human ear.<ref name="Schroeder2014"/><ref>{{cite book |last1= Atal |first1= B. |last2= Schroeder |first2= M. |title= ICASSP '78. IEEE International Conference on Acoustics, Speech, and Signal Processing |chapter= Predictive coding of speech signals and subjective error criteria |date= 1978 |volume= 3 |pages= 573–576 |doi= 10.1109/ICASSP.1978.1170564}}</ref> Further optimization by Schroeder and Atal with J. L. Hall was reported in a 1979 paper.<ref name="Schroeder1979"/> That same year, a psychoacoustic masking codec was also proposed by M. A. Krasner,<ref name="Krasner" /> who published his work and produced hardware for speech (not usable for music bit-compression). However, because his results appeared in a relatively obscure [[Lincoln Laboratory]] Technical Report,<ref>{{cite web|last1= Krasner|first1= M. A.|title= Digital Encoding of Speech Based on the Perceptual Requirement of the Auditory System (Technical Report 535)|url= https://apps.dtic.mil/dtic/tr/fulltext/u2/a077355.pdf|ref= Lincoln Laboratory, MIT|date= 18 June 1979|url-status= live|archive-url= https://web.archive.org/web/20170903070321/https://www.dtic.mil/dtic/tr/fulltext/u2/a077355.pdf|archive-date= 3 September 2017}}</ref> they did not immediately influence the mainstream of psychoacoustic codec development.

The [[discrete cosine transform]] (DCT), a type of [[transform coding]] for lossy compression, was proposed by [[N. Ahmed|Nasir Ahmed]] in 1972 and developed by Ahmed with T. Natarajan and [[K. R. Rao]] in 1973; they published their results in 1974.<ref>{{cite journal |last= Ahmed |first= Nasir |author-link= N. Ahmed |title= How I Came Up With the Discrete Cosine Transform |journal= [[Digital Signal Processing (journal)|Digital Signal Processing]] |date= January 1991 |volume= 1 |issue= 1 |pages= 4–5 |doi= 10.1016/1051-2004(91)90086-Z |bibcode= 1991DSP.....1....4A |url= https://www.scribd.com/doc/52879771/DCT-History-How-I-Came-Up-with-the-Discrete-Cosine-Transform |access-date= 19 November 2019 |archive-date= 10 June 2016 |archive-url= https://web.archive.org/web/20160610013109/https://www.scribd.com/doc/52879771/DCT-History-How-I-Came-Up-with-the-Discrete-Cosine-Transform |url-status= live }}</ref><ref>{{Citation |first1= Nasir |last1= Ahmed |author1-link= N. Ahmed |first2= T. |last2= Natarajan |first3= K. R. |last3= Rao |title= Discrete Cosine Transform |journal= IEEE Transactions on Computers |date= January 1974 |volume= C-23 |issue= 1 |pages= 90–93 |doi= 10.1109/T-C.1974.223784|s2cid= 149806273 }}</ref><ref>{{Citation |last1= Rao |first1= K. R. |author-link1= K. R. Rao |last2= Yip |first2= P. |title= Discrete Cosine Transform: Algorithms, Advantages, Applications |publisher= Academic Press |location= Boston |year= 1990 |isbn= 978-0-12-580203-1}}</ref> This led to the development of the [[modified discrete cosine transform]] (MDCT), proposed by J. P. Princen, A. W. Johnson and A. B. Bradley in 1987,<ref>J. P. Princen, A. W. Johnson and A. B. Bradley: ''Subband/transform coding using filter bank designs based on time domain aliasing cancellation'', IEEE Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2161–2164, 1987</ref> following earlier work by Princen and Bradley in 1986.<ref>John P. Princen, Alan B. Bradley: ''Analysis/synthesis filter bank design based on time domain aliasing cancellation'', IEEE Trans. Acoust. Speech Signal Processing, ''ASSP-34'' (5), 1153–1161, 1986</ref> The MDCT later became a core part of the MP3 algorithm.<ref name="Guckert">{{cite web |last1= Guckert |first1= John |title= The Use of FFT and MDCT in MP3 Audio Compression |url= http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf |website= [[University of Utah]] |date= Spring 2012 |access-date= 14 July 2019 |archive-date= 12 February 2021 |archive-url= https://web.archive.org/web/20210212022237/http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf |url-status= live }}</ref>

In 1982, Ernst Terhardt and collaborators constructed an algorithm that describes auditory masking with high accuracy.<ref name="Terhardt1982" /> This work added to a variety of reports by authors dating back to Fletcher, and to the work that first determined critical ratios and critical bandwidths.

In 1985, Atal and Schroeder presented [[code-excited linear prediction]] (CELP), an LPC-based perceptual speech-coding algorithm with auditory masking that achieved a significant [[data compression ratio]] for its time.<ref name="Schroeder2014"/> In February 1988, the "Voice Coding for Communications" edition of [[IEEE]]'s refereed ''Journal on Selected Areas in Communications'' reported on a wide range of established, working (mostly perceptual) audio bit-compression technologies,<ref name="Voice Coding for Communications" /> some of them using auditory masking as part of their fundamental design, and several showing real-time hardware implementations.