
===Layer II===
{{Main|MPEG-1 Audio Layer II}}
MPEG-1 Audio Layer II (the first version of MP2, often informally called MUSICAM)<ref name=mpeg_audio_faq/> is a [[lossy]] audio format designed to provide high quality at about 192&nbsp;kbit/s for stereo sound.<ref>{{Cite web |url=https://mpeg.chiariglione.org/standards/mpeg-2/audio |title=Audio Standard: MPEG-2 Part number: 3 |access-date=2024-02-07 |archive-date=2024-02-04 |archive-url=https://web.archive.org/web/20240204214525/https://mpeg.chiariglione.org/standards/mpeg-2/audio |url-status=live }}</ref> Decoding MP2 audio is [[Computational complexity theory|computationally simple]] relative to MP3, [[Advanced Audio Coding|AAC]], etc.

====History/MUSICAM====
MPEG-1 Audio Layer II was derived from the MUSICAM (''Masking pattern adapted Universal Subband Integrated Coding And Multiplexing'') audio codec, developed by [[Centre commun d'études de télévision et télécommunications]] (CCETT), [[Philips]], and [[Institut für Rundfunktechnik]] (IRT/CNET)<ref name=bmrc_mpeg2_faq/><ref name=santa_clara90/><ref name=telos_audio>{{Citation|first=Steve |last=Church |title=Perceptual Coding and MPEG Compression |publisher=NAB Engineering Handbook, [[Telos Systems]] |url=http://www.telos-systems.com/techtalk/mpeg/default.htm |archive-url=https://web.archive.org/web/20010508092243/http://www.telos-systems.com/techtalk/mpeg/default.htm |url-status=dead |archive-date=2001-05-08 |access-date=2008-04-09 }}</ref> as part of the [[EUREKA 147]] pan-European inter-governmental research and development initiative for the development of digital audio broadcasting.

Most key features of MPEG-1 Audio were directly inherited from MUSICAM, including the filter bank, time-domain processing, audio frame sizes, etc. However, improvements were made, and the actual MUSICAM algorithm was not used in the final MPEG-1 Audio Layer II standard. The widespread usage of the term MUSICAM to refer to Layer II is entirely incorrect and discouraged for both technical and legal reasons.<ref name=mpeg_audio_faq>{{Citation |first1=D. |last1=Thom |first2=H. |last2=Purnhagen |title=MPEG Audio FAQ Version 9 |date=October 1998 |publisher=[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] |url=http://mpeg.chiariglione.org/faq/mp1-aud/mp1-aud.htm |access-date=2016-11-11 |url-status=dead |archive-url=https://web.archive.org/web/20100218081343/http://mpeg.chiariglione.org/faq/mp1-aud/mp1-aud.htm |archive-date=2010-02-18 }}</ref>

====Technical details====
MP2 is a time-domain encoder. It uses a low-delay, 32-sub-band [[Polyphase quadrature filter|polyphase]] [[filter bank]] for time-frequency mapping; the sub-bands have overlapping ranges (i.e. they are polyphased) to prevent aliasing.<ref name=audio_tutorial/> The psychoacoustic model is based on the principles of [[auditory masking]], [[simultaneous masking]] effects, and the [[absolute threshold of hearing]] (ATH). The size of a Layer II frame is fixed at 1152 samples (coefficients).

[[Time domain]] refers to how analysis and quantization are performed on short, discrete samples/chunks of the audio waveform. This offers low delay, as only a small number of samples are analyzed before encoding, as opposed to [[frequency domain]] encoding (like MP3), which must analyze many times more samples before it can decide how to transform and output encoded audio. It also offers higher performance on complex, random and [[Transient (acoustics)|transient]] impulses (such as percussive instruments and applause), avoiding artifacts like pre-echo.

The 32 sub-band filter bank returns 32 [[amplitude]] [[wikt:coefficient|coefficients]], one for each equal-sized frequency band/segment of the audio, each about 700&nbsp;Hz wide (depending on the audio's sampling frequency). The encoder then uses the psychoacoustic model to determine which sub-bands contain audio information that is less important, and therefore where quantization will be inaudible, or at least much less noticeable.<ref name=mpeg1_audio/>
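
The "about 700&nbsp;Hz" figure follows from dividing the usable bandwidth (half the sampling frequency) into 32 equal parts. The short sketch below works this out for the three MPEG-1 sampling rates; the function name <code>subband_width_hz</code> is illustrative only and does not come from the standard.

```python
def subband_width_hz(sample_rate_hz: float, num_subbands: int = 32) -> float:
    """Width of each equal-sized sub-band covering 0 Hz up to the Nyquist frequency."""
    nyquist_hz = sample_rate_hz / 2
    return nyquist_hz / num_subbands


if __name__ == "__main__":
    # MPEG-1 audio sampling rates: 32, 44.1 and 48 kHz
    for rate in (32000, 44100, 48000):
        print(rate, round(subband_width_hz(rate), 1))
    # 32000 500.0
    # 44100 689.1
    # 48000 750.0
```

At the CD sampling rate of 44.1&nbsp;kHz, each sub-band is therefore roughly 689&nbsp;Hz wide.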

[[File:Fft-2.png|thumb|right|upright=1.55|Example FFT analysis on an audio wave sample]]

The psychoacoustic model is applied using a 1024-point [[fast Fourier transform]] (FFT). Of the 1152 samples per frame, the 128 samples not covered by the FFT (64 samples at the top and 64 at the bottom of the frequency range) are ignored for this analysis; they are presumably not significant enough to change the result. The psychoacoustic model uses an empirically determined masking model to determine which sub-bands contribute more to the [[masking threshold]], and how much quantization noise each can contain without being perceived. Any sounds below the [[absolute threshold of hearing]] (ATH) are completely discarded. The available bits are then assigned to each sub-band accordingly.<ref name=mpeg_audio_faq/><ref name=audio_tutorial/>
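
The final step, assigning bits according to how far each sub-band's signal rises above its masking threshold, can be illustrated with a simplified greedy loop. This is a sketch under stated assumptions rather than the procedure defined in the standard: the function <code>allocate_bits</code>, the abstract "bit pool" and the rule of thumb that each extra bit lowers quantization noise by about 6.02&nbsp;dB are all introduced here for illustration, and real encoders must also account for the cost of scale factors and for the permitted quantizer classes.

```python
def allocate_bits(smr_db, bit_pool, max_bits=15):
    """Greedy bit allocation driven by signal-to-mask ratios (SMR, in dB).

    Each pass gives one more bit to the sub-band whose quantization noise
    is furthest above its masking threshold. Assumes roughly 6.02 dB of
    noise reduction per bit (a rule of thumb, not a figure from the standard).
    """
    bits = [0] * len(smr_db)
    for _ in range(bit_pool):
        # Noise-to-mask ratio per band; bands already at max_bits are excluded.
        nmr = [
            smr - 6.02 * b if b < max_bits else float("-inf")
            for smr, b in zip(smr_db, bits)
        ]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # all remaining noise is already below the masking threshold
        bits[worst] += 1
    return bits


if __name__ == "__main__":
    # Three hypothetical sub-bands: strongly audible, marginal, fully masked.
    print(allocate_bits([20.0, 5.0, -3.0], bit_pool=10))  # [4, 1, 0]
```

In this toy example the fully masked sub-band receives no bits at all, and the loop stops early once the noise in every sub-band is below its masking threshold.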

Typically, sub-bands are less important if they contain quieter sounds (smaller coefficient) than a neighboring (i.e. similar frequency) sub-band with louder sounds (larger coefficient). Also, "noise" components typically have a more significant masking effect than "tonal" components.<ref name=telos_audio/>

Less significant sub-bands are reduced in accuracy by quantization. This involves compressing the range of values (the amplitude of the coefficient), i.e. raising the noise floor, and then computing an amplification factor that the decoder uses to re-expand each sub-band to its proper range.<ref name=smith_transcoding_survey>{{Citation |first=Brian |last=Smith |title=A Survey of Compressed Domain Processing Techniques |pages=7 |year=1996 |publisher=[[Cornell University]] |url=http://citeseer.ist.psu.edu/257196.html |access-date=2008-04-09 |url-status=live |archive-url=http://archive.wikiwix.com/cache/20110223164151/http://citeseer.ist.psu.edu/257196.html |archive-date=2011-02-23 }}{{registration required|s}}</ref><ref name=twolame_psycho>{{Citation |first=Mike |last=Cheng |title=Psychoacoustic Models in TwoLAME |publisher=twolame.org |url=http://www.twolame.org/doc/psycho.html |access-date=2016-11-11 |url-status=live |archive-url=https://web.archive.org/web/20161022063134/http://www.twolame.org/doc/psycho.html |archive-date=2016-10-22 }}</ref>
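
The idea of quantizing a sub-band coarsely and sending an amplification (scale) factor alongside it can be sketched as follows. This is a minimal illustration, assuming a uniform quantizer and a scale factor equal to the peak magnitude of the block; the function names are hypothetical, and Layer II itself selects scale factors from a fixed table and uses specific quantizer classes.

```python
def quantize_subband(samples, bits):
    """Encoder side: normalise by the peak magnitude, then round to a coarse grid."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits per sample")
    scale = max(abs(s) for s in samples) or 1.0  # illustrative scale factor
    levels = 2 ** (bits - 1) - 1                 # symmetric integer code range
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes


def dequantize_subband(scale, codes, bits):
    """Decoder side: re-expand the integer codes using the transmitted scale factor."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * scale for c in codes]


if __name__ == "__main__":
    block = [0.5, -0.25, 0.1, 0.0]
    for bits in (4, 8):
        scale, codes = quantize_subband(block, bits)
        decoded = dequantize_subband(scale, codes, bits)
        worst = max(abs(a - b) for a, b in zip(block, decoded))
        print(bits, codes, round(worst, 4))
```

Fewer bits mean a coarser grid and therefore a larger worst-case error, which is the raised noise floor described above.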

Layer II can also optionally use [[Joint stereo#Intensity stereo coding|intensity stereo]] coding, a form of joint stereo. This means that the frequencies above 6&nbsp;kHz of both channels are combined/down-mixed into one single (mono) channel, but the "side channel" information on the relative intensity (volume, amplitude) of each channel is preserved and encoded into the bitstream separately. On playback, the single channel is played through left and right speakers, with the intensity information applied to each channel to give the illusion of stereo sound.<ref name=mpeg1_audio/><ref name=telos_audio/> This perceptual trick is known as "stereo irrelevancy". This can allow further reduction of the audio bitrate without much perceivable loss of fidelity, but is generally not used with higher bitrates as it does not provide very high quality (transparent) audio.<ref name=mpeg1_audio>{{Citation|first1=B. |last1=Grill |first2=S. |last2=Quackenbush |title=MPEG-1 Audio |date=October 2005 |publisher=[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] |url=http://mpeg.chiariglione.org/technologies/mpeg-1/mp01-aud/index.htm |archive-url=https://web.archive.org/web/20100430190803/http://mpeg.chiariglione.org/technologies/mpeg-1/mp01-aud/index.htm |url-status=dead |archive-date=2010-04-30 }}</ref><ref name=audio_tutorial/><ref>{{Citation |first1=B. |last1=Grill |first2=S. |last2=Quackenbush |title=MPEG-1 Audio |date=October 2005 |url=http://www.chiariglione.org/mpeg/technologies/mp01-aud/index.htm |access-date=2016-11-11 |archive-url=https://web.archive.org/web/20080427195833/http://www.chiariglione.org/mpeg/technologies/mp01-aud/index.htm |archive-date=2008-04-27}}</ref><ref name=joint_stereo_spatial/>
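
The principle of intensity stereo can be illustrated with a toy example for a single high-frequency band. The sketch below assumes a plain average as the down-mix and an RMS ratio as the per-channel intensity; both choices, and the function names, are illustrative assumptions and not the bitstream procedure defined by the standard.

```python
import math


def _rms(values):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def encode_intensity(left, right):
    """Down-mix one band to mono and keep one intensity (gain) per channel."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    mono_rms = _rms(mono) or 1.0  # avoid dividing by zero for a silent down-mix
    return mono, _rms(left) / mono_rms, _rms(right) / mono_rms


def decode_intensity(mono, gain_left, gain_right):
    """Play the mono band through both channels, scaled by the stored intensities."""
    return [m * gain_left for m in mono], [m * gain_right for m in mono]


if __name__ == "__main__":
    left = [0.8, -0.8, 0.8, -0.8]
    right = [0.2, -0.2, 0.2, -0.2]
    mono, gain_l, gain_r = encode_intensity(left, right)
    out_l, out_r = decode_intensity(mono, gain_l, gain_r)
    print(out_l, out_r)
```

Only the relative level of each channel survives; phase differences between the channels are lost. In this simple average, content that is out of phase between left and right cancels in the down-mix, which is one reason the technique is not transparent.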

====Quality====
Subjective audio testing by experts, in the most critical conditions ever implemented, has shown MP2 to offer [[Transparency (data compression)|transparent]] audio compression at 256&nbsp;kbit/s for 16-bit 44.1&nbsp;kHz [[Red Book (audio CD standard)|CD audio]] using the earliest reference implementation (more recent encoders should presumably perform even better).<ref name=mpeg_faqs1>{{Citation |first1=Mark |last1=Adler |first2=Harald |last2=Popp |first3=Morten |last3=Hjerde |title=MPEG-FAQ: multimedia compression [1/9] |date=November 9, 1996 |publisher=faqs.org |url=http://www.faqs.org/faqs/mpeg-faq/part1/ |access-date=2016-11-11 |url-status=live |archive-url=https://web.archive.org/web/20170104010328/http://www.faqs.org/faqs/mpeg-faq/part1/ |archive-date=January 4, 2017 }}</ref><ref name=telos_audio/><ref name=audio_tutorial/><ref>C.Grewin, and T.Ryden, ''Subjective Assessments on Low Bit-rate Audio Codecs'', Proceedings of the 10th International AES Conference, pp 91 - 102, London 1991</ref> That (approximately) 1:6 compression ratio for CD audio is particularly impressive because it is quite close to the estimated upper limit of perceptual [[Entropy (information theory)|entropy]], at just over 1:8.<ref>J. Johnston, ''Estimation of Perceptual Entropy Using Noise Masking Criteria,'' in Proc. ICASSP-88, pp. 2524-2527, May 1988.</ref><ref>J. Johnston, ''Transform Coding of Audio Signals Using Perceptual Noise Criteria,'' IEEE Journal on Select Areas in Communications, vol. 6, no. 2, pp. 314-323, Feb. 1988.</ref> Achieving much higher compression is simply not possible without discarding some perceptible information.
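
The approximate 1:6 figure can be checked with simple arithmetic on the uncompressed CD bitrate (two channels of 16-bit samples at 44.1&nbsp;kHz). The helper name below is illustrative only.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


if __name__ == "__main__":
    cd_kbps = pcm_bitrate_kbps(44100, 16, 2)  # stereo CD audio
    ratio = cd_kbps / 256                     # MP2 at 256 kbit/s
    print(round(cd_kbps, 1))  # 1411.2
    print(round(ratio, 2))    # 5.51
```

A ratio of about 5.5 is what the text rounds to 1:6.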

MP2 remains a favoured lossy audio coding standard due to its particularly high audio coding performance on important audio material such as castanets, symphonic orchestra, male and female voices, and particularly complex and high-energy transients (impulses) like percussive sounds: triangle, glockenspiel and audience applause.<ref name=mpeg_faqs2/> More recent testing has shown that [[MPEG Multichannel]] (based on MP2), despite being compromised by an inferior matrixed mode (for the sake of backwards compatibility),<ref name=mpeg_faqs1/><ref name=audio_tutorial/> rates just slightly lower than much more recent audio codecs, such as [[Dolby Digital]] (AC-3) and [[Advanced Audio Coding]] (AAC) (mostly within the margin of error, and substantially superior in some cases, such as audience applause).<ref>Wustenhagen et al., ''Subjective Listening Test of Multi-channel Audio Codecs'', AES 105th Convention Paper 4813, San Francisco 1998</ref><ref name=ebu_surround_test_2007>{{Citation |last=B/MAE Project Group |title=EBU evaluations of multichannel audio codecs |date=September 2007 |publisher=[[European Broadcasting Union]] |url=http://www.ebu.ch/CMSimages/en/tec_doc_t3324-2007_tcm6-53801.pdf |archive-url=https://web.archive.org/web/20081030043259/http://www.ebu.ch/CMSimages/en/tec_doc_t3324-2007_tcm6-53801.pdf |url-status=dead |archive-date=2008-10-30 |access-date=2008-04-09 }}</ref> This is one reason that MP2 audio continues to be used extensively.
The MPEG-2 AAC Stereo verification tests reached a vastly different conclusion, however, showing AAC to provide superior performance to MP2 at half the bitrate.<ref name=stereo_aac_tests>{{Citation |first1=David |last1=Meares |first2=Kaoru |last2=Watanabe |first3=Eric |last3=Scheirer |title=Report on the MPEG-2 AAC Stereo Verification Tests |pages=18 |date=February 1998 |publisher=[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] |url=http://sound.media.mit.edu/mpeg4/audio/public/w2006.pdf |access-date=2016-11-11 |url-status=dead |archive-url=https://web.archive.org/web/20080414072345/http://sound.media.mit.edu/mpeg4/audio/public/w2006.pdf |archive-date=April 14, 2008 }}</ref> The reason for this disparity with both earlier and later tests is not clear, but strangely, a sample of applause is notably absent from the latter test.

Layer II audio files typically use the extension ".mp2" or sometimes ".m2a".