
====Technical details====
MP2 is a time-domain encoder. It uses a low-delay, 32 sub-band [[Polyphase quadrature filter|polyphased]] [[filter bank]] for time-frequency mapping; the sub-bands have overlapping ranges (i.e. they are polyphased) to prevent aliasing.<ref name=audio_tutorial/> The psychoacoustic model is based on the principles of [[auditory masking]], [[simultaneous masking]] effects, and the [[absolute threshold of hearing]] (ATH). The size of a Layer II frame is fixed at 1152 samples (coefficients).
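As a rough illustration of what the fixed frame size implies, the following sketch computes the duration of one 1152-sample frame at the three MPEG-1 sampling rates, and the number of samples each of the 32 sub-bands receives per frame. The function names are hypothetical, and the per-sub-band count assumes the filter bank is critically sampled (each sub-band decimated by 32).

```python
FRAME_SAMPLES = 1152   # fixed Layer II frame size
NUM_SUBBANDS = 32      # sub-bands produced by the polyphase filter bank


def frame_duration_ms(sample_rate_hz):
    """Duration of one Layer II frame in milliseconds."""
    return 1000.0 * FRAME_SAMPLES / sample_rate_hz


def samples_per_subband():
    """Sub-band samples per frame, assuming decimation by NUM_SUBBANDS."""
    return FRAME_SAMPLES // NUM_SUBBANDS


if __name__ == "__main__":
    for rate in (32000, 44100, 48000):
        print(rate, "Hz:", round(frame_duration_ms(rate), 2), "ms per frame,",
              samples_per_subband(), "samples per sub-band")
```

A shorter frame means less audio must be buffered before encoding can begin, which is one way to picture the low-delay property described here.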

[[Time domain]] refers to how analysis and quantization are performed on short, discrete samples/chunks of the audio waveform. This offers low delay, as only a small number of samples are analyzed before encoding, as opposed to [[frequency domain]] encoding (like MP3), which must analyze many times more samples before it can decide how to transform and output encoded audio. It also offers higher performance on complex, random and [[Transient (acoustics)|transient]] impulses (such as percussive instruments and applause), avoiding artifacts like pre-echo.

The 32 sub-band filter bank returns 32 [[amplitude]] [[wikt:coefficient|coefficients]], one for each equal-sized frequency band/segment of the audio, each about 700&nbsp;Hz wide (depending on the audio's sampling frequency). The encoder then uses the psychoacoustic model to determine which sub-bands contain audio information that is less important, and therefore where quantization will be inaudible, or at least much less noticeable.<ref name=mpeg1_audio/>
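The dependence of the band width on the sampling frequency can be sketched as follows, assuming the spectrum from 0&nbsp;Hz to the Nyquist frequency (half the sampling rate) is split into 32 equal bands. The helper names are hypothetical and are not taken from any encoder.

```python
NUM_SUBBANDS = 32


def subband_width_hz(sample_rate_hz):
    """Width of each equal-sized sub-band: Nyquist frequency divided by 32."""
    return (sample_rate_hz / 2) / NUM_SUBBANDS


def subband_index(freq_hz, sample_rate_hz):
    """Index (0..31) of the sub-band that nominally contains freq_hz."""
    width = subband_width_hz(sample_rate_hz)
    # Clamp so that the Nyquist frequency itself falls in the last band.
    return min(int(freq_hz // width), NUM_SUBBANDS - 1)


if __name__ == "__main__":
    for rate in (32000, 44100, 48000):
        print(rate, "Hz sampling:", subband_width_hz(rate), "Hz per sub-band")
```

Under this assumption the width is 500&nbsp;Hz at 32&nbsp;kHz, about 689&nbsp;Hz at 44.1&nbsp;kHz and 750&nbsp;Hz at 48&nbsp;kHz sampling.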

[[File:Fft-2.png|thumb|right|upright=1.55|Example FFT analysis on an audio wave sample]]

The psychoacoustic model is applied using a 1024-point [[fast Fourier transform]] (FFT). Of the 1152 samples per frame, 64 samples at each of the top and bottom of the frequency range are ignored for this analysis, presumably because they are not significant enough to change the result. The psychoacoustic model uses an empirically determined masking model to determine which sub-bands contribute more to the [[masking threshold]], and how much quantization noise each can contain without being perceived. Any sounds below the [[absolute threshold of hearing]] (ATH) are completely discarded. The available bits are then assigned to each sub-band accordingly.<ref name=mpeg_audio_faq/><ref name=audio_tutorial/>
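The relationship between the 1152-sample frame and the 1024-point FFT can be sketched as below. This is a minimal illustration, not an encoder's actual analysis: it assumes the 64 ignored samples on each side are simply the first and last 64 samples of the frame, it omits the window function a real psychoacoustic model would apply, and all function names are hypothetical.

```python
import cmath
import math

FRAME_SAMPLES = 1152
FFT_SIZE = 1024


def central_window(frame):
    """Drop an equal number of samples from each end so FFT_SIZE remain."""
    skip = (len(frame) - FFT_SIZE) // 2   # 64 for a 1152-sample frame
    return frame[skip:skip + FFT_SIZE]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(frame):
    """Magnitudes of the first FFT_SIZE/2 bins of the central 1024 samples."""
    spectrum = fft(central_window(frame))
    return [abs(c) for c in spectrum[:FFT_SIZE // 2]]


if __name__ == "__main__":
    # Test tone that completes exactly 100 cycles per 1024 samples.
    frame = [math.sin(2 * math.pi * 100 * n / FFT_SIZE)
             for n in range(FRAME_SAMPLES)]
    mags = magnitude_spectrum(frame)
    print("strongest bin:", mags.index(max(mags)))
```

A masking model would then compare these magnitudes against the masking threshold and the ATH to decide how many bits each sub-band receives; that step is not shown here.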

Typically, sub-bands are less important if they contain quieter sounds (smaller coefficient) than a neighboring (i.e. similar frequency) sub-band with louder sounds (larger coefficient). Also, "noise" components typically have a more significant masking effect than "tonal" components.<ref name=telos_audio/>

Less significant sub-bands are reduced in accuracy by quantization. This essentially involves compressing the range of values (the amplitude of the coefficient), i.e. raising the noise floor, and then computing an amplification factor for the decoder to use to re-expand each sub-band to its proper range.<ref name=smith_transcoding_survey>{{Citation |first=Brian |last=Smith |title=A Survey of Compressed Domain Processing Techniques |pages=7 |year=1996 |publisher=[[Cornell University]] |url=http://citeseer.ist.psu.edu/257196.html |access-date=2008-04-09 |url-status=live |archive-url=http://archive.wikiwix.com/cache/20110223164151/http://citeseer.ist.psu.edu/257196.html |archive-date=2011-02-23 }}{{registration required|s}}</ref><ref name=twolame_psycho>{{Citation |first=Mike |last=Cheng |title=Psychoacoustic Models in TwoLAME |publisher=twolame.org |url=http://www.twolame.org/doc/psycho.html |access-date=2016-11-11 |url-status=live |archive-url=https://web.archive.org/web/20161022063134/http://www.twolame.org/doc/psycho.html |archive-date=2016-10-22 }}</ref>
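The idea of reducing accuracy and transmitting an amplification factor can be sketched with a simple uniform quantizer. This is only an illustration under stated assumptions: the peak value of the sub-band is used as the amplification factor, the quantizer is a plain symmetric rounding quantizer, and the function names are hypothetical. The actual standard selects scale factors and quantizer steps from predefined tables, which are not reproduced here.

```python
def quantize_subband(samples, bits):
    """Encoder side: return (scale, codes) for one sub-band.

    bits must be at least 2. Fewer bits means coarser steps,
    i.e. a higher noise floor.
    """
    scale = max(abs(s) for s in samples) or 1.0   # fall back to 1.0 on silence
    levels = (1 << (bits - 1)) - 1                # e.g. 7 steps per side for 4 bits
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes


def dequantize_subband(scale, codes, bits):
    """Decoder side: re-expand the integer codes using the transmitted scale."""
    levels = (1 << (bits - 1)) - 1
    return [c / levels * scale for c in codes]


if __name__ == "__main__":
    samples = [0.5, -0.25, 0.1, 0.0, 0.33]
    for bits in (4, 8, 16):
        scale, codes = quantize_subband(samples, bits)
        restored = dequantize_subband(scale, codes, bits)
        error = max(abs(a - b) for a, b in zip(samples, restored))
        print(bits, "bits: worst-case error", error)
```

In this sketch the worst-case error shrinks as more bits are assigned, which is why the encoder spends its bits on the sub-bands where noise would be most audible.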

Layer II can also optionally use [[Joint stereo#Intensity stereo coding|intensity stereo]] coding, a form of joint stereo. This means that the frequencies above 6&nbsp;kHz of both channels are down-mixed into a single (mono) channel, but the "side channel" information on the relative intensity (volume, amplitude) of each channel is preserved and encoded into the bitstream separately. On playback, the single channel is played through the left and right speakers, with the intensity information applied to each channel to give the illusion of stereo sound.<ref name=mpeg1_audio/><ref name=telos_audio/> This perceptual trick is known as "stereo irrelevancy". It can allow further reduction of the audio bitrate without much perceivable loss of fidelity, but is generally not used at higher bitrates, as it does not provide very high quality (transparent) audio.<ref name=mpeg1_audio>{{Citation|first1=B. |last1=Grill |first2=S. |last2=Quackenbush |title=MPEG-1 Audio |date=October 2005 |publisher=[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] |url=http://mpeg.chiariglione.org/technologies/mpeg-1/mp01-aud/index.htm |archive-url=https://web.archive.org/web/20100430190803/http://mpeg.chiariglione.org/technologies/mpeg-1/mp01-aud/index.htm |url-status=dead |archive-date=2010-04-30 }}</ref><ref name=audio_tutorial/><ref>{{Citation |first1=B. |last1=Grill |first2=S. |last2=Quackenbush |title=MPEG-1 Audio |date=October 2005 |url=http://www.chiariglione.org/mpeg/technologies/mp01-aud/index.htm |access-date=2016-11-11 |archive-url=https://web.archive.org/web/20080427195833/http://www.chiariglione.org/mpeg/technologies/mp01-aud/index.htm |archive-date=2008-04-27}}</ref><ref name=joint_stereo_spatial/>
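The down-mix and re-expansion described above can be sketched for a single high-frequency band as follows. This is a simplified illustration rather than the bitstream's actual mechanism: it assumes the intensity information is one gain per channel, derived from the energy of each channel relative to the mono down-mix, and the function names are hypothetical.

```python
import math


def _energy(x):
    """Root of the summed squares, used as a simple intensity measure."""
    return math.sqrt(sum(v * v for v in x))


def intensity_encode(left, right):
    """Down-mix one band to mono and keep a relative gain per channel."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    e_mono = _energy(mono) or 1.0   # avoid dividing by zero (e.g. anti-phase input)
    return mono, _energy(left) / e_mono, _energy(right) / e_mono


def intensity_decode(mono, gain_left, gain_right):
    """Play the same mono signal on both channels at different intensities."""
    return [m * gain_left for m in mono], [m * gain_right for m in mono]


if __name__ == "__main__":
    left = [1.0, -1.0, 0.5]
    right = [0.5, -0.5, 0.25]   # same waveform at half the level
    mono, g_l, g_r = intensity_encode(left, right)
    print(intensity_decode(mono, g_l, g_r))
```

When the two channels differ only in level, as in the usage example, the decoded channels match the originals closely. Phase and timing differences between the channels are lost in the down-mix, which is consistent with the technique not being used where transparent quality is required.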