==Part 3: Audio==
Part 3 of the MPEG-1 standard covers audio and is defined in ISO/IEC 11172-3.

MPEG-1 Audio uses [[psychoacoustic]]s to significantly reduce the data rate required by an audio stream. It reduces or completely discards parts of the audio that it deduces the human ear cannot ''hear'', either because they lie at frequencies where the ear has limited sensitivity, or because they are ''[[Auditory masking|masked]]'' by other (typically louder) sounds.<ref name=mpeg_audio_faq/>

Channel encoding modes:
* Mono
* Joint stereo – [[Joint encoding#Intensity stereo coding|intensity encoded]]
* Joint stereo – [[Joint encoding#M/S stereo coding|M/S encoded]] (Layer III only)
* Stereo
* Dual (two [[wikt:correlated|uncorrelated]] mono channels)

[[Sampling (signal processing)#Sampling rate|Sampling rates]]:
* 32000&nbsp;Hz
* 44100&nbsp;Hz
* 48000&nbsp;Hz

[[Bit rate]]s:
* Layer I: 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416 and 448&nbsp;kbit/s<ref>{{citation |url=http://www.mpgedit.org/mpgedit/mpeg_format/mpeghdr.htm |title=MPEG Audio Frame Header |access-date=2016-11-11 |url-status=dead |archive-url=https://web.archive.org/web/20150208104604/http://www.mpgedit.org/mpgedit/mpeg_format/mpeghdr.htm |archive-date=2015-02-08 }}</ref>
* Layer II: 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320 and 384&nbsp;kbit/s
* Layer III: 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320&nbsp;kbit/s
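
The lists above can be collected into a small lookup table. The following Python sketch is illustrative only: the names <code>MPEG1_BITRATES_KBPS</code>, <code>MPEG1_SAMPLING_RATES_HZ</code> and <code>is_listed_mpeg1_config</code> are hypothetical, and the check covers only membership in the lists above, not any further restrictions the standard may place on particular combinations.

```python
# Bit rates (kbit/s) listed above for each MPEG-1 Audio layer.
MPEG1_BITRATES_KBPS = {
    1: (32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448),
    2: (32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384),
    3: (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320),
}

# Sampling rates (Hz) listed above for MPEG-1 Audio.
MPEG1_SAMPLING_RATES_HZ = (32000, 44100, 48000)


def is_listed_mpeg1_config(layer: int, bitrate_kbps: int, sampling_rate_hz: int) -> bool:
    """Return True if the layer, bit rate and sampling rate all appear in the lists above."""
    return (
        bitrate_kbps in MPEG1_BITRATES_KBPS.get(layer, ())
        and sampling_rate_hz in MPEG1_SAMPLING_RATES_HZ
    )


print(is_listed_mpeg1_config(3, 128, 44100))  # a common Layer III setting
print(is_listed_mpeg1_config(3, 384, 44100))  # 384 kbit/s is not in the Layer III list
```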

MPEG-1 Audio is divided into three layers. Each higher layer is more computationally complex, and generally more efficient at lower bitrates, than the previous one.<ref name = bmrc_mpeg2_faq /> The layers are partly backwards compatible, as higher layers reuse technologies implemented by the lower layers. A "full" Layer II decoder can also play Layer I audio, but ''not'' Layer III audio, although not all higher-layer players are "full".<ref name=mpeg_audio_faq/>

===Layer I===
{{Main|MPEG-1 Audio Layer I}}
MPEG-1 Audio Layer I is a simplified version of MPEG-1 Audio Layer II.<ref name=santa_clara90/> Layer I uses a smaller 384-sample frame size for very low delay, and finer resolution.<ref name=mpeg_faqs2/> This is advantageous for applications like teleconferencing, studio editing, etc. It has lower complexity than Layer II to facilitate [[Real-time computing|real-time]] encoding on the hardware available {{circa|1990}}.<ref name=mpeg1_audio/>
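
The effect of the smaller frame on delay can be estimated from the frame size alone. The following Python sketch (the function and constant names are illustrative, not taken from the standard) computes how much audio one frame spans, comparing the 384-sample Layer I frame with the 1152-sample frame used by Layer II. It shows only the buffering contribution of the frame itself and ignores filter bank and processing latency.

```python
LAYER_I_FRAME_SAMPLES = 384
LAYER_II_FRAME_SAMPLES = 1152


def frame_duration_ms(samples_per_frame: int, sampling_rate_hz: int) -> float:
    """Length of audio covered by one frame, in milliseconds."""
    return 1000.0 * samples_per_frame / sampling_rate_hz


for rate_hz in (32000, 44100, 48000):
    layer_i = frame_duration_ms(LAYER_I_FRAME_SAMPLES, rate_hz)
    layer_ii = frame_duration_ms(LAYER_II_FRAME_SAMPLES, rate_hz)
    print(f"{rate_hz} Hz: Layer I {layer_i:.1f} ms, Layer II {layer_ii:.1f} ms")
```

At any given sampling rate the Layer I frame covers one third as much audio as the Layer II frame.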

Layer I saw limited adoption in its time, and most notably was used on [[Philips]]' [[wikt:defunct|defunct]] [[Digital Compact Cassette]] at a bitrate of 384&nbsp;kbit/s.<ref name= mpeg_faqs1/> With the substantial performance improvements in digital processing since its introduction, Layer I quickly became unnecessary and obsolete.

Layer I audio files typically use the extension ".mp1" or sometimes ".m1a".

===Layer II===
{{Main|MPEG-1 Audio Layer II}}
MPEG-1 Audio Layer II (the first version of MP2, often informally called MUSICAM)<ref name=mpeg_audio_faq/> is a [[lossy]] audio format designed to provide high quality at about 192&nbsp;kbit/s for stereo sound.<ref>{{Cite web |url=https://mpeg.chiariglione.org/standards/mpeg-2/audio |title=Audio Standard: MPEG-2 Part number: 3 |access-date=2024-02-07 |archive-date=2024-02-04 |archive-url=https://web.archive.org/web/20240204214525/https://mpeg.chiariglione.org/standards/mpeg-2/audio |url-status=live }}</ref> Decoding MP2 audio is [[Computational complexity theory|computationally simple]] relative to MP3, [[Advanced Audio Coding|AAC]], etc.

====History/MUSICAM====
MPEG-1 Audio Layer II was derived from the MUSICAM (''Masking pattern adapted Universal Subband Integrated Coding And Multiplexing'') audio codec, developed by [[Centre commun d'études de télévision et télécommunications]] (CCETT), [[Philips]], and [[Institut für Rundfunktechnik]] (IRT/CNET)<ref name=bmrc_mpeg2_faq/><ref name=santa_clara90/><ref name=telos_audio>{{Citation|first=Steve |last=Church |title=Perceptual Coding and MPEG Compression |publisher=NAB Engineering Handbook, [[Telos Systems]] |url=http://www.telos-systems.com/techtalk/mpeg/default.htm |archive-url=https://web.archive.org/web/20010508092243/http://www.telos-systems.com/techtalk/mpeg/default.htm |url-status=dead |archive-date=2001-05-08 |access-date=2008-04-09 }}</ref> as part of the [[EUREKA 147]] pan-European inter-governmental research and development initiative for the development of digital audio broadcasting.

Most key features of MPEG-1 Audio were directly inherited from MUSICAM, including the filter bank, time-domain processing, audio frame sizes, etc. However, improvements were made, and the actual MUSICAM algorithm was not used in the final MPEG-1 Audio Layer II standard. The widespread usage of the term MUSICAM to refer to Layer II is entirely incorrect and discouraged for both technical and legal reasons.<ref name=mpeg_audio_faq>{{Citation |first1=D. |last1=Thom |first2=H. |last2=Purnhagen |title=MPEG Audio FAQ Version 9 |date=October 1998 |publisher=[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] |url=http://mpeg.chiariglione.org/faq/mp1-aud/mp1-aud.htm |access-date=2016-11-11 |url-status=dead |archive-url=https://web.archive.org/web/20100218081343/http://mpeg.chiariglione.org/faq/mp1-aud/mp1-aud.htm |archive-date=2010-02-18 }}</ref>

====Technical details====
MP2 is a time-domain encoder. It uses a low-delay 32 sub-band [[Polyphase quadrature filter|polyphased]] [[filter bank]] for time-frequency mapping, with overlapping ranges (i.e. polyphased) to prevent aliasing.<ref name=audio_tutorial/> The psychoacoustic model is based on the principles of [[auditory masking]], [[simultaneous masking]] effects, and the [[absolute threshold of hearing]] (ATH). The size of a Layer II frame is fixed at 1152 samples (coefficients).

[[Time domain]] here means that analysis and quantization are performed on short, discrete samples/chunks of the audio waveform. This offers low delay, as only a small number of samples are analyzed before encoding, as opposed to [[frequency domain]] encoding (like MP3), which must analyze many times more samples before it can decide how to transform and output encoded audio. It also offers higher performance on complex, random and [[Transient (acoustics)|transient]] impulses (such as percussive instruments and applause), avoiding artifacts like pre-echo.

The 32 sub-band filter bank returns 32 [[amplitude]] [[wikt:coefficient|coefficients]], one for each equal-sized frequency band/segment of the audio, which is about 700&nbsp;Hz wide (depending on the audio's sampling frequency). The encoder then utilizes the psychoacoustic model to determine which sub-bands contain audio information that is less important, and so, where quantization will be inaudible, or at least much less noticeable.<ref name=mpeg1_audio/>
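
The "about 700&nbsp;Hz" figure follows from dividing the representable bandwidth (half the sampling frequency) equally among the 32 sub-bands. A minimal Python sketch of that arithmetic, with illustrative names:

```python
NUM_SUBBANDS = 32


def subband_width_hz(sampling_rate_hz: int) -> float:
    """Width of one equal-sized sub-band: half the sampling rate split 32 ways."""
    return (sampling_rate_hz / 2) / NUM_SUBBANDS


for rate_hz in (32000, 44100, 48000):
    print(rate_hz, subband_width_hz(rate_hz))
```

For 44100&nbsp;Hz audio this gives 689.0625&nbsp;Hz per sub-band; at 32000&nbsp;Hz and 48000&nbsp;Hz the bands are 500&nbsp;Hz and 750&nbsp;Hz wide respectively.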

[[File:Fft-2.png|thumb|right|upright=1.55|Example FFT analysis on an audio wave sample]]

The psychoacoustic model is applied using a 1024-point [[fast Fourier transform]] (FFT). Of the 1152 samples per frame, 64 samples at the top and bottom of the frequency range are ignored for this analysis. They are presumably not significant enough to change the result. The psychoacoustic model uses an empirically determined masking model to determine which sub-bands contribute more to the [[masking threshold]], and how much quantization noise each can contain without being perceived. Any sounds below the [[absolute threshold of hearing]] (ATH) are completely discarded. The available bits are then assigned to each sub-band accordingly.<ref name=mpeg_audio_faq/><ref name=audio_tutorial/>

Typically, sub-bands are less important if they contain quieter sounds (smaller coefficient) than a neighboring (i.e. similar frequency) sub-band with louder sounds (larger coefficient). Also, "noise" components typically have a more significant masking effect than "tonal" components.<ref name=telos_audio/>

Less significant sub-bands are reduced in accuracy by quantization. This basically involves compressing the dynamic range (amplitude of the coefficient), i.e. raising the noise floor, and then computing an amplification factor for the decoder to use to re-expand each sub-band to its proper range.<ref name=smith_transcoding_survey>{{Citation |first=Brian |last=Smith |title=A Survey of Compressed Domain Processing Techniques |pages=7 |year=1996 |publisher=[[Cornell University]] |url=http://citeseer.ist.psu.edu/257196.html |access-date=2008-04-09 |url-status=live |archive-url=http://archive.wikiwix.com/cache/20110223164151/http://citeseer.ist.psu.edu/257196.html |archive-date=2011-02-23 }}{{registration required|s}}</ref><ref name=twolame_psycho>{{Citation |first=Mike |last=Cheng |title=Psychoacoustic Models in TwoLAME |publisher=twolame.org |url=http://www.twolame.org/doc/psycho.html |access-date=2016-11-11 |url-status=live |archive-url=https://web.archive.org/web/20161022063134/http://www.twolame.org/doc/psycho.html |archive-date=2016-10-22 }}</ref>
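
The general idea of scaling a sub-band, quantizing it coarsely, and re-expanding it in the decoder can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the standard: the peak magnitude is used directly as the amplification (scale) factor, the quantizer is a plain uniform one, and the function names are hypothetical.

```python
def quantize_subband(samples, bits):
    """Scale a non-empty block of sub-band samples to [-1, 1], then quantize uniformly.

    Assumption for this sketch: the scale factor is simply the peak magnitude.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed quantizer")
    scale = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero on silence
    levels = 2 ** (bits - 1) - 1                 # e.g. 7 positive steps for 4 bits
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes


def dequantize_subband(scale, codes, bits):
    """Decoder side: re-expand the integer codes using the transmitted scale factor."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * scale for c in codes]


block = [0.5, -0.25, 0.1, 0.0]
for bits in (3, 8):
    scale, codes = quantize_subband(block, bits)
    restored = dequantize_subband(scale, codes, bits)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print(f"{bits} bits: worst-case error {worst:.4f}")
```

In this sketch the reconstruction error of each sample is at most half a quantizer step, <code>scale / (2 * levels)</code>, so assigning fewer bits to a sub-band raises its noise floor.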

Layer II can also optionally use [[Joint stereo#Intensity stereo coding|intensity stereo]] coding, a form of joint stereo. This means that the frequencies above 6&nbsp;kHz of both channels are combined/down-mixed into one single (mono) channel, but the "side channel" information on the relative intensity (volume, amplitude) of each channel is preserved and encoded into the bitstream separately. On playback, the single channel is played through left and right speakers, with the intensity information applied to each channel to give the illusion of stereo sound.<ref name=mpeg1_audio/><ref name=telos_audio/> This perceptual trick is known as "stereo irrelevancy". This can allow further reduction of the audio bitrate without much perceivable loss of fidelity, but is generally not used with higher bitrates as it does not provide very high quality (transparent) audio.<ref name=mpeg1_audio>{{Citation|first1=B. |last1=Grill |first2=S. |last2=Quackenbush |title=MPEG-1 Audio |date=October 2005 |publisher=[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] |url=http://mpeg.chiariglione.org/technologies/mpeg-1/mp01-aud/index.htm |archive-url=https://web.archive.org/web/20100430190803/http://mpeg.chiariglione.org/technologies/mpeg-1/mp01-aud/index.htm |url-status=dead |archive-date=2010-04-30 }}</ref><ref name=audio_tutorial/><ref>{{Citation |first1=B. |last1=Grill |first2=S. |last2=Quackenbush |title=MPEG-1 Audio |date=October 2005 |url=http://www.chiariglione.org/mpeg/technologies/mp01-aud/index.htm |access-date=2016-11-11 |archive-url=https://web.archive.org/web/20080427195833/http://www.chiariglione.org/mpeg/technologies/mp01-aud/index.htm |archive-date=2008-04-27}}</ref><ref name=joint_stereo_spatial/>
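
The principle can be illustrated with a short Python sketch. It assumes the input blocks already contain only the high-frequency content to be merged, derives one gain per channel from RMS energy, and uses hypothetical function names; the actual bitstream syntax and scale factor handling of the standard are not modelled.

```python
import math


def _rms(block):
    """Root-mean-square level of a non-empty block of samples."""
    return math.sqrt(sum(v * v for v in block) / len(block))


def intensity_encode(left, right):
    """Down-mix two channels to mono and keep one relative gain per channel."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    mono_rms = _rms(mono) or 1.0  # guard against a silent down-mix
    return mono, _rms(left) / mono_rms, _rms(right) / mono_rms


def intensity_decode(mono, gain_left, gain_right):
    """Play the single channel through both outputs, scaled by the stored gains."""
    return [m * gain_left for m in mono], [m * gain_right for m in mono]


left = [1.0, -1.0, 0.5, -0.5]
right = [0.5, -0.5, 0.25, -0.25]  # same waveform at half the level
mono, gain_l, gain_r = intensity_encode(left, right)
print(intensity_decode(mono, gain_l, gain_r))
```

When the two channels differ only in level, as in this example, the decoded output matches the input closely. Differences in waveform or phase between the channels are lost, which is why the technique is lossy.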

====Quality====
Subjective audio testing by experts, in the most critical conditions ever implemented, has shown MP2 to offer [[Transparency (data compression)|transparent]] audio compression at 256&nbsp;kbit/s for 16-bit 44.1&nbsp;kHz [[Red Book (audio CD standard)|CD audio]] using the earliest reference implementation (more recent encoders should presumably perform even better).<ref name=mpeg_faqs1>{{Citation |first1=Mark |last1=Adler |first2=Harald |last2=Popp |first3=Morten |last3=Hjerde |title=MPEG-FAQ: multimedia compression [1/9] |date=November 9, 1996 |publisher=faqs.org |url=http://www.faqs.org/faqs/mpeg-faq/part1/ |access-date=2016-11-11 |url-status=live |archive-url=https://web.archive.org/web/20170104010328/http://www.faqs.org/faqs/mpeg-faq/part1/ |archive-date=January 4, 2017 }}</ref><ref name=telos_audio/><ref name=audio_tutorial/><ref>C.Grewin, and T.Ryden, ''Subjective Assessments on Low Bit-rate Audio Codecs'', Proceedings of the 10th International AES Conference, pp 91 - 102, London 1991</ref> That (approximately) 1:6 compression ratio for CD audio is particularly impressive because it is quite close to the estimated upper limit of perceptual [[Entropy (information theory)|entropy]], at just over 1:8.<ref>J. Johnston, ''Estimation of Perceptual Entropy Using Noise Masking Criteria,'' in Proc. ICASSP-88, pp. 2524-2527, May 1988.</ref><ref>J. Johnston, ''Transform Coding of Audio Signals Using Perceptual Noise Criteria,'' IEEE Journal on Select Areas in Communications, vol. 6, no. 2, pp. 314-323, Feb. 1988.</ref> Achieving much higher compression is simply not possible without discarding some perceptible information.
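
The approximate 1:6 figure can be checked from the parameters of CD audio given above (16-bit samples, 44.1&nbsp;kHz, two channels). A small Python sketch of the arithmetic, with illustrative names:

```python
def pcm_bitrate_kbps(sampling_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return sampling_rate_hz * bits_per_sample * channels / 1000


cd_kbps = pcm_bitrate_kbps(44100, 16, 2)  # stereo CD audio, about 1411 kbit/s
ratio = cd_kbps / 256                     # MP2 at 256 kbit/s
print(f"CD audio: {cd_kbps:.1f} kbit/s, compression ratio about 1:{ratio:.1f}")
```

The exact ratio at 256&nbsp;kbit/s is roughly 1:5.5, which the text rounds to 1:6.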

MP2 remains a favoured lossy audio coding standard due to its particularly high audio coding performance on important audio material such as castanets, symphonic orchestra, male and female voices, and particularly complex and high-energy transients (impulses) like percussive sounds: triangle, glockenspiel and audience applause.<ref name=mpeg_faqs2/> More recent testing has shown that [[MPEG Multichannel]] (based on MP2), despite being compromised by an inferior matrixed mode (for the sake of backwards compatibility),<ref name=mpeg_faqs1/><ref name=audio_tutorial/> rates just slightly lower than much more recent audio codecs, such as [[Dolby Digital]] (AC-3) and [[Advanced Audio Coding]] (AAC) (mostly within the margin of error, and substantially superior in some cases, such as audience applause).<ref>Wustenhagen et al., ''Subjective Listening Test of Multi-channel Audio Codecs'', AES 105th Convention Paper 4813, San Francisco 1998</ref><ref name=ebu_surround_test_2007>{{Citation |last=B/MAE Project Group |title=EBU evaluations of multichannel audio codecs |date=September 2007 |publisher=[[European Broadcasting Union]] |url=http://www.ebu.ch/CMSimages/en/tec_doc_t3324-2007_tcm6-53801.pdf |archive-url=https://web.archive.org/web/20081030043259/http://www.ebu.ch/CMSimages/en/tec_doc_t3324-2007_tcm6-53801.pdf |url-status=dead |archive-date=2008-10-30 |access-date=2008-04-09 }}</ref> This is one reason that MP2 audio continues to be used extensively.
The MPEG-2 AAC Stereo verification tests reached a vastly different conclusion, however, showing AAC to provide superior performance to MP2 at half the bitrate.<ref name=stereo_aac_tests>{{Citation |first1=David |last1=Meares |first2=Kaoru |last2=Watanabe |first3=Eric |last3=Scheirer |title=Report on the MPEG-2 AAC Stereo Verification Tests |pages=18 |date=February 1998 |publisher=[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] |url=http://sound.media.mit.edu/mpeg4/audio/public/w2006.pdf |access-date=2016-11-11 |url-status=dead |archive-url=https://web.archive.org/web/20080414072345/http://sound.media.mit.edu/mpeg4/audio/public/w2006.pdf |archive-date=April 14, 2008 }}</ref> The reason for this disparity with both earlier and later tests is not clear, but strangely, a sample of applause is notably absent from the latter test.

Layer II audio files typically use the extension ".mp2" or sometimes ".m2a".

===Layer III===
{{Main|MPEG-1 Audio Layer III}}
MPEG-1 Audio Layer III (the first version of [[MP3]]) is a [[lossy]] audio format designed to provide acceptable quality at about 64&nbsp;kbit/s for monaural audio over single-channel ([[basic rate interface|BRI]]) [[ISDN]] links, and 128&nbsp;kbit/s for stereo sound.

====History/ASPEC====
[[File:2016-07 ASPEC 91 Bonn.jpg|thumb|ASPEC 91 in the [[Deutsches Museum Bonn]], with encoder (below) and decoder]]
MPEG-1 Audio Layer III was derived from the ''Adaptive Spectral Perceptual Entropy Coding'' (ASPEC) codec developed by Fraunhofer as part of the [[EUREKA 147]] pan-European inter-governmental research and development initiative for the development of digital audio broadcasting. ASPEC was adapted to fit in with the Layer II model (frame size, filter bank, FFT, etc.), to become Layer III.<ref name=santa_clara90/>

ASPEC was itself based on ''Multiple adaptive Spectral audio Coding'' (MSC) by [[E. F. Schroeder]],<!--at ???--> ''Optimum Coding in the Frequency domain'' (OCF) the [[doctoral thesis]] by [[Karlheinz Brandenburg]] at the [[University of Erlangen-Nuremberg]], ''Perceptual Transform Coding'' (PXFM) by [[J. D. Johnston]] at [[AT&T Corporation|AT&T]] [[Bell Labs]], and ''Transform coding of audio signals'' by [[Y. Mahieux]] and [[J. Petit]] at [[Institut für Rundfunktechnik]] (IRT/CNET).<ref name=perceptual_coding>{{Citation |first1=Ted |last1=Painter |first2=Andreas |last2=Spanias |title=Perceptual Coding of Digital Audio (Proceedings of the IEEE, VOL. 88, NO. 4) |date=April 2000 |publisher=[[Proceedings of the IEEE]] |url=http://www.ee.columbia.edu/~marios/courses/e6820y02/project/papers/Perceptual%20coding%20of%20digital%20audio%20.pdf |access-date=2016-11-11 |url-status=dead |archive-url=https://web.archive.org/web/20060916012236/http://www.ee.columbia.edu/~marios/courses/e6820y02/project/papers/Perceptual%20coding%20of%20digital%20audio%20.pdf |archive-date=September 16, 2006}}</ref>

====Technical details====
MP3 is a frequency-domain audio [[Transform coding|transform encoder]]. Even though it utilizes some of the lower layer functions, MP3 is quite different from MP2.

MP3 works on 1152 samples like MP2, but needs to take multiple frames for analysis before frequency-domain (MDCT) processing and quantization can be effective. It outputs a variable number of samples, using a bit buffer to enable this variable bitrate (VBR) encoding while maintaining 1152-sample output frames. This causes a significantly longer delay before output, which has caused MP3 to be considered unsuitable for studio applications where editing or other processing needs to take place.<ref name=audio_tutorial/>

MP3 does not benefit from the 32 sub-band polyphased filter bank, instead just using an 18-point MDCT transformation on each output to split the data into 576 frequency components, and processing it in the frequency domain.<ref name=telos_audio/> This extra [[wikt:granularity|granularity]] allows MP3 to have a much finer psychoacoustic model, and more carefully apply appropriate quantization to each band, providing much better low-bitrate performance.
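
The figure of 576 frequency components follows from applying the 18-point transform to each of the 32 sub-band outputs. The Python sketch below (names are illustrative) works through that arithmetic and the resulting nominal width of each component, assuming the components divide the bandwidth equally.

```python
NUM_SUBBANDS = 32
MDCT_POINTS_PER_SUBBAND = 18
NUM_COMPONENTS = NUM_SUBBANDS * MDCT_POINTS_PER_SUBBAND  # 32 * 18 = 576


def component_width_hz(sampling_rate_hz: int) -> float:
    """Nominal width of one of the 576 components, assuming equal spacing up to half the sampling rate."""
    return (sampling_rate_hz / 2) / NUM_COMPONENTS


print(NUM_COMPONENTS)
print(component_width_hz(44100))
```

At 44100&nbsp;Hz each component is about 38&nbsp;Hz wide, compared with roughly 689&nbsp;Hz for one of the 32 sub-bands on their own.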

Frequency-domain processing imposes some limitations as well, causing temporal resolution that is 12 or 36 times worse than that of Layer II. This causes quantization artifacts, as transient sounds like percussive events and other high-frequency events are spread over a larger window, resulting in audible smearing and [[pre-echo]].<ref name=audio_tutorial>{{Citation|first=Davis |last=Pan |title=A Tutorial on MPEG/Audio Compression |pages=8 |date=Summer 1995 |publisher=IEEE MultiMedia Journal |url=https://www.cs.columbia.edu/~coms6181/slides/6R/mpegaud.pdf |archive-url=https://web.archive.org/web/20040919073530/https://www.cs.columbia.edu/~coms6181/slides/6R/mpegaud.pdf |url-status=dead |archive-date=2004-09-19 |access-date=2008-04-09 }}</ref> MP3 uses pre-echo detection routines and VBR encoding, which allows it to temporarily increase the bitrate during difficult passages, in an attempt to reduce this effect. It is also able to switch between the normal 36-sample quantization window and three short 12-sample windows, to reduce the temporal (time) length of quantization artifacts.<ref name=audio_tutorial/> And yet, in choosing a fairly small window size to make MP3's temporal response adequate to avoid the most serious artifacts, MP3 becomes much less efficient in frequency-domain compression of stationary, tonal components.

Being forced to use a ''hybrid'' time-domain (filter bank)/frequency-domain (MDCT) model to fit in with Layer II wastes processing time and compromises quality by introducing aliasing artifacts. MP3 has an aliasing cancellation stage specifically to mask this problem, but this stage instead produces frequency-domain energy which must be encoded in the audio. This is pushed to the top of the frequency range, where most people have limited hearing, in the hope that the distortion it causes will be less audible.

Layer II's 1024-point FFT does not entirely cover all samples, and would omit several entire MP3 sub-bands, where quantization factors must be determined. MP3 instead uses two passes of FFT analysis for spectral estimation, to calculate the global and individual masking thresholds. This allows it to cover all 1152 samples. Of the two, it uses the global masking threshold level from the more critical pass, with the most difficult audio.

In addition to Layer II's intensity encoded joint stereo, MP3 can use middle/side (mid/side, m/s, MS, matrixed) joint stereo. With mid/side stereo, certain frequency ranges of both channels are merged into a single (middle, mid, L+R) mono channel, while the sound difference between the left and right channels is stored as a separate (side, L-R) channel. Unlike intensity stereo, this process does not discard any audio information. When combined with quantization, however, it can exaggerate artifacts.

If the difference between the left and right channels is small, the side channel will be small, which will offer as much as a 50% bitrate savings, and associated quality improvement. If the difference between left and right is large, standard (discrete, left/right) stereo encoding may be preferred, as mid/side joint stereo will not provide any benefits. An MP3 encoder can switch between m/s stereo and full stereo on a frame-by-frame basis.<ref name=telos_audio/><ref name=joint_stereo_spatial>{{Citation|first=Jurgen |last=Herre |title=From Joint Stereo to Spatial Audio Coding |pages=2 |date=October 5, 2004 |publisher=[[International Conference on Digital Audio Effects]] |url=http://dafx04.na.infn.it/WebProc/Proc/P_157.pdf |archive-url=https://web.archive.org/web/20060405112352/http://dafx04.na.infn.it/WebProc/Proc/P_157.pdf |url-status=dead |archive-date=April 5, 2006 |access-date=2008-04-17 }}</ref><ref name=lame_ms>{{Citation |first=Roberto |last=Amorim |title=GPSYCHO - Mid/Side Stereo |date=September 19, 2006 |publisher=[[LAME]] |url=http://lame.sourceforge.net/ms_stereo.php |access-date=2016-11-11 |url-status=live |archive-url=https://web.archive.org/web/20161216140230/http://lame.sourceforge.net/ms_stereo.php |archive-date=December 16, 2016 }}</ref>
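
The mid/side transform itself is a simple, reversible matrixing of the two channels. The Python sketch below illustrates it with hypothetical function names; the factor of one half is an assumption chosen to keep the example easy to follow, and the normalisation used by an actual encoder may differ.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid (sum) and side (difference) channels."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert the transform: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, 0.25, -0.75]
right = [0.5, 0.125, -0.5]
mid, side = ms_encode(left, right)
print(side)                  # small values where the channels are similar
print(ms_decode(mid, side))  # recovers the original channels
```

Before quantization the transform discards nothing, so decoding recovers the original channels. The saving comes from the side channel needing few bits when left and right are similar.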

Unlike Layers I and II, MP3 uses variable-length [[Huffman coding]] (applied after the perceptual coding stage) to further reduce the bitrate, without any further quality loss.<ref name=mpeg_audio_faq/><ref name=audio_tutorial/>
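
The following Python sketch illustrates the general principle of Huffman coding: frequent values receive short codes and rare values receive longer ones, with no information lost. It builds a code from the data it is given, which is an assumption made for illustration; it does not reproduce the code tables an MP3 encoder actually uses, and the function name is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code from symbol frequencies; returns {symbol: bit string}."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Quantized values are often dominated by zeros and small magnitudes.
quantized = [0] * 8 + [1] * 4 + [-1] * 2 + [2] * 2
codes = huffman_code(quantized)
total_bits = sum(len(codes[v]) for v in quantized)
print(codes)
print(total_bits, "bits, versus", 2 * len(quantized), "bits at a fixed 2 bits per value")
```

For this example input the variable-length code needs 28 bits where a fixed two-bit code would need 32.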

====Quality====
MP3's more fine-grained and selective quantization does prove notably superior to MP2 at lower bitrates. It is able to provide nearly equivalent audio quality to Layer II at an approximately 15% lower bitrate.<ref name=ebu_surround_test_2007/><ref name=stereo_aac_tests/><!--First ref proves the point, second scares off MP3 fans that feel like arguing--> 128&nbsp;kbit/s is considered the "sweet spot" for MP3, meaning it provides generally acceptable quality stereo sound on most music, and there are [[diminishing returns|diminishing]] quality improvements from increasing the bitrate further. MP3 is also regarded as exhibiting artifacts that are less annoying than those of Layer II when both are used at bitrates that are too low to provide faithful reproduction.

Layer III audio files use the extension ".mp3".
<!--aliasing compensation: still need more details-->

===MPEG-2 audio extensions===
The [[MPEG-2]] standard includes several extensions to MPEG-1 Audio.<ref name=audio_tutorial/> These are known as MPEG-2 BC – backwards compatible with MPEG-1 Audio.<ref name="mpeg-audio-faq-bc">{{cite web |url=http://mpeg.chiariglione.org/faq/mp1-aud/mp1-aud.htm |title=MPEG Audio FAQ Version 9 – MPEG-1 and MPEG-2 BC |author=ISO |date=October 1998 |publisher=ISO |access-date=2016-11-11 |url-status=dead |archive-url=https://web.archive.org/web/20100218081343/http://mpeg.chiariglione.org/faq/mp1-aud/mp1-aud.htm |archive-date=2010-02-18 }}</ref><ref name="mpeg-audio">{{cite web |url=http://mpeg.chiariglione.org/faq/audio.htm |title=MPEG Audio FAQ Version 9 - MPEG Audio |author=D. Thom, H. Purnhagen, and the MPEG Audio Subgroup |date=October 1998 |access-date=2016-11-11 |url-status=live |archive-url=https://web.archive.org/web/20110807233226/http://mpeg.chiariglione.org/faq/audio.htm |archive-date=2011-08-07 }}</ref><ref name="mpeg-bc">{{cite web|url=http://www.mpeg.org/MPEG/audio/aac.html |archive-url=https://web.archive.org/web/20070831110756/http://www.mpeg.org/MPEG/audio/aac.html |url-status=dead |archive-date=2007-08-31 |title=AAC |author=MPEG.ORG |access-date=2009-10-28 }}</ref><ref name="iso13818-7-2006-pdf">{{citation |url=http://webstore.iec.ch/preview/info_isoiec13818-7%7Bed4.0%7Den.pdf |title=ISO/IEC 13818-7, Fourth edition, Part 7 – Advanced Audio Coding (AAC) |author=ISO |date=2006-01-15 |access-date=2016-11-11 |url-status=live |archive-url=https://web.archive.org/web/20090306055335/http://webstore.iec.ch/preview/info_isoiec13818-7%7Bed4.0%7Den.pdf |archive-date=2009-03-06 }}</ref> MPEG-2 Audio is defined in ISO/IEC 13818-3.

*[[MPEG Multichannel]] – Backward compatible 5.1-channel [[surround sound]].<ref name=sydney93/>
*[[Sampling rate]]s: 16000, 22050, and 24000&nbsp;Hz
*[[Bitrate]]s: 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144 and 160&nbsp;kbit/s

These sampling rates are exactly half of those originally defined for MPEG-1 Audio. They were introduced to maintain higher quality sound when encoding audio at lower bitrates.<ref name=sydney93/> The even lower bitrates were introduced because tests showed that MPEG-1 Audio could provide higher quality than any existing ({{circa|1994}}) very low bitrate (i.e. [[Speech coding|speech]]) audio codecs.<ref name=singapore94>{{Citation|first=Leonardo |last=Chiariglione |title=Press Release |date=November 11, 1994 |publisher=[[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] |url=http://mpeg.chiariglione.org/meetings/singapore94/singapore_press.htm |archive-url=https://web.archive.org/web/20100808100029/http://mpeg.chiariglione.org/meetings/singapore94/singapore_press.htm |url-status=dead |archive-date=August 8, 2010 |access-date=2008-04-09 }}</ref>
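
The relationship between the two sets of sampling rates can be verified directly. The short Python sketch below uses illustrative constant names and also shows one consequence of halving the rate: a 1152-sample frame covers twice as much time.

```python
MPEG1_SAMPLING_RATES_HZ = (32000, 44100, 48000)

# The MPEG-2 extension rates listed above are exactly half of the MPEG-1 rates.
MPEG2_HALF_RATES_HZ = tuple(rate // 2 for rate in MPEG1_SAMPLING_RATES_HZ)
print(MPEG2_HALF_RATES_HZ)

FRAME_SAMPLES = 1152
for full, half in zip(MPEG1_SAMPLING_RATES_HZ, MPEG2_HALF_RATES_HZ):
    full_ms = 1000.0 * FRAME_SAMPLES / full
    half_ms = 1000.0 * FRAME_SAMPLES / half
    print(f"{full} Hz: {full_ms:.1f} ms per frame; {half} Hz: {half_ms:.1f} ms per frame")
```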