=== Encoding and decoding ===
In short, MP3 compression works by reducing the accuracy of certain components of sound that are considered (by psychoacoustic analysis) to be beyond the [[Hearing range#Humans|hearing capabilities]] of most humans. This method is commonly referred to as perceptual coding or [[psychoacoustic]] modeling.<ref name="Jayant1993" /> The remaining audio information is then recorded in a space-efficient manner using [[MDCT]] and [[FFT]] algorithms.
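The idea of discarding components that fall outside human hearing can be illustrated with a threshold-in-quiet curve. The following is a minimal sketch, not the psychoacoustic model of any particular encoder: it assumes a commonly quoted closed-form approximation of the absolute hearing threshold (usually attributed to Terhardt), assumes that levels are given in calibrated dB SPL, and ignores masking between components, which real models also take into account. The function names are hypothetical.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate absolute hearing threshold in dB SPL (assumed closed-form fit)."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

def audible_components(components):
    """Keep only (freq_hz, level_db_spl) pairs that lie above the threshold in quiet."""
    return [(freq, level) for freq, level in components
            if level > threshold_in_quiet_db(freq)]

# Hypothetical spectral components: (frequency in Hz, level in dB SPL).
components = [(1000, 40), (3300, 0), (16000, 30), (18000, 50)]
print(audible_components(components))
```

In this toy input the two high-frequency components lie below the approximated threshold and are dropped, while the 1 kHz and 3.3 kHz components are kept.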

The MP3 encoding algorithm is generally split into four parts. Part 1 divides the audio signal into smaller pieces, called frames, and an MDCT filter is then applied to the output. Part 2 passes the samples through a 1024-point [[fast Fourier transform]] (FFT), after which the [[psychoacoustic]] model is applied and another MDCT filter is applied to the output. Part 3 quantizes and encodes each sample, a process known as noise allocation, which adjusts itself to meet the bit rate and [[sound masking]] requirements. Part 4 formats the [[bitstream]] into units called audio frames, each made up of four parts: the [[Header (computing)|header]], [[Error checking|error check]], [[audio data]], and [[#Ancillary data|ancillary data]].<ref name="Guckert"/>
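The header that begins each audio frame can be decoded with a few bit operations. The sketch below is illustrative only and handles just MPEG-1 Layer III. The field layout and lookup tables are supplied here as assumptions from the commonly documented 32-bit header format, not from the cited source, and the function name <code>parse_header</code> is hypothetical.

```python
# Lookup tables for MPEG-1 Layer III only (assumed from the commonly
# documented header format); None marks free-format or reserved indices.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]

def parse_header(header_bytes):
    """Decode the 4-byte header of an MPEG-1 Layer III frame."""
    if len(header_bytes) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(header_bytes, "big")
    if word >> 21 != 0x7FF:                    # 11 sync bits, all set
        raise ValueError("missing frame sync")
    version = (word >> 19) & 0x3               # 0b11 = MPEG-1
    layer = (word >> 17) & 0x3                 # 0b01 = Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("this sketch handles MPEG-1 Layer III only")
    bitrate_kbps = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0x3]
    if bitrate_kbps is None or sample_rate is None:
        raise ValueError("free-format or reserved value not handled")
    padding = (word >> 9) & 0x1
    return {
        "has_crc": ((word >> 16) & 0x1) == 0,  # protection bit 0 = error check present
        "bitrate_kbps": bitrate_kbps,
        "sample_rate_hz": sample_rate,
        # 1152 samples per frame / 8 bits per byte = 144
        "frame_bytes": 144 * bitrate_kbps * 1000 // sample_rate + padding,
    }

print(parse_header(bytes.fromhex("fffb9000")))
```

For the example header, which describes a 128 kbit/s stream sampled at 44.1 kHz with no error check, the computed frame length is 417 bytes, or 418 bytes when the padding bit is set.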

The [[MPEG-1]] standard does not include a precise specification for an MP3 encoder but does provide examples of psychoacoustic models, rate loops, and the like in the non-normative part of the original standard.<ref name="mpeg1" /> MPEG-2 doubles the number of sampling rates that are supported, and MPEG-2.5 adds three more. When this was written, the suggested implementations were quite dated. Implementers of the standard were expected to devise their own algorithms for removing parts of the information from the audio input. As a result, many different MP3 encoders became available, each producing files of differing quality. Comparisons were widely available, so it was easy for a prospective user of an encoder to research the best choice. Some encoders that were proficient at encoding at higher bit rates (such as [[LAME]]) were not necessarily as good at lower bit rates. Over time, LAME evolved on the SourceForge website until it became the de facto constant bit rate (CBR) MP3 encoder. Later, an average bit rate (ABR) mode was added. Work then progressed on true [[variable bit rate]] encoding using a quality goal between 0 and 10. Eventually, fractional settings (such as -V 9.600) could generate excellent-quality, low-bit-rate voice encoding at only {{nowrap|41 kbit/s}} using the MPEG-2.5 extensions.

MP3 uses an overlapping MDCT structure. Each MPEG-1 MP3 frame contains 1152 samples, divided into two granules of 576 samples. These samples, initially in the time domain, are transformed in one block to 576 [[Fourier Transform|frequency-domain samples]] by MDCT.<ref>{{cite web |last=Taylor |first=Mark |date=June 2000 |title=LAME Technical FAQ |url=https://lame.sourceforge.io/tech-FAQ.txt |access-date=9 December 2023 |archive-date=8 December 2023 |archive-url=https://web.archive.org/web/20231208232048/https://lame.sourceforge.io/tech-FAQ.txt |url-status=live }}</ref> MP3 also allows the use of shorter blocks in a granule, down to a size of 192 samples; this feature is used when a [[Transient (acoustics)|transient]] is detected. Doing so limits the temporal spread of quantization noise accompanying the transient (see [[psychoacoustics]]). Frequency resolution is limited by the small window size of the long block, which decreases coding efficiency.<ref name="Limitations"/> Conversely, time resolution can be too low for highly transient signals, which may cause smearing of percussive sounds.<ref name="Limitations" />
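The overlapping structure can be illustrated with a direct-form MDCT, in which each block of 2''N'' time samples yields ''N'' coefficients and consecutive blocks overlap by half. The following is a minimal sketch under stated assumptions: it uses a plain sine window and a single transform per block, and it does not reproduce the filter bank that precedes the MDCT in an actual MP3 encoder. The function names are hypothetical, and a small block size is used in the demonstration only to keep the pure-Python arithmetic fast.

```python
import math

def sine_window(n_coeffs):
    """Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1."""
    size = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]

def _basis(n, k, n_coeffs):
    return math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))

def mdct(block):
    """Windowed MDCT: 2N time samples in, N frequency coefficients out."""
    n_coeffs = len(block) // 2
    window = sine_window(n_coeffs)
    return [sum(window[n] * block[n] * _basis(n, k, n_coeffs)
                for n in range(2 * n_coeffs))
            for k in range(n_coeffs)]

def imdct(coeffs):
    """Windowed inverse MDCT: N coefficients in, 2N aliased time samples out."""
    n_coeffs = len(coeffs)
    window = sine_window(n_coeffs)
    return [window[n] * 2.0 / n_coeffs
            * sum(coeffs[k] * _basis(n, k, n_coeffs) for k in range(n_coeffs))
            for n in range(2 * n_coeffs)]

def round_trip(signal, n_coeffs):
    """Transform half-overlapping blocks, invert them, and overlap-add the results."""
    if n_coeffs % 2 or len(signal) % n_coeffs:
        raise ValueError("n_coeffs must be even and divide the signal length")
    # Zero-pad by N on each side so that every sample is covered by two blocks.
    padded = [0.0] * n_coeffs + list(signal) + [0.0] * n_coeffs
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        block = padded[start:start + 2 * n_coeffs]
        for i, value in enumerate(imdct(mdct(block))):
            out[start + i] += value   # aliasing cancels between neighbouring blocks
    return out[n_coeffs:-n_coeffs]

signal = [math.sin(0.3 * i) + 0.1 * i for i in range(48)]
restored = round_trip(signal, 12)
print(max(abs(a - b) for a, b in zip(signal, restored)))
```

Each inverse transform on its own contains time-domain aliasing; the original samples are recovered only after neighbouring blocks are added together, which is why the blocks must overlap. Calling <code>mdct</code> on 1152 samples returns 576 coefficients, matching the granule size described above.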

Due to the tree structure of the filter bank, pre-echo problems are made worse, as the combined impulse response of the two filter banks does not, and cannot, provide an optimum solution in time/frequency resolution.<ref name="Limitations"/> Additionally, the combining of the two filter banks' outputs creates aliasing problems that must be handled partially by the "aliasing compensation" stage; however, that creates excess energy to be coded in the frequency domain, thereby decreasing coding efficiency.<ref>{{Cite book|last=Liberman|first=Serbio|title=DSP - The Technology Behind Multimedia|language=English}}</ref>

Decoding, on the other hand, is carefully defined in the standard. Most [[Codec|decoders]] are "[[Elementary stream|bitstream]] compliant", which means that the decompressed output that they produce from a given MP3 file will be the same, within a specified degree of [[rounding]] tolerance, as the output specified mathematically in the ISO/IEC standard document (ISO/IEC 11172-3). Therefore, decoders are usually compared on computational efficiency (i.e., how much [[computer memory|memory]] or [[CPU]] time they use in the decoding process). Over time, this concern has become less of an issue as [[CPU clock rate]]s transitioned from MHz to GHz. The overall encoder/decoder delay is not defined, which means there is no official provision for [[gapless playback]]. However, some encoders, such as LAME, can attach additional metadata that allows players that support it to deliver seamless playback.
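A player that supports such metadata can achieve gapless playback by discarding the extra samples at both ends of the decoded stream. The sketch below assumes that the metadata supplies two counts, the number of leading delay samples and the number of trailing padding samples. The function name and the example values are hypothetical, and whether a given field already includes the decoder's own delay has to be checked against the encoder's documentation.

```python
def trim_for_gapless(decoded, leading_delay, trailing_padding):
    """Drop the delay samples at the start and the padding samples at the end."""
    if leading_delay < 0 or trailing_padding < 0:
        raise ValueError("sample counts must be non-negative")
    if leading_delay + trailing_padding > len(decoded):
        raise ValueError("metadata removes more samples than were decoded")
    return decoded[leading_delay:len(decoded) - trailing_padding]

# Hypothetical case: a 3000-sample track decoded as three 1152-sample frames.
decoded = list(range(3 * 1152))                 # stand-in for decoded PCM samples
track = trim_for_gapless(decoded, leading_delay=300, trailing_padding=156)
print(len(track))
```

Without the two counts, a player cannot tell the padded samples from real audio, which is why the whole-frame output of consecutive files produces an audible gap.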