
=== Audio ===
{{see also|Audio coding format|Audio codec}}
Audio data compression, not to be confused with [[dynamic range compression]], has the potential to reduce the transmission [[Bandwidth (computing)|bandwidth]] and storage requirements of audio data. [[List of codecs#Audio compression formats|Audio compression algorithms]] are implemented in [[software]] as audio [[codec]]s. In both lossy and lossless compression, [[Redundancy (information theory)|information redundancy]] is reduced, using methods such as [[Coding theory|coding]], [[Quantization (signal processing)|quantization]], the [[discrete cosine transform]] (DCT) and [[linear prediction]] to reduce the amount of information used to represent the uncompressed data.

Lossy audio compression algorithms provide higher compression and are used in numerous audio applications, including formats such as [[Vorbis]] and [[MP3]]. Almost all of these algorithms rely on [[psychoacoustics]] to eliminate or reduce the fidelity of less audible sounds, thereby reducing the space required to store or transmit them.<ref name="mahdi53"/><ref>{{cite journal |last1=Cunningham |first1=Stuart |last2=McGregor |first2=Iain |title=Subjective Evaluation of Music Compressed with the ACER Codec Compared to AAC, MP3, and Uncompressed PCM |journal=International Journal of Digital Multimedia Broadcasting |volume=2019 |pages=1–16 |date=2019 |language=en|doi=10.1155/2019/8265301 |doi-access=free }}</ref>

The acceptable trade-off between loss of audio quality and transmission or storage size depends upon the application. For example, one 640 MB [[compact disc]] (CD) holds approximately one hour of uncompressed [[high fidelity]] music, less than 2 hours of music compressed losslessly, or 7 hours of music compressed in the [[MP3]] format at a medium [[bit rate]]. A digital sound recorder can typically store around 200 hours of clearly intelligible speech in 640 MB.<ref name="Olympus WS-120"/>
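These figures follow from simple bit-rate arithmetic. The sketch below assumes decimal megabytes (640 × 10<sup>6</sup> bytes) and CD-quality PCM (44.1&nbsp;kHz, 16-bit, two channels). It also takes a lossless output size of about 55% and a "medium" MP3 bit rate of 192&nbsp;kbit/s. Those last two values are illustrative assumptions, and the function names are hypothetical.

```python
# Storage-time arithmetic for a fixed-capacity medium (illustrative assumptions noted inline).
CAPACITY_BYTES = 640 * 10**6      # assumption: decimal megabytes
CD_PCM_BPS = 44_100 * 16 * 2      # 44.1 kHz x 16 bits x 2 channels = 1,411,200 bit/s


def hours_of_audio(capacity_bytes, bitrate_bps):
    """How many hours of audio fit in capacity_bytes at a constant bit rate."""
    return capacity_bytes * 8 / bitrate_bps / 3600


def implied_bitrate(capacity_bytes, hours):
    """Average bit rate needed to fit the given number of hours into capacity_bytes."""
    return capacity_bytes * 8 / (hours * 3600)


if __name__ == "__main__":
    print(f"uncompressed PCM:  {hours_of_audio(CAPACITY_BYTES, CD_PCM_BPS):.2f} h")
    print(f"lossless (~55%):   {hours_of_audio(CAPACITY_BYTES, 0.55 * CD_PCM_BPS):.2f} h")
    print(f"MP3 at 192 kbit/s: {hours_of_audio(CAPACITY_BYTES, 192_000):.2f} h")
    print(f"200 h of speech:   {implied_bitrate(CAPACITY_BYTES, 200):.0f} bit/s")
```

Under these assumptions, the results come to roughly 1.0, 1.8 and 7.4 hours. The 200-hour speech figure implies an average of about 7.1&nbsp;kbit/s. By the same arithmetic, one megabyte per minute corresponds to roughly 133&nbsp;kbit/s.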

Lossless audio compression produces a representation of digital data that can be decoded to an exact digital duplicate of the original. Compressed files are typically around 50–60% of the original size,<ref name="FLAC comparison"/> which is similar to the results of generic lossless data compression. Lossless codecs use [[curve fitting]] or linear prediction as a basis for estimating the signal. The parameters describing the estimate and the difference between the estimate and the actual signal (the residual) are coded separately.<ref name="FLAC overview"/>
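The predict-then-code structure can be illustrated with a minimal sketch. It assumes integer PCM samples and uses a fixed second-order polynomial predictor, which guesses each sample as 2''s''[''n''−1] − ''s''[''n''−2]. This is similar in spirit to the fixed predictors offered by [[FLAC]]. The residual is then costed with [[Golomb coding|Rice coding]]. The function names, the zigzag mapping of signed residuals and the brute-force parameter search are illustrative choices, not any specific codec's bitstream.

```python
import math


def predict_residuals(samples):
    """Order-2 fixed predictor: guess s[n] as 2*s[n-1] - s[n-2] and keep the error."""
    residuals = list(samples[:2])  # the first two "warm-up" samples are stored verbatim
    for n in range(2, len(samples)):
        residuals.append(samples[n] - (2 * samples[n - 1] - samples[n - 2]))
    return residuals


def reconstruct(residuals):
    """Decoder: run the same predictor and add the residual back (bit-exact)."""
    samples = list(residuals[:2])
    for n in range(2, len(residuals)):
        samples.append(residuals[n] + 2 * samples[n - 1] - samples[n - 2])
    return samples


def zigzag(r):
    """Map signed residuals to unsigned ints: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * r if r >= 0 else -2 * r - 1


def rice_bits(values, k):
    """Bits to Rice-code values with parameter k: unary quotient, stop bit, k-bit remainder."""
    return sum((zigzag(v) >> k) + 1 + k for v in values)


def best_rice_parameter(values, max_k=15):
    """Pick the Rice parameter that minimizes the total coded size."""
    return min(range(max_k + 1), key=lambda k: rice_bits(values, k))


if __name__ == "__main__":
    # A 440 Hz tone at 44.1 kHz, rounded to integers in the 16-bit range.
    pcm = [round(8000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(1000)]
    res = predict_residuals(pcm)
    assert reconstruct(res) == pcm  # lossless round trip
    k = best_rice_parameter(res[2:])
    coded = 2 * 16 + rice_bits(res[2:], k)  # warm-up samples sent as raw 16-bit words
    print(f"k={k}: {coded} bits vs {16 * len(pcm)} bits of raw 16-bit PCM")
```

In this sketch, the predictor order plays the role of the "parameters describing the estimate". A real encoder would choose the order, or compute full LPC coefficients, per block and transmit that choice as side information alongside the Rice-coded residual.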

A number of lossless audio compression formats exist. See [[List of codecs#Lossless compression|list of lossless codecs]] for a listing. Some formats are associated with a distinct system, such as [[Direct Stream Transfer]], used in [[Super Audio CD]], and [[Meridian Lossless Packing]], used in [[DVD-Audio]], [[Dolby TrueHD]], [[Blu-ray]] and [[HD DVD]].

Some [[audio file format]]s feature a combination of a lossy format and a lossless correction; this allows stripping the correction to easily obtain a lossy file. Such formats include [[MPEG-4 SLS]] (Scalable to Lossless), [[WavPack]], and [[OptimFROG DualStream]].
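The lossy-plus-correction idea can be sketched with a coarse uniform quantizer standing in for the lossy layer. This substitution is an assumption made for brevity: real hybrid and scalable formats use considerably more elaborate lossy layers. The names <code>split_hybrid</code> and <code>decode</code> are hypothetical.

```python
def split_hybrid(samples, step):
    """Split integer PCM samples into a coarse lossy layer and an exact correction layer."""
    # Lossy layer: quantizer indices, which need fewer bits than the original samples.
    lossy = [round(s / step) for s in samples]
    # Correction layer: what the coarse layer got wrong (at most step/2 in magnitude).
    correction = [s - q * step for s, q in zip(samples, lossy)]
    return lossy, correction


def decode(lossy, step, correction=None):
    """Decode the lossy layer alone, or add the correction back for a bit-exact result."""
    approx = [q * step for q in lossy]
    if correction is None:  # correction stripped: lossy playback only
        return approx
    return [a + c for a, c in zip(approx, correction)]


if __name__ == "__main__":
    pcm = [0, 100, -257, 32767, -32768, 5, 31, 33]
    lossy, corr = split_hybrid(pcm, step=64)
    print(decode(lossy, 64))        # approximate samples
    print(decode(lossy, 64, corr))  # identical to pcm
```

Discarding the correction list yields a smaller, approximate file. Keeping it restores the exact original, which is the property these formats exploit.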

When audio files are to be processed, either by further compression or for [[Audio editing|editing]], it is desirable to work from an unchanged original (uncompressed or losslessly compressed). Processing a lossily compressed file usually produces a final result inferior to creating the same compressed file from an uncompressed original. In addition to sound editing or mixing, lossless audio compression is often used for archival storage or for master copies.

==== Lossy audio compression ====
[[File:AudiodatenkompressionManowarThePowerOfThySword.jpg|thumb|Comparison of [[spectrogram]]s of audio in an uncompressed format and several lossy formats. The lossy spectrograms show [[bandlimiting]] of higher frequencies, a common technique associated with lossy audio compression.]]

Lossy audio compression is used in a wide range of applications. In addition to standalone audio-only applications of file playback in MP3 players or computers, digitally compressed audio streams are used in most video DVDs, digital television, streaming media on the [[Internet]], satellite and cable radio, and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater compression than lossless compression, by discarding less-critical data based on [[psychoacoustic]] optimizations.<ref name="Jaiswal"/>

Psychoacoustics recognizes that not all data in an audio stream can be perceived by the human [[auditory system]]. Most lossy compression works by first identifying perceptually irrelevant sounds, that is, sounds that are very hard to hear. Typical examples include high frequencies or sounds that occur at the same time as louder sounds. Those irrelevant sounds are coded with decreased accuracy or not at all.

Due to the nature of lossy algorithms, [[audio quality]] suffers a [[digital generation loss]] when a file is decompressed and recompressed. This makes lossy compression unsuitable for storing the intermediate results in professional audio engineering applications, such as sound editing and multitrack recording. However, lossy formats such as [[MP3]] are very popular with end-users as the file size is reduced to 5–20% of the original size and a megabyte can store about a minute's worth of music at adequate quality.

Several proprietary lossy compression algorithms have been developed that provide higher quality audio performance by using a combination of lossless and lossy algorithms with adaptive bit rates and lower compression ratios. Examples include [[aptX]], [[LDAC (codec)|LDAC]], [[LHDC (codec)|LHDC]], [[Master Quality Authenticated#Codec description|MQA]] and [[SCL6 (codec)|SCL6]].

===== Coding methods =====
To determine what information in an audio signal is perceptually irrelevant, most lossy compression algorithms use transforms such as the [[modified discrete cosine transform]] (MDCT) to convert [[time domain]] sampled waveforms into a transform domain, typically the [[frequency domain]]. Once transformed, component frequencies can be prioritized according to how audible they are. Audibility of spectral components is assessed using the [[absolute threshold of hearing]] and the principles of [[simultaneous masking]]—the phenomenon wherein a signal is masked by another signal separated by frequency—and, in some cases, [[temporal masking]]—where a signal is masked by another signal separated by time. [[Equal-loudness contour]]s may also be used to weigh the perceptual importance of components. Models of the human ear-brain combination incorporating such effects are often called [[psychoacoustic model]]s.<ref name="faxin47"/>
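The MDCT step can be sketched as follows. This is a minimal illustration assuming a sine window, 50% frame overlap and the direct O(''N''<sup>2</sup>) formulas rather than a fast FFT-based implementation. The function names are hypothetical. Between <code>analyze</code> and <code>synthesize</code>, a real encoder would quantize each coefficient according to its psychoacoustic model. The sketch omits that step and only shows that the transform itself is invertible through time-domain aliasing cancellation.

```python
import math


def sine_window(N):
    """Length-2N sine window; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(frame, N):
    """2N windowed time samples in, N frequency coefficients out (direct form)."""
    n0 = 0.5 + N / 2
    return [sum(frame[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs, N):
    """N coefficients in, 2N time-aliased samples out.

    The 2/N scale makes overlap-add of sine-windowed frames reconstruct with unity gain.
    """
    n0 = 0.5 + N / 2
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                          for k in range(N))
            for n in range(2 * N)]


def analyze(signal, N):
    """Cut the signal into 50%-overlapping windowed frames and transform each one."""
    w = sine_window(N)
    pad_tail = (-len(signal)) % N  # pad so the length is a multiple of N
    padded = [0.0] * N + list(signal) + [0.0] * (pad_tail + N)
    return [mdct([w[n] * padded[start + n] for n in range(2 * N)], N)
            for start in range(0, len(padded) - N, N)]


def synthesize(frames, N, length):
    """Inverse-transform, window again and overlap-add; the time-domain aliasing cancels."""
    w = sine_window(N)
    out = [0.0] * ((len(frames) + 1) * N)
    for i, coeffs in enumerate(frames):
        y = imdct(coeffs, N)
        for n in range(2 * N):
            out[i * N + n] += w[n] * y[n]
    return out[N:N + length]  # drop the zero padding added in analyze()


if __name__ == "__main__":
    sig = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(50)]
    frames = analyze(sig, 8)
    rec = synthesize(frames, 8, len(sig))
    print(max(abs(a - b) for a, b in zip(rec, sig)))  # tiny floating-point error only
```

Each frame yields ''N'' coefficients for every ''N'' new input samples, so the transform is critically sampled. The 50% overlap therefore does not inflate the amount of data to be coded.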

Other types of lossy compressors, such as the [[linear predictive coding]] (LPC) used with speech, are source-based coders. LPC uses a model of the human vocal tract to analyze speech sounds and infer the parameters used by the model to produce them moment to moment. These changing parameters are transmitted or stored and used to drive another model in the decoder which reproduces the sound.
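The analysis half of an LPC coder can be sketched by fitting an all-pole predictor to a block of samples. The sketch uses the autocorrelation method and the [[Levinson recursion|Levinson–Durbin recursion]]. The test signal is a hypothetical stand-in for a voiced speech segment: white noise passed through a two-pole resonator, loosely analogous to a single formant. A real speech coder would work frame by frame, typically with a higher order and a tapered window. It would transmit the coefficients together with pitch and gain information rather than the samples themselves. The function names are illustrative.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of the block x."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for a[1..p] with x[n] ~ sum_j a[j] * x[n - j].

    Returns (coefficients, final prediction-error power).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


if __name__ == "__main__":
    # Hypothetical "vocal tract": a resonant two-pole filter driven by white noise.
    rng = random.Random(0)
    x = [0.0, 0.0]
    for _ in range(4000):
        x.append(1.8 * x[-1] - 0.9 * x[-2] + rng.gauss(0.0, 1.0))
    r = autocorrelation(x, 2)
    coeffs, err = levinson_durbin(r, 2)
    print(coeffs, err / r[0])  # coefficients near [1.8, -0.9]; small error ratio
```

The decoder would run the inverse all-pole filter, driven by a synthetic excitation, to regenerate the sound from these few parameters.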

Lossy formats are often used for the distribution of streaming audio or interactive communication (such as in cell phone networks). In such applications, the data must be decompressed as the data flows, rather than after the entire data stream has been transmitted. Not all audio codecs can be used for streaming applications.<ref name="Jaiswal"/>

[[Latency (engineering)|Latency]] is introduced by the methods used to encode and decode the data. Some codecs will analyze a longer segment, called a ''frame'', of the data to optimize efficiency, and then code it in a manner that requires a larger segment of data at one time to decode. The inherent latency of the coding algorithm can be critical; for example, when there is a two-way transmission of data, such as with a telephone conversation, significant delays may seriously degrade the perceived quality.

In contrast to the speed of compression, which is proportional to the number of operations required by the algorithm, here latency refers to the number of samples that must be analyzed before a block of audio is processed. In the minimum case, latency is zero samples (e.g., if the coder/decoder simply reduces the number of bits used to quantize the signal). Time domain algorithms such as LPC also often have low latencies, hence their popularity in speech coding for telephony. In algorithms such as MP3, however, a large number of samples have to be analyzed to implement a psychoacoustic model in the frequency domain, and latency is on the order of 23&nbsp;ms.
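Converting a buffering requirement in samples into time is a matter of dividing by the sample rate. In the sketch below, the frame and look-ahead sizes are hypothetical example values, not those of any particular codec. A 1024-sample frame at 44.1&nbsp;kHz happens to give the ~23&nbsp;ms order of magnitude quoted above.

```python
def algorithmic_delay_ms(frame_samples, lookahead_samples, sample_rate_hz):
    """Time the encoder must wait for input before it can emit a block, in milliseconds.

    Real systems add further delay (filter banks, decoder overlap-add, transmission);
    this counts only the samples that must be buffered before processing starts.
    """
    return 1000.0 * (frame_samples + lookahead_samples) / sample_rate_hz


if __name__ == "__main__":
    print(algorithmic_delay_ms(0, 0, 44_100))     # sample-by-sample requantizer: 0.0
    print(algorithmic_delay_ms(160, 0, 8_000))    # hypothetical speech frame: 20.0
    print(algorithmic_delay_ms(1024, 0, 44_100))  # hypothetical 1024-sample transform frame
```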

===== Speech encoding =====
[[Speech encoding]] is an important category of audio data compression. The perceptual models used to estimate what aspects of speech a human ear can hear are generally somewhat different from those used for music. The range of frequencies needed to convey the sounds of a human voice is normally far narrower than that needed for music, and the sound is normally less complex. As a result, speech can be encoded at high quality using a relatively low bit rate.

This is accomplished, in general, by some combination of two approaches:
* Only encoding sounds that could be made by a single human voice.
* Throwing away more of the data in the signal—keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human [[hearing]].

The earliest algorithms used in speech encoding (and audio data compression in general) were the [[A-law algorithm]] and the [[μ-law algorithm]].
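Both algorithms are logarithmic companding schemes: quiet samples are quantized with finer steps than loud ones. The sketch below uses the continuous μ-law curve with μ = 255 and a plain 8-bit quantizer. It illustrates the principle under those assumptions and does not reproduce the segmented, table-driven approximation specified in [[G.711]]. The function names are hypothetical.

```python
import math

MU = 255.0  # the mu value used for 8-bit mu-law telephony


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to y in [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize(y, bits=8):
    """Uniform quantizer on [-1, 1] with 2**(bits-1) - 1 levels per polarity."""
    levels = 2 ** (bits - 1) - 1
    return round(y * levels) / levels


def mu_law_roundtrip(x, bits=8, mu=MU):
    """Compress, quantize uniformly, then expand: the companded path."""
    return mu_law_expand(quantize(mu_law_compress(x, mu), bits), mu)


def uniform_roundtrip(x, bits=8):
    """Quantize x directly with the same number of bits, for comparison."""
    return quantize(x, bits)


if __name__ == "__main__":
    for x in (0.01, 0.1, 0.9):
        print(x, abs(mu_law_roundtrip(x) - x), abs(uniform_roundtrip(x) - x))
```

Compared with a uniform 8-bit quantizer, the companded path keeps the error on quiet samples small at the cost of larger absolute error on loud ones. This roughly constant relative error suits speech, whose level varies widely.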

==== History ====
[[File:Placa-audioPC-925.jpg|right|thumb|Solidyne 922: The world's first commercial audio bit compression [[sound card]] for PC, 1990]]

Early audio research was conducted at [[Bell Labs]]. There, in 1950, [[C. Chapin Cutler]] filed the patent on [[differential pulse-code modulation]] (DPCM).<ref name="DPCM"/> In 1973, [[Adaptive DPCM]] (ADPCM) was introduced by P. Cummiskey, [[Nikil Jayant|Nikil S. Jayant]] and [[James L. Flanagan]].<ref>{{cite journal|doi=10.1002/j.1538-7305.1973.tb02007.x|title=Adaptive Quantization in Differential PCM Coding of Speech|year=1973|last1=Cummiskey|first1=P.|last2=Jayant|first2=N. S.|last3=Flanagan|first3=J. L.|journal=Bell System Technical Journal|volume=52|issue=7|pages=1105–1118}}</ref>

[[Perceptual coding]] was first used in [[speech coding]], with [[linear predictive coding]] (LPC).<ref name="Schroeder2014">{{cite book |last1=Schroeder |first1=Manfred R. |title=Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder |date=2014 |publisher=Springer |isbn=9783319056609 |chapter=Bell Laboratories |page=388 |chapter-url=https://books.google.com/books?id=d9IkBAAAQBAJ&pg=PA388}}</ref> Initial concepts for LPC date back to the work of [[Fumitada Itakura]] ([[Nagoya University]]) and Shuzo Saito ([[Nippon Telegraph and Telephone]]) in 1966.<ref>{{cite journal |last1=Gray |first1=Robert M. |title=A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol |journal=Found. Trends Signal Process. |date=2010 |volume=3 |issue=4 |pages=203–303 |doi=10.1561/2000000036 |url=https://ee.stanford.edu/~gray/lpcip.pdf |archive-url=https://web.archive.org/web/20100704113551/http://ee.stanford.edu/~gray/lpcip.pdf |archive-date=2010-07-04 |url-status=live |issn=1932-8346|doi-access=free }}</ref> During the 1970s, [[Bishnu S. Atal]] and [[Manfred R. Schroeder]] at [[Bell Labs]] developed a form of LPC called [[adaptive predictive coding]] (APC), a perceptual coding algorithm that exploited the masking properties of the human ear. It was followed in the early 1980s by the [[code-excited linear prediction]] (CELP) algorithm, which achieved a significant [[data compression ratio|compression ratio]] for its time.<ref name="Schroeder2014"/> Perceptual coding is used by modern audio compression formats such as [[MP3]]<ref name="Schroeder2014"/> and [[Advanced Audio Coding|AAC]].

The [[discrete cosine transform]] (DCT), developed by [[N. Ahmed|Nasir Ahmed]], T. Natarajan and [[K. R. Rao]] in 1974,<ref name="DCT"/> provided the basis for the [[modified discrete cosine transform]] (MDCT) used by modern audio compression formats such as MP3,<ref name="Guckert">{{cite web |last1=Guckert |first1=John |title=The Use of FFT and MDCT in MP3 Audio Compression |url=http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf |archive-url=https://web.archive.org/web/20140124152337/http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf |archive-date=2014-01-24 |url-status=live |website=[[University of Utah]] |date=Spring 2012 |access-date=14 July 2019}}</ref> [[Dolby Digital]],<ref name="Luo">{{cite book |last1=Luo |first1=Fa-Long |title=Mobile Multimedia Broadcasting Standards: Technology and Practice |date=2008 |publisher=[[Springer Science & Business Media]] |isbn=9780387782638 |page=590 |url=https://books.google.com/books?id=l6PovWat8SMC&pg=PA590}}</ref><ref>{{cite journal |last1=Britanak |first1=V. |title=On Properties, Relations, and Simplified Implementation of Filter Banks in the Dolby Digital (Plus) AC-3 Audio Coding Standards |journal=IEEE Transactions on Audio, Speech, and Language Processing |date=2011 |volume=19 |issue=5 |pages=1231–1241 |doi=10.1109/TASL.2010.2087755|s2cid=897622 }}</ref> and AAC.<ref name=brandenburg>{{cite web|url=http://graphics.ethz.ch/teaching/mmcom12/slides/mp3_and_aac_brandenburg.pdf|title=MP3 and AAC Explained|last=Brandenburg|first=Karlheinz|year=1999|url-status=live|archive-url=https://web.archive.org/web/20170213191747/https://graphics.ethz.ch/teaching/mmcom12/slides/mp3_and_aac_brandenburg.pdf|archive-date=2017-02-13}}</ref> The MDCT was proposed by J. P. Princen, A. W. Johnson and A. B. Bradley in 1987,<ref>{{cite book|doi=10.1109/ICASSP.1987.1169405|chapter=Subband/Transform coding using filter bank designs based on time domain aliasing cancellation|title=ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing|year=1987|last1=Princen|first1=J.|last2=Johnson|first2=A.|last3=Bradley|first3=A.|volume=12|pages=2161–2164|s2cid=58446992}}</ref> following earlier work by Princen and Bradley in 1986.<ref>{{cite journal|doi=10.1109/TASSP.1986.1164954|title=Analysis/Synthesis filter bank design based on time domain aliasing cancellation|year=1986|last1=Princen|first1=J.|last2=Bradley|first2=A.|journal=IEEE Transactions on Acoustics, Speech, and Signal Processing|volume=34|issue=5|pages=1153–1161}}</ref>

The world's first commercial [[broadcast automation]] audio compression system was developed by Oscar Bonello, an engineering professor at the [[University of Buenos Aires]].<ref>{{cite news|title=Ricardo Sametband, La Nación Newspaper "Historia de un pionero en audio digital" |url=https://www.lanacion.com.ar/tecnologia/la-historia-de-un-pionero-del-audio-digital-nid187775|language=es}}</ref>
In 1983, using the psychoacoustic principle of the masking of critical bands first published in 1967,<ref name="Zwicker"/> he started developing a practical application based on the recently introduced [[IBM PC]], and the broadcast automation system was launched in 1987 under the name [[Audicom]].<ref name="Solidyne">{{cite web |title=Summary of some of Solidyne's contributions to Broadcast Engineering |url=http://www.solidynepro.com/nosotros-breve-historia/ |work=Brief History of Solidyne |publisher=Buenos Aires: Solidyne |access-date=6 March 2013  |archive-url=https://web.archive.org/web/20130308063719/http://www.solidynepro.com/indexahtmlp_Hist-ENG%2Ct.htm |archive-date=8 March 2013 }}</ref>
Thirty-five years later, almost all the radio stations in the world were using this technology, manufactured by a number of companies, because the inventor refused to patent his work, preferring to publish it and leave it in the public domain.<ref>{{cite news|language=en|title=Anuncio del Audicom, AES Journal, July-August 1992, Vol 40, # 7/8, pag 647|url=http://www.aes.org/e-lib/browse.cfm?elib=19076}}<!-- auto-translated by Module:CS1 translator --></ref>

A literature compendium for a large variety of audio coding systems was published in the IEEE's ''Journal on Selected Areas in Communications'' (''JSAC'') in February 1988. Although some papers predate it, this collection documented a wide variety of finished, working audio coders. Nearly all of them used perceptual techniques together with some kind of frequency analysis and back-end noiseless coding.<ref name="Possibilities"/>