==== Lossy audio compression ====
[[File:AudiodatenkompressionManowarThePowerOfThySword.jpg|thumb|Comparison of [[spectrogram]]s of audio in an uncompressed format and several lossy formats. The lossy spectrograms show [[bandlimiting]] of higher frequencies, a common technique associated with lossy audio compression.]]

Lossy audio compression is used in a wide range of applications. In addition to standalone audio-only applications of file playback in MP3 players or computers, digitally compressed audio streams are used in most video DVDs, digital television, streaming media on the [[Internet]], satellite and cable radio, and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater compression than lossless compression by discarding less-critical data based on [[psychoacoustic]] optimizations.<ref name="Jaiswal"/>

Psychoacoustics recognizes that not all data in an audio stream can be perceived by the human [[auditory system]]. Most lossy compression reduces redundancy by first identifying perceptually irrelevant sounds, that is, sounds that are very hard to hear. Typical examples include high frequencies or sounds that occur at the same time as louder sounds. Those irrelevant sounds are coded with decreased accuracy or not at all.

Due to the nature of lossy algorithms, [[audio quality]] suffers a [[digital generation loss]] when a file is decompressed and recompressed. This makes lossy compression unsuitable for storing intermediate results in professional audio engineering applications, such as sound editing and multitrack recording. However, lossy formats such as [[MP3]] are very popular with end users, as the file size is reduced to 5–20% of the original and a megabyte can store about a minute's worth of music at adequate quality.

Several proprietary lossy compression algorithms have been developed that provide higher-quality audio performance by using a combination of lossless and lossy algorithms with adaptive bit rates and lower compression ratios. Examples include [[aptX]], [[LDAC (codec)|LDAC]], [[LHDC (codec)|LHDC]], [[Master Quality Authenticated#Codec description|MQA]] and [[SCL6 (codec)|SCL6]].

===== Coding methods =====
To determine what information in an audio signal is perceptually irrelevant, most lossy compression algorithms use transforms such as the [[modified discrete cosine transform]] (MDCT) to convert [[time domain]] sampled waveforms into a transform domain, typically the [[frequency domain]]. Once transformed, component frequencies can be prioritized according to how audible they are. Audibility of spectral components is assessed using the [[absolute threshold of hearing]] and the principles of [[simultaneous masking]]—the phenomenon wherein a signal is masked by another signal separated by frequency—and, in some cases, [[temporal masking]]—where a signal is masked by another signal separated by time. [[Equal-loudness contour]]s may also be used to weigh the perceptual importance of components. Models of the human ear-brain combination incorporating such effects are often called [[psychoacoustic model]]s.<ref name="faxin47"/>

Other types of lossy compressors, such as the [[linear predictive coding]] (LPC) used with speech, are source-based coders. LPC uses a model of the human vocal tract to analyze speech sounds and infer the parameters used by the model to produce them moment to moment. These changing parameters are transmitted or stored and used to drive another model in the decoder which reproduces the sound.
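A minimal MDCT in Python illustrates the transform step described above (a sketch only, not the implementation of any particular codec; the name <code>mdct</code> is illustrative, and production encoders use fast FFT-based lapped transforms with codec-specific windows):

<syntaxhighlight lang="python">
import numpy as np

def mdct(frame):
    """Naive MDCT: 2N windowed time-domain samples -> N frequency coefficients.

    Consecutive frames overlap by 50%. The sine window satisfies the
    Princen-Bradley condition, so the overlapped inverse transforms
    cancel the time-domain aliasing exactly.
    """
    n2 = len(frame)   # 2N input samples
    n = n2 // 2       # N output coefficients
    ns = np.arange(n2)
    ks = np.arange(n)
    window = np.sin(np.pi * (ns + 0.5) / n2)
    basis = np.cos(np.pi / n * (ns[None, :] + 0.5 + n / 2) * (ks[:, None] + 0.5))
    return basis @ (window * frame)

# A perceptual coder would now compare each coefficient against the masking
# threshold predicted by its psychoacoustic model, quantizing coarsely (or
# discarding entirely) the components the model deems inaudible.
</syntaxhighlight>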
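The analysis stage of a source-based coder can be sketched in the same spirit. The following assumes the textbook autocorrelation method of LPC; real speech codecs add pre-emphasis, windowing, the Levinson-Durbin recursion, and quantization of the parameters:

<syntaxhighlight lang="python">
import numpy as np

def lpc_coefficients(frame, order=10):
    """Fit an all-pole vocal-tract filter to one speech frame by solving
    the autocorrelation normal equations (Yule-Walker)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Toeplitz system: R a = r[1..order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    # Only `a` (plus gain and voicing/pitch parameters) is transmitted; the
    # decoder excites the filter 1/(1 - sum(a_k z^-k)) with a pulse train
    # (voiced speech) or noise (unvoiced) to resynthesize the sound.
    return a
</syntaxhighlight>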
Lossy formats are often used for the distribution of streaming audio or interactive communication (such as in cell phone networks). In such applications, the data must be decompressed as the data flows, rather than after the entire data stream has been transmitted. Not all audio codecs can be used for streaming applications.<ref name="Jaiswal"/>

[[Latency (engineering)|Latency]] is introduced by the methods used to encode and decode the data. Some codecs analyze a longer segment of the data, called a ''frame'', to optimize efficiency, and then code it in a manner that requires a larger segment of data at one time to decode. The inherent latency of the coding algorithm can be critical; for example, when there is two-way transmission of data, such as in a telephone conversation, significant delays may seriously degrade the perceived quality. In contrast to the speed of compression, which is proportional to the number of operations required by the algorithm, latency here refers to the number of samples that must be analyzed before a block of audio is processed. In the minimum case, latency is zero samples (e.g., if the coder/decoder simply reduces the number of bits used to quantize the signal). Time-domain algorithms such as LPC also often have low latencies, hence their popularity in speech coding for telephony. In algorithms such as MP3, however, a large number of samples have to be analyzed to implement a psychoacoustic model in the frequency domain, and latency is on the order of 23 ms.

===== Speech encoding =====
[[Speech encoding]] is an important category of audio data compression. The perceptual models used to estimate what aspects of speech a human ear can hear are generally somewhat different from those used for music. The range of frequencies needed to convey the sounds of a human voice is normally far narrower than that needed for music, and the sound is normally less complex. As a result, speech can be encoded at high quality using a relatively low bit rate. This is accomplished, in general, by some combination of two approaches:
* Only encoding sounds that could be made by a single human voice.
* Throwing away more of the data in the signal—keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human [[hearing]].

The earliest algorithms used in speech encoding (and audio data compression in general) were the [[A-law algorithm]] and the [[μ-law algorithm]].
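The μ-law curve is simple enough to state directly. This sketch implements the continuous form of the companding function; the G.711 standard actually specifies a piecewise-linear 8-bit approximation of it, and the helper names here are illustrative only:

<syntaxhighlight lang="python">
import numpy as np

MU = 255.0  # value used in North American and Japanese telephony

def mulaw_compress(x):
    """Map linear samples in [-1, 1] onto the mu-law curve, which devotes
    more of the 8-bit code space to quiet samples than to loud ones."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mulaw_expand(y):
    """Inverse mapping applied at the receiving end."""
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU
</syntaxhighlight>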
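Returning to the latency figures quoted earlier, the buffering delay of a frame-based coder follows directly from its frame length and sampling rate. A back-of-the-envelope sketch (the frame sizes and rates are illustrative; total codec delay also includes filter-bank overlap and other factors):

<syntaxhighlight lang="python">
def frame_latency_ms(frame_samples, sample_rate_hz):
    """Minimum algorithmic delay from buffering one analysis frame."""
    return 1000.0 * frame_samples / sample_rate_hz

# A sample-by-sample quantizer buffers essentially nothing:
print(frame_latency_ms(1, 8000))       # 0.125 ms, effectively zero
# An MP3-style coder buffers a 1152-sample frame before coding it:
print(frame_latency_ms(1152, 48000))   # 24.0 ms, the order of the ~23 ms above
</syntaxhighlight>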