Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Entropy coding
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|Lossless data compression scheme}} {{more footnotes needed|date=December 2013}} In [[information theory]], an '''entropy coding''' (or '''entropy encoding''') is any [[lossless compression|lossless data compression]] method that attempts to approach the lower bound declared by [[Claude Shannon|Shannon's]] [[source coding theorem]], which states that any lossless data compression method must have an expected code length greater than or equal to the entropy of the source.<ref>{{Cite book |last1=Duda |first1=Jarek |last2=Tahboub |first2=Khalid |last3=Gadgil |first3=Neeraj J. |last4=Delp |first4=Edward J. |title=2015 Picture Coding Symposium (PCS) |chapter=The use of asymmetric numeral systems as an accurate replacement for Huffman coding |date=May 2015 |chapter-url=https://ieeexplore.ieee.org/document/7170048/;jsessionid=wI8UmMMsBsrAYcNN1Ux2gHW5ReVUf0HhrXFHlaZZVSmlJdKWDIv8!-2026188604 |pages=65β69 |doi=10.1109/PCS.2015.7170048|isbn=978-1-4799-7783-3 |s2cid=20260346 }}</ref> More precisely, the source coding theorem states that for any source distribution, the expected code length satisfies <math>\operatorname E_{x\sim P}[\ell(d(x))] \geq \operatorname E_{x\sim P}[-\log_b(P(x))]</math>, where <math>\ell</math> is the function specifying the number of symbols in a code word, <math>d</math> is the coding function, <math>b</math> is the number of symbols used to make output codes and <math>P</math> is the probability of the source symbol. An entropy coding attempts to approach this lower bound. Two of the most common entropy coding techniques are [[Huffman coding]] and [[arithmetic coding]].<ref name="Huffman 1952 pp. 1098β1101">{{cite journal |last=Huffman |first=David |title=A Method for the Construction of Minimum-Redundancy Codes |journal=Proceedings of the IRE |publisher=Institute of Electrical and Electronics Engineers (IEEE) |volume=40 |issue=9 |year=1952 |issn=0096-8390 |doi=10.1109/jrproc.1952.273898 |pages=1098β1101}}</ref> If the approximate entropy characteristics of a data stream are known in advance (especially for [[signal compression]]), a simpler static code may be useful. These static codes include [[Universal code (data compression)|universal codes]] (such as [[Elias gamma coding]] or [[Fibonacci coding]]) and [[Golomb coding|Golomb codes]] (such as [[unary coding]] or [[Rice coding]]). Since 2014, data compressors have started using the [[asymmetric numeral systems]] family of entropy coding techniques, which allows combination of the compression ratio of [[arithmetic coding]] with a processing cost similar to [[Huffman coding]]. == Entropy as a measure of similarity == Besides using entropy coding as a way to compress digital data, an entropy encoder can also be used to measure the amount of [[Similarity measure|similarity]] between [[data stream|streams of data]] and already existing classes of data. This is done by generating an entropy coder/compressor for each class of data; unknown data is then [[Statistical classification|classified]] by feeding the uncompressed data to each compressor and seeing which compressor yields the highest compression. The coder with the best compression is probably the coder trained on the data that was most similar to the unknown data. == See also == * [[Arithmetic coding]] * [[Asymmetric numeral systems]] (ANS) * [[Context-adaptive binary arithmetic coding]] (CABAC) * [[Huffman coding]] * [[Range coding]] ==References== {{reflist}} == External links == * ''[http://www.inference.phy.cam.ac.uk/mackay/itila/book.html Information Theory, Inference, and Learning Algorithms]'', by [[David MacKay (scientist)|David MacKay]] (2003), gives an introduction to Shannon theory and data compression, including the [[Huffman coding]] and [[arithmetic coding]]. * ''[http://iphome.hhi.de/wiegand/assets/pdfs/VBpart1.pdf Source Coding],'' by [[Thomas Wiegand|T. Wiegand]] and H. Schwarz (2011). {{Compression Methods}} {{Authority control}} [[Category:Entropy coding]] [[Category:Entropy and information]] [[Category:Data compression]]
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Authority control
(
edit
)
Template:Cite book
(
edit
)
Template:Cite journal
(
edit
)
Template:Compression Methods
(
edit
)
Template:More footnotes needed
(
edit
)
Template:Reflist
(
edit
)
Template:Short description
(
edit
)