===Data compression===
{{Main|Shannon's source coding theorem|Data compression}}

Shannon's definition of entropy, when applied to an information source, can determine the minimum channel capacity required to reliably transmit the source as encoded binary digits. Shannon's entropy measures the information contained in a message, as opposed to the portion of the message that is determined (or predictable). Examples of the latter include redundancy in language structure or statistical properties relating to the occurrence frequencies of letter or word pairs, triplets, etc. The minimum channel capacity can be realized in theory by using the [[typical set]], or in practice by using [[Huffman coding|Huffman]], [[LZW|Lempel–Ziv]] or [[arithmetic coding]]. (See also [[Kolmogorov complexity]].) In practice, compression algorithms deliberately include some judicious redundancy in the form of [[checksum]]s to protect against errors.

The [[entropy rate]] of a data source is the average number of bits per symbol needed to encode it. Shannon's experiments with human predictors show an information rate of between 0.6 and 1.3 bits per character in English;<ref>{{cite web |url=http://marknelson.us/2006/08/24/the-hutter-prize/ |title=The Hutter Prize |access-date=2008-11-27 |date=24 August 2006 |author=Mark Nelson |archive-date=1 March 2018 |archive-url=https://web.archive.org/web/20180301161215/http://marknelson.us/2006/08/24/the-hutter-prize/ |url-status=dead }}</ref> the [[PPM compression algorithm]] can achieve a compression ratio of 1.5 bits per character in English text.

If a [[Data compression|compression]] scheme is lossless – one in which the entire original message can always be recovered by decompression – then a compressed message carries the same quantity of information as the original but is communicated in fewer characters. It therefore has more information (higher entropy) per character; a compressed message has less [[redundancy (information theory)|redundancy]]. [[Shannon's source coding theorem]] states that a lossless compression scheme cannot compress messages, on average, to have ''more'' than one bit of information per bit of message, but that any value ''less'' than one bit of information per bit of message can be attained by employing a suitable coding scheme. The entropy of a message per bit multiplied by the length of that message is a measure of how much total information the message contains.

Shannon's theorem also implies that no lossless compression scheme can shorten ''all'' messages. If some messages come out shorter, at least one must come out longer, by the [[pigeonhole principle]]. In practice this is generally not a problem, because one is usually interested in compressing only certain types of messages, such as a document in English as opposed to gibberish text, or digital photographs rather than noise, and it is unimportant if a compression algorithm makes some unlikely or uninteresting sequences larger.
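The gap between single-character entropy and the shorter codes achievable by exploiting inter-character redundancy can be illustrated with a minimal Python sketch. It estimates the order-0 (single-character) entropy of a repetitive sample text and compares it with the rate achieved by zlib's DEFLATE, used here only as a convenient stand-in for a Lempel–Ziv-style coder; the sample string and the helper name <code>empirical_entropy</code> are illustrative choices, not part of the sources cited above.

<syntaxhighlight lang="python">
import math
import zlib
from collections import Counter

def empirical_entropy(text: str) -> float:
    """Shannon entropy in bits per character, estimated from
    single-character frequencies only (an order-0 model that
    ignores structure spanning several characters)."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A highly repetitive sample: high order-0 entropy, but lots of
# inter-character redundancy for a dictionary coder to exploit.
sample = "the quick brown fox jumps over the lazy dog " * 200

h = empirical_entropy(sample)                  # bits per character, order-0 bound
compressed = zlib.compress(sample.encode("ascii"), 9)
rate = 8 * len(compressed) / len(sample)       # bits per character actually used

print(f"order-0 entropy estimate: {h:.2f} bits/char")
print(f"DEFLATE rate:             {rate:.2f} bits/char")
</syntaxhighlight>

On this input the order-0 estimate is roughly 4 bits per character, while the compressor, which exploits the repeated phrases, encodes the text in well under 1 bit per character; symbol-by-symbol codes such as Huffman coding are bounded below by the order-0 entropy, whereas coders that model longer-range structure approach the (lower) entropy rate of the source.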
A 2011 study in ''[[Science (journal)|Science]]'' estimated the world's technological capacity to store and communicate optimally compressed information, normalized to the most effective compression algorithms available in 2007, thereby estimating the entropy of the technologically available sources.<ref name="HilbertLopez2011">[http://www.sciencemag.org/content/332/6025/60 "The World's Technological Capacity to Store, Communicate, and Compute Information"] {{Webarchive|url=https://web.archive.org/web/20130727161911/http://www.sciencemag.org/content/332/6025/60 |date=27 July 2013 }}, Martin Hilbert and Priscila López (2011), ''[[Science (journal)|Science]]'', 332(6025); free access to the article through martinhilbert.net/WorldInfoCapacity.html</ref>{{rp|pp=60–65}}

{| class="wikitable"
|+ All figures in entropically compressed [[exabytes]]
|-
! Type of information !! 1986 !! 2007
|-
| Storage || 2.6 || 295
|-
| Broadcast || 432 || 1900
|-
| Telecommunications || 0.281 || 65
|}

The authors estimate humankind's technological capacity to store information (fully entropically compressed) in 1986 and again in 2007. They break the information into three categories: storing information on a medium, receiving information through one-way [[broadcast]] networks, and exchanging information through two-way [[telecommunications network]]s.<ref name="HilbertLopez2011"/>