Prefix code
==Techniques==
If every word in the code has the same length, the code is called a '''fixed-length code''', or a '''block code''' (though the term [[block code]] is also used for fixed-size [[error-correcting code]]s in [[channel coding]]). For example, [[ISO 8859-15]] letters are always 8 bits long, [[UTF-32/UCS-4]] letters are always 32 bits long, and [[Asynchronous Transfer Mode|ATM cells]] are always 424 bits (53 bytes) long. A fixed-length code with words of ''k'' bits can encode up to <math>2^{k}</math> source symbols. A fixed-length code is necessarily a prefix code. It is possible to turn any prefix code into a fixed-length code by padding the shorter code words with fixed symbols until they match the length of the longest code word. Alternatively, such padding may be employed to introduce redundancy that allows autocorrection and/or synchronisation. However, fixed-length encodings are inefficient in situations where some words are much more likely to be transmitted than others.

[[Truncated binary encoding]] is a straightforward generalization of fixed-length codes to deal with cases where the number of symbols ''n'' is not a power of two. Source symbols are assigned codewords of length ''k'' and ''k''+1, where ''k'' is chosen so that ''2<sup>k</sup> < n ≤ 2<sup>k+1</sup>''.

[[Huffman coding]] is a more sophisticated technique for constructing variable-length prefix codes. The Huffman coding algorithm takes as input the frequencies of the source symbols, and constructs a prefix code that minimizes the weighted average of the code word lengths. (This average is bounded below by the entropy of the symbol distribution.) This is a form of [[lossless data compression]] based on [[entropy encoding]].

Some codes mark the end of a code word with a special "comma" symbol (also called a [[sentinel value]]), different from normal data.<ref>{{cite web |url=http://www.imperial.ac.uk/research/hep/group/theses/JJones.pdf |title=Development of Trigger and Control Systems for CMS |first1=J. A. |last1=Jones |page=70 |publisher=High Energy Physics, Blackett Laboratory, Imperial College, London |url-status=dead |archive-url=https://web.archive.org/web/20110613183447/http://www.imperial.ac.uk/research/hep/group/theses/JJones.pdf |archive-date=Jun 13, 2011}}</ref> This is somewhat analogous to the spaces between words in a sentence; they mark where one word ends and another begins. If every code word ends in a comma, and the comma does not appear elsewhere in a code word, the code is automatically prefix-free. However, reserving an entire symbol only for use as a comma can be inefficient, especially for languages with a small number of symbols.

[[Morse code]] is an everyday example of a variable-length code with a comma. The long pauses between letters, and the even longer pauses between words, help people recognize where one letter (or word) ends, and the next begins. Similarly, [[Fibonacci coding]] uses the bit pattern "11" to mark the end of every code word.

[[Self-synchronizing code]]s are prefix codes that allow [[frame synchronization]].
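
The truncated binary encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name <code>truncated_binary</code> is hypothetical, symbols are taken to be the integers 0 to ''n''−1, and the shorter ''k''-bit code words are assigned to the lowest-numbered symbols, which is one common convention rather than the only possible one.

```python
def truncated_binary(x: int, n: int) -> str:
    """Return a truncated binary code word for symbol x out of n symbols.

    Assumes symbols are numbered 0..n-1 and that n >= 2 (illustrative choice).
    """
    if n < 2 or not 0 <= x < n:
        raise ValueError("need n >= 2 and 0 <= x < n")
    # Choose k so that 2**k < n <= 2**(k + 1), matching the text above.
    k = (n - 1).bit_length() - 1
    # Number of symbols that can get the shorter, k-bit code word.
    u = (1 << (k + 1)) - n
    if x < u:
        return format(x, f"0{k}b")
    # Remaining symbols use k + 1 bits, offset by u so that no
    # short code word is a prefix of a long one.
    return format(x + u, f"0{k + 1}b")


if __name__ == "__main__":
    print([truncated_binary(x, 5) for x in range(5)])
    # → ['00', '01', '10', '110', '111']
```

When ''n'' is a power of two, <code>u</code> is zero and every symbol receives a code word of ''k''+1 bits, so the sketch reduces to an ordinary fixed-length code.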
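
The Huffman construction mentioned above can be illustrated with a priority queue that repeatedly merges the two least frequent subtrees. The following is a minimal sketch, not a reference implementation: the names <code>huffman_code</code> and <code>is_prefix_free</code> are hypothetical, ties are broken by insertion order (other tie-breaking rules give different but equally short codes), and a single-symbol input is arbitrarily given the code word "0".

```python
import heapq
from itertools import count


def huffman_code(freqs: dict) -> dict:
    """Build a binary prefix code from a {symbol: frequency} mapping."""
    if not freqs:
        raise ValueError("at least one symbol is required")
    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(w, next(tie), {s: ""}) for s, w in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Assumption: a lone symbol still gets a one-bit code word.
        return {s: "0" for s in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least frequent subtrees.
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]


def is_prefix_free(words) -> bool:
    """Check that no word is a prefix of another (adjacent check after sorting)."""
    ordered = sorted(words)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    codes = huffman_code({"a": 5, "b": 2, "c": 1, "d": 1})
    print(codes, is_prefix_free(codes.values()))
```

For the frequencies in the usage example, the most frequent symbol receives a one-bit code word and the two rarest symbols receive three-bit code words, giving a weighted total of 15 bits for the 9 symbol occurrences.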
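
As an illustration of a code whose words end in a fixed marker, the sketch below encodes positive integers with Fibonacci coding and splits a concatenated bit stream back into values by looking for the terminating "11". The function names <code>fibonacci_encode</code> and <code>fibonacci_decode_stream</code> are hypothetical, and bits are represented as characters in a string purely for readability.

```python
def fibonacci_encode(n: int) -> str:
    """Encode a positive integer as a Fibonacci code word ending in '11'."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for n >= 1")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # drop the first Fibonacci number that exceeds n
    bits = []
    for f in reversed(fibs):  # greedy choice, largest term first
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    # Emit the smallest term first, then the extra terminating '1'.
    return "".join(reversed(bits)) + "1"


def fibonacci_decode_stream(bits: str) -> list:
    """Split a concatenation of Fibonacci code words and decode each one."""
    values, fibs = [], [1, 2]
    total, i, prev = 0, 0, "0"
    for b in bits:
        if b == "1" and prev == "1":
            # Second consecutive '1': the current code word ends here.
            values.append(total)
            total, i, prev = 0, 0, "0"
            continue
        if b == "1":
            while i >= len(fibs):
                fibs.append(fibs[-1] + fibs[-2])
            total += fibs[i]
        i += 1
        prev = b
    return values


if __name__ == "__main__":
    stream = "".join(fibonacci_encode(n) for n in [1, 2, 3, 4, 8])
    print(stream, fibonacci_decode_stream(stream))
```

Because the greedy representation never uses two consecutive Fibonacci numbers, the pattern "11" can only occur at the end of a code word, which is what lets the decoder find word boundaries without any separate comma symbol.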