Truncated binary encoding
{{More references|date=December 2009}}
'''Truncated binary encoding''' is an [[entropy encoding]] typically used for uniform [[probability distribution]]s with a finite alphabet. It is parameterized by the alphabet size ''n''. It generalizes plain [[Binary numeral system|binary encoding]] to the case where ''n'' is not a [[power of two]].

If ''n'' is a power of two, then the coded value for 0 ≤ ''x'' < ''n'' is the simple binary code for ''x'' of length log<sub>2</sub>(''n''). Otherwise let ''k'' = floor(log<sub>2</sub>(''n'')), such that 2<sup>''k''</sup> < ''n'' < 2<sup>''k''+1</sup>, and let ''u'' = 2<sup>''k''+1</sup> − ''n''.

Truncated binary encoding assigns the first ''u'' symbols codewords of length ''k'' and then assigns the remaining ''n'' − ''u'' symbols the last ''n'' − ''u'' codewords of length ''k'' + 1. Because all the codewords of length ''k'' + 1 consist of an unassigned codeword of length ''k'' with a "0" or "1" appended, the resulting code is a [[prefix code]].

== History ==
Truncated binary encoding has been in use since at least 1984, under the names '''phase-in codes''' and '''economy codes'''.<ref>Eastman, Willard L, ''et al.'' (Aug. 1984) [https://patentimages.storage.googleapis.com/86/7d/56/dd49314023387d/US4464650.pdf Apparatus and Method for Compressing Data Signals and Restoring the Compressed Data Signals], US Patent 4,464,650.</ref><ref>Acharya, Tinku and JáJá, Joseph F. (Oct. 1996), [https://www.sciencedirect.com/science/article/abs/pii/0020025596000898 An on-line variable-length binary encoding of text], Information Sciences, vol. 94, no. 1-4, pp. 1-22.</ref><ref>Job van der Zwan. [https://observablehq.com/@jobleonard/phase-in-codes "Phase-in Codes"].</ref>

==Example with ''n'' = 5==
For example, for the alphabet {0, 1, 2, 3, 4}, ''n'' = 5 and 2<sup>2</sup> ≤ ''n'' < 2<sup>3</sup>, hence ''k'' = 2 and ''u'' = 2<sup>3</sup> − 5 = 3.
Truncated binary encoding assigns the first ''u'' symbols the codewords 00, 01, and 10, all of length 2, then assigns the last ''n'' − ''u'' symbols the codewords 110 and 111, the last two codewords of length 3.

The table below compares the [[Code word (communication)|codewords]] that plain binary encoding and truncated binary encoding allocate for ''n'' = 5. Digits shown <del>struck</del> are not transmitted in truncated binary.

{| class="wikitable" style="text-align:center"
|-
! Truncated<br/>binary !!colspan=3| Encoding !! Standard<br/>binary
|-
|align=right| 0 ||bgcolor=silver| <del>0</del> || 0 || 0 ||align=left| 0
|-
|align=right| 1 ||bgcolor=silver| <del>0</del> || 0 || 1 ||align=left| 1
|-
|align=right| 2 ||bgcolor=silver| <del>0</del> || 1 || 0 ||align=left| 2
|-
|align=right| UNUSED ||bgcolor=silver| <del>0</del> ||bgcolor=silver| <del>1</del> ||bgcolor=silver| <del>1</del> ||align=left| 3
|-
|align=right| UNUSED ||bgcolor=silver| <del>1</del> ||bgcolor=silver| <del>0</del> ||bgcolor=silver| <del>0</del> ||align=left| 4
|-
|align=right| UNUSED ||bgcolor=silver| <del>1</del> ||bgcolor=silver| <del>0</del> ||bgcolor=silver| <del>1</del> ||align=left| 5/UNUSED
|-
|align=right| 3 || 1 || 1 || 0 ||align=left| 6/UNUSED
|-
|align=right| 4 || 1 || 1 || 1 ||align=left| 7/UNUSED
|}

Straightforward binary encoding needs 3 bits per symbol when ''n'' = 5, so 2<sup>3</sup> − ''n'' = 8 − 5 = 3 of the eight 3-bit codewords are unused.

In numerical terms, to send a value ''x'', where 0 ≤ ''x'' < ''n'' and 2<sup>''k''</sup> ≤ ''n'' < 2<sup>''k''+1</sup>, there are ''u'' = 2<sup>''k''+1</sup> − ''n'' unused entries among the 2<sup>''k''+1</sup> codewords of length ''k'' + 1. The process to encode the number ''x'' in truncated binary is: if ''x'' is less than ''u'', encode it in ''k'' binary bits; if ''x'' is greater than or equal to ''u'', encode the value ''x'' + ''u'' in ''k'' + 1 binary bits.
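The encoding rule above can be sketched with integers instead of bit strings. This is a minimal illustration, not a reference implementation: the type <code>Codeword</code> and the function <code>truncated_binary</code> are hypothetical names introduced here, and the codeword is assumed to be returned right-aligned in a 32-bit integer together with its length in bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not from the article): a codeword stored as an
// integer plus the number of bits that are actually transmitted.
struct Codeword {
    std::uint32_t bits;  // codeword value, right-aligned
    int length;          // number of bits to send: k or k + 1
};

// Sketch of the rule above for an alphabet of size n >= 1 and 0 <= x < n.
Codeword truncated_binary(std::uint32_t x, std::uint32_t n) {
    assert(n >= 1 && x < n);

    // k = floor(log2(n)), so that 2^k <= n < 2^(k+1).
    int k = 0;
    for (std::uint32_t t = n; t > 1; t >>= 1) ++k;

    // u = 2^(k+1) - n is the number of unused (k+1)-bit codewords.
    // The shift is done in 64 bits so that k + 1 == 32 cannot overflow.
    std::uint32_t u =
        static_cast<std::uint32_t>((std::uint64_t{1} << (k + 1)) - n);

    if (x < u) return {x, k};   // short codeword: x itself in k bits
    return {x + u, k + 1};      // long codeword: x + u in k + 1 bits
}
```

With ''n'' = 5, <code>truncated_binary(3, 5)</code> yields the value 6 (binary 110) with length 3, matching the table above. When ''n'' is a power of two, ''u'' equals ''n'', so every value takes the short branch and the result is plain ''k''-bit binary.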
==Example with ''n'' = 10==
As another example, encoding an alphabet of size 10 (values 0 to 9) in plain binary requires 4 bits, but 2<sup>4</sup> − 10 = 6 codes are then unused. Input values less than 6 therefore have the first bit discarded, while input values greater than or equal to 6 are offset by 6 to the end of the binary space. (Unused patterns are not shown in this table.)

{| class="wikitable" style="text-align:center"
|-
! Input<br/>value !! Offset !! Offset<br/>value !! Standard<br/>binary !! Truncated<br/>binary
|-
| 0 || 0 || 0 || <del>0</del>000 || 000
|-
| 1 || 0 || 1 || <del>0</del>001 || 001
|-
| 2 || 0 || 2 || <del>0</del>010 || 010
|-
| 3 || 0 || 3 || <del>0</del>011 || 011
|-
| 4 || 0 || 4 || <del>0</del>100 || 100
|-
| 5 || 0 || 5 || <del>0</del>101 || 101
|-
|colspan=5|
|-
| 6 || 6 || 12 || 0110 || 1100
|-
| 7 || 6 || 13 || 0111 || 1101
|-
| 8 || 6 || 14 || 1000 || 1110
|-
| 9 || 6 || 15 || 1001 || 1111
|}

To decode, read the first ''k'' bits. If they encode a value less than ''u'', decoding is complete. Otherwise, read an additional bit and subtract ''u'' from the resulting (''k'' + 1)-bit value.

==Example with ''n'' = 7==
Here is a more extreme case: with ''n'' = 7 the next power of 2 is 8, so ''k'' = 2 and ''u'' = 2<sup>3</sup> − 7 = 1:

{| class="wikitable" style="text-align:center"
|-
! Input<br/>value !! Offset !! Offset<br/>value !! Standard<br/>binary !! Truncated<br/>binary
|-
| 0 || 0 || 0 || <del>0</del>00 || 00
|-
|colspan=5|
|-
| 1 || 1 || 2 || 001 || 010
|-
| 2 || 1 || 3 || 010 || 011
|-
| 3 || 1 || 4 || 011 || 100
|-
| 4 || 1 || 5 || 100 || 101
|-
| 5 || 1 || 6 || 101 || 110
|-
| 6 || 1 || 7 || 110 || 111
|}

This last example demonstrates that a leading zero bit does not always indicate a short code; if ''u'' < 2<sup>''k''</sup>, some long codes will begin with a zero bit.

== Simple algorithm ==
The following routine generates the truncated binary encoding for a value ''x'', 0 ≤ ''x'' < ''n'', where ''n'' > 0 is the size of the alphabet containing ''x''. ''n'' need not be a power of two.
<syntaxhighlight lang="cpp">
#include <string>

std::string Binary(int x, int len);  // defined below

std::string TruncatedBinary(int x, int n)
{
    // Set k = floor(log2(n)), i.e., k such that 2^k <= n < 2^(k+1).
    int k = 0, t = n;
    while (t > 1) { k++; t >>= 1; }

    // Set u to the number of unused codewords = 2^(k+1) - n.
    int u = (1 << (k + 1)) - n;

    if (x < u)
        return Binary(x, k);          // short codeword of k bits
    else
        return Binary(x + u, k + 1);  // long codeword of k + 1 bits
}
</syntaxhighlight>

The routine <code>Binary</code> is expository; usually just the rightmost <code>len</code> bits of the variable ''x'' are desired. Here we simply output the binary code for ''x'' using <code>len</code> bits, padding with high-order 0s if necessary.

<syntaxhighlight lang="cpp">
#include <string>

std::string Binary(int x, int len)
{
    std::string s;
    while (x != 0) {
        // Prepend the lowest-order bit of x.
        s = ((x & 1) ? '1' : '0') + s;
        x >>= 1;
    }
    // Pad with high-order zeros up to len bits.
    while (static_cast<int>(s.length()) < len)
        s = '0' + s;
    return s;
}
</syntaxhighlight>

== On efficiency ==
If ''n'' is not a power of two, and ''k''-bit symbols are observed with probability ''p'', then (''k'' + 1)-bit symbols are observed with probability 1 − ''p''. The expected number of bits per symbol <math>b_e</math> is
: <math>b_e = p k + (1 - p) (k + 1).</math>

Raw encoding of the symbol uses <math>b_u = k + 1</math> bits. The relative space saving ''s'' (see [[Data compression ratio]]) of the encoding can then be defined as
: <math>s = 1 - \frac{b_e}{b_u} = 1 - \frac{p k + (1 - p) (k + 1)}{k + 1}.</math>

Since the numerator simplifies to <math>p k + (1 - p) (k + 1) = k + 1 - p</math>, this expression reduces to
: <math>s = \frac{p}{k + 1} = \frac{p}{b_u}.</math>

This indicates that the relative efficiency of truncated binary encoding increases as the probability ''p'' of ''k''-bit symbols increases and as the raw-encoding symbol bit-length <math>b_u</math> decreases.

==See also==
* [[Benford's law]]
* [[Golomb coding]]

==References==
{{Reflist}}

{{DEFAULTSORT:Truncated Binary Encoding}}
[[Category:Entropy coding]]
[[Category:Lossless compression algorithms]]