Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Deflate
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Bit reduction === {{Further|Huffman coding}} The second compression stage consists of replacing commonly used symbols with shorter representations and less commonly used symbols with longer representations. The method used is [[Huffman coding]] which creates an unprefixed tree of non-overlapping intervals, where the length of each sequence is inversely proportional to the logarithm of the probability of that symbol needing to be encoded. The more likely it is that a symbol has to be encoded, the shorter its bit-sequence will be. A tree is created, containing space for 288 symbols: * 0β255: represent the literal bytes/symbols 0β255. * 256: end of block β stop processing if last block, otherwise start processing next block. * 257β285: combined with extra-bits, a match length of 3β258 bytes. * 286, 287: not used, reserved and illegal but still part of the tree. A match length code will always be followed by a distance code. Based on the distance code read, further "extra" bits may be read in order to produce the final distance. The distance tree contains space for 32 symbols: * 0β3: distances 1β4 * 4β5: distances 5β8, 1 extra bit * 6β7: distances 9β16, 2 extra bits * 8β9: distances 17β32, 3 extra bits * ... * 26β27: distances 8,193β16,384, 12 extra bits * 28β29: distances 16,385β32,768, 13 extra bits * 30β31: not used, reserved and illegal but still part of the tree For the match distance symbols 2β29, the number of extra bits can be calculated as <math>\left\lfloor\frac{n}{2}\right\rfloor-1</math>. The two codes (the 288-symbol length/literal tree and the 32-symbol distance tree) are themselves encoded as [[canonical Huffman code]]s by giving the bit length of the code for each symbol. The bit lengths are themselves [[Run-length encoding|run-length encoded]] to produce as compact a representation as possible. As an alternative to including the tree representation, the "static tree" option provides standard fixed Huffman trees. The compressed size using the static trees can be computed using the same statistics (the number of times each symbol appears) as are used to generate the dynamic trees, so it is easy for a compressor to choose whichever is smaller.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)