Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
LZMA
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== LZMA coding contexts === The LZMA packet format has already been described, and this section specifies how LZMA statistically models the LZ-encoded streams, or in other words which probability variables are passed to the range decoder to decode each bit. Those probability variables are implemented as multi-dimensional arrays; before introducing them, a few values that are used as indices in these multidimensional arrays are defined. The {{mono|''state''}} value is conceptually based on which of the patterns in the following table match the latest 2β4 packet types seen, and is implemented as a state machine state updated according to the transition table listed in the table every time a packet is output. The initial state is 0, and thus packets before the beginning are assumed to be LIT packets. {| class="wikitable" border="1" |- ! state ! colspan="4" | previous packets ! colspan="4" | next state when next packet is |- ! ! 4th previous ! 3rd previous ! 2nd previous ! previous ! LIT ! MATCH ! LONGREP[*] ! SHORTREP |- | 0 | | LIT | LIT | LIT | 0 | 7 | 8 | 9 |- | 1 | | MATCH | LIT | LIT | 0 | 7 | 8 | 9 |- | rowspan="2" | 2 | | LONGREP[*] | rowspan="2" | LIT | rowspan="2" | LIT | rowspan="2" | 0 | rowspan="2" | 7 | rowspan="2" | 8 | rowspan="2" | 9 |- | *MATCH | SHORTREP |- | 3 | LIT | SHORTREP | LIT | LIT | 0 | 7 | 8 | 9 |- | 4 | | | MATCH | LIT | 1 | 7 | 8 | 9 |- | rowspan="2" | 5 | rowspan="2" | | | LONGREP[*] | rowspan="2" | LIT | rowspan="2" | 2 | rowspan="2" | 7 | rowspan="2" | 8 | rowspan="2" | 9 |- | *MATCH | SHORTREP |- | 6 | | LIT | SHORTREP | LIT | 3 | 7 | 8 | 9 |- | 7 | | | LIT | MATCH | 4 | 10 | 11 | 11 |- | 8 | | | LIT | LONGREP[*] | 5 | 10 | 11 | 11 |- | 9 | | | LIT | SHORTREP | 6 | 10 | 11 | 11 |- | 10 | | | *MATCH | MATCH | 4 | 10 | 11 | 11 |- | 11 | | | *MATCH | *REP | 5 | 10 | 11 | 11 |} The {{mono|''pos_state''}} and {{mono|''literal_pos_state''}} values consist of respectively the {{mono|''pb''}} and {{mono|''lp''}} (up to 4, from the LZMA header or LZMA2 properties packet) least significant bits of the dictionary position (the number of bytes coded since the last dictionary reset modulo the dictionary size). Note that the dictionary size is normally the multiple of a large power of 2, so these values are equivalently described as the least significant bits of the number of uncompressed bytes seen since the last dictionary reset. The {{mono|''prev_byte_lc_msbs''}} value is set to the {{mono|''lc''}} (up to 4, from the LZMA header or LZMA2 properties packet) most significant bits of the previous uncompressed byte. The {{mono|''is_REP''}} value denotes whether a packet that includes a length is a LONGREP rather than a MATCH. The {{mono|''match_byte''}} value is the byte that would have been decoded if a SHORTREP packet had been used (in other words, the byte found at the dictionary at the last used distance); it is only used just after a *MATCH packet. {{mono|''literal_bit_mode''}} is an array of 8 values in the 0β2 range, one for each bit position in a byte, which are 1 or 2 if the previous packet was a *MATCH and it is either the most significant bit position or all the more significant bits in the literal to encode/decode are equal to the bits in the corresponding positions in {{mono|''match_byte''}}, while otherwise it is 0; the choice between the 1 or 2 values depends on the value of the bit at the same position in {{mono|''match_byte''}}. The literal/Literal set of variables can be seen as a "pseudo-bit-tree" similar to a bit-tree but with 3 variables instead of 1 in every node, chosen depending on the {{mono|''literal_bit_mode''}} value at the bit position of the next bit to decode after the bit-tree context denoted by the node. The claim, found in some sources, that literals after a *MATCH are coded as the XOR of the byte value with {{mono|''match_byte''}} is incorrect; they are instead coded simply as their byte value, but using the pseudo-bit-tree just described and the additional context listed in the table below. The probability variable groups used in LZMA are those: {| class="wikitable" border="1" |- ! XZ name ! LZMA SDK name ! Parameterized by ! Used when ! Coding mode ! If bit 0 then ! If bit 1 then |- | {{mono|is_match}} | {{mono|IsMatch}} | {{mono|''state''}}, {{mono|''pos_state''}} | packet start | bit | LIT | *MATCH |- | {{mono|is_rep}} | {{mono|IsRep}} | {{mono|''state''}} | after bit sequence 1 | bit | MATCH | *REP |- | {{mono|is_rep0}} | {{mono|IsRepG0}} | {{mono|''state''}} | after bit sequence 11 | bit | SHORTREP/{{nowrap|LONGREP[0]}} | {{nowrap|LONGREP[1β3]}} |- | {{mono|is_rep0_long}} | {{mono|IsRep0Long}} | {{mono|''state''}}, {{mono|''pos_state''}} | after bit sequence 110 | bit | SHORTREP | LONGREP[0] |- | {{mono|is_rep1}} | {{mono|IsRepG1}} | {{mono|''state''}} | after bit sequence 111 | bit | LONGREP[1] | LONGREP[2/3] |- | {{mono|is_rep2}} | {{mono|IsRepG2}} | {{mono|''state''}} | after bit sequence 1111 | bit | LONGREP[2] | LONGREP[3] |- | {{mono|literal}} | {{mono|Literal}} | {{mono|''prev_byte_lc_msbs''}}, {{mono|''literal_pos_state''}}, {{nowrap|<code>''literal_bit_mode''[bit position]</code>}}, bit-tree context | after bit sequence 0 | 256 values pseudo-bit-tree | colspan="2" | literal byte value |- | {{mono|dist_slot}} | {{mono|PosSlot}} | <code>min(''match_length'', 5)</code>, bit-tree context | distance: start | 64 values bit-tree | colspan="2" | distance slot |- | {{mono|dist_special}} | {{mono|SpecPos}} | {{mono|''distance_slot''}}, reverse bit-tree context | distance: 4β13 distance slots | {{nowrap|<code>((''distance_slot'' >> 1) β 1)</code>}}-bit reverse bit-tree | colspan="2" | low bits of distance |- | {{mono|dist_align}} | {{mono|Align}} | reverse bit-tree context | distance: 14+ distance slots, after fixed probability bits | 4-bit reverse bit-tree | colspan="2" | low bits of distance |- | {{mono|len_dec.choice}} | {{mono|LenChoice}} | {{mono|''is_REP''}} | match length: start | bit | 2β9 length | 10+ length |- | {{mono|len_dec.choice2}} | {{mono|LenChoice2}} | {{mono|''is_REP''}} | match length: after bit sequence 1 | bit | 10β17 length | 18+ length |- | {{mono|len_dec.low}} | {{mono|LenLow}} | {{mono|''is_REP''}}, {{mono|''pos_state''}}, bit-tree context | match length: after bit sequence 0 | 8 values bit-tree | colspan="2" | low bits of length |- | {{mono|len_dec.mid}} | {{mono|LenMid}} | {{mono|''is_REP''}}, {{mono|''pos_state''}}, bit-tree context | match length: after bit sequence 10 | 8 values bit-tree | colspan="2" | middle bits of length |- | {{mono|len_dec.high}} | {{mono|LenHigh}} | {{mono|''is_REP''}}, bit-tree context | match length: after bit sequence 11 | 256 values bit-tree | colspan="2" | high bits of length |}
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)