Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Incremental encoding
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
'''Incremental encoding''', also known as '''front compression''', '''back compression''', or '''front coding''', is a type of [[delta encoding]] [[compression algorithm]] whereby common [[Prefix (linguistics)|prefix]]es or [[Affix|suffixes]] and their lengths are recorded so that they need not be duplicated. This algorithm is particularly well-suited for compressing [[sorting|sorted]] [[data#Uses of data in computing|data]], e.g., a list of [[word]]s from a [[dictionary]]. For example: {| class="wikitable" |- ! Input ! Common prefix ! Compressed output |- | <pre> myxa myxophyta myxopod nab nabbed nabbing nabit nabk nabob nacarat nacelle </pre> | <pre> no preceding word 'myx' 'myxop' no common prefix 'nab' 'nabb' 'nab' 'nab' 'nab' 'na' 'nac' </pre> | <pre> 0 myxa 3 ophyta 5 od 0 nab 3 bed 4 ing 3 it 3 k 3 ob 2 carat 3 elle </pre> |- | align="right" | 64 bytes | | align="left" | 46 bytes |- |} The encoding used to store the common prefix length itself varies from application to application. Typical techniques are storing the value as a single byte; [[delta encoding]], which stores only the change in the common prefix length; and various [[Universal code (data compression)|universal codes]]. It may be combined with other general [[lossless data compression]] techniques such as [[entropy encoding]] and [[dictionary coder]]s to compress the remaining suffixes. == Applications == Incremental encoding is widely used in information retrieval to compress the lexicons used in [[Index (search engine)|search indexes]]; these list all the words found in all the documents and a pointer for each one to a list of locations. Typically, it compresses these indexes by about 40%.<ref>Ian H. Witten, Alistair Moffat, Timothy C. Bell. Managing Gigabytes. Second edition. Academic Press. {{ISBN|1-55860-570-3}}. Section 4.1: Accessing the lexicon, subsection Front coding, pp.159–161.</ref> As one example, incremental encoding is used as a starting point by the [[GNU locate]] utility, in an index of filenames and directories. The [[GNU locate]] utility further uses [[bigram]] encoding to further shorten popular filepath prefixes. == References == <references /> {{Compression methods}} [[Category:Lossless compression algorithms]] [[Category:Database index techniques]] [[Category:Data compression]] {{storage-software-stub}}
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Compression methods
(
edit
)
Template:ISBN
(
edit
)
Template:Storage-software-stub
(
edit
)