Run-length encoding
{{Short description|Form of lossless data compression}}
{{Distinguish | run-length limited}}
'''Run-length encoding''' ('''RLE''') is a form of [[lossless data compression]] in which ''runs'' of data (consecutive occurrences of the same data value) are stored as a single occurrence of that data value and a count of its consecutive occurrences, rather than as the original run. As a simple illustration of the concept, when encoding an image built up from colored dots, the sequence "green green green green green green green green green" is shortened to "green x 9". This is most efficient on data that contains many such runs, for example, simple graphic images such as icons, line drawings, games, and animations. For files that do not have many runs, encoding them with RLE can increase the file size.

RLE may also refer in particular to an early graphics file format supported by [[CompuServe]] for compressing black and white images, which was widely supplanted by their later [[Graphics Interchange Format]] (GIF).

RLE also refers to a little-used image format in [[Windows 3.x]] that is saved with the file extension <code>rle</code>; it is a run-length encoded bitmap, and was used as the format for the Windows 3.x startup screen.

==History and applications==
Run-length encoding (RLE) schemes were employed in the transmission of analog television signals as far back as 1967.<ref name="robinson" /> In 1983, run-length encoding was [[patent]]ed by [[Hitachi]].<ref>{{cite web |date=21 March 1996 |title=Run Length Encoding Patents |url=http://www.ross.net/compression/patents_notes_from_ccfaq.html |access-date=14 July 2019 |publisher=Internet FAQ Consortium}}</ref><ref>{{cite web |date=7 August 1984 |title=Method and system for data compression and restoration |url=https://patents.google.com/patent/US4586027A |access-date=14 July 2019 |website=[[Google Patents]]}}</ref><ref>{{cite web |date=8 August 1983 |title=Data recording method |url=https://patents.google.com/patent/JPH0828053B2/en |access-date=14 July 2019 |website=[[Google Patents]]}}</ref> RLE is particularly well suited to [[Palette (computing)|palette]]-based bitmap images (which use relatively few colours) such as [[computer icons]], and was a popular image compression method on early [[online service]]s such as [[CompuServe]] before the advent of more sophisticated formats such as [[GIF]].<ref name="transactor" /> It does not work well on continuous-tone images (which use very many colours) such as photographs, although [[JPEG]] uses it on the coefficients that remain after transforming and [[Quantization (image processing)|quantizing]] image blocks.

Common formats for run-length encoded data include [[Truevision TGA]], [[PackBits]] (by Apple, used in [[MacPaint]]), [[PCX]] and [[ILBM]].

The [[International Telecommunication Union]] also describes a standard to encode run-length colour for [[fax]] machines, known as T.45.<ref name="itu" /> That fax colour coding standard, which along with other techniques is incorporated into [[Modified Huffman coding]],{{citation needed|date=December 2015}} is relatively efficient because most faxed documents are primarily white space, with occasional interruptions of black.

== Algorithm ==
RLE has a space complexity of {{tmath|O(n)}}, where {{mvar|n}} is the size of the input data.

=== Encoding algorithm ===
Run-length encoding compresses data by reducing the physical size of a repeating string of characters. This process involves converting the input data into a compressed format by identifying and counting consecutive occurrences of each character. The steps are as follows:
# Traverse the input data.
# Count the number of consecutive repeating characters (run length).
# Store the character and its run length.

==== Python implementation ====
{{hidden|headerstyle=background:#ccccff; text-align:left;
|Imports and helper functions
|<syntaxhighlight lang="python">
from itertools import repeat, compress, groupby

def ilen(iterable):
    """
    Return the number of items in iterable.

    >>> ilen(x for x in range(1000000) if x % 3 == 0)
    333334
    """
    # using zip() to wrap the input with 1-tuples which compress() reads as true values.
    return sum(compress(repeat(1), zip(iterable)))
</syntaxhighlight>
}}
<syntaxhighlight lang="python">
def rle_encode(iterable, *, length_first=True):
    """
    >>> "".join(rle_encode("AAAABBBCCDAA"))
    '4A3B2C1D2A'
    >>> "".join(rle_encode("AAAABBBCCDAA", length_first=False))
    'A4B3C2D1A2'
    """
    return (
        f"{ilen(g)}{k}" if length_first else f"{k}{ilen(g)}"  # ilen(g): length of iterable g
        for k, g in groupby(iterable)
    )
</syntaxhighlight><ref name="more-itertools">{{cite web|url=https://more-itertools.readthedocs.io/en/stable/_modules/more_itertools/more.html#run_length|date=August 2024|title=more-itertools 10.4.0 documentation}}</ref>

=== Decoding algorithm ===
The decoding process involves reconstructing the original data from the encoded format by repeating characters according to their counts. The steps are as follows:
# Traverse the encoded data.
# For each count-character pair, repeat the character count times.
# Append these characters to the result string.

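The three decoding steps above can also be written as an explicit loop. The following is a minimal sketch rather than a reference implementation: the function name <code>rle_decode_text</code> is hypothetical, and it assumes a count-first text format in which run lengths are written as ASCII decimal digits and the encoded symbols are never digits themselves. Under that assumption it accepts multi-digit run lengths such as <code>12W</code>.

```python
def rle_decode_text(encoded: str) -> str:
    """Decode a count-first RLE string such as '12W1B' (hypothetical helper).

    Assumes every symbol is a single non-digit character that is
    preceded by its run length written in ASCII decimal digits.
    """
    result = []
    count = ""
    for ch in encoded:                      # step 1: traverse the encoded data
        if "0" <= ch <= "9":
            count += ch                     # collect a possibly multi-digit count
        else:
            if not count:
                raise ValueError(f"symbol {ch!r} has no run length")
            result.append(ch * int(count))  # steps 2 and 3: repeat and append
            count = ""
    if count:
        raise ValueError("trailing run length without a symbol")
    return "".join(result)
```

For example, <code>rle_decode_text("4A3B2C1D2A")</code> returns <code>'AAAABBBCCDAA'</code>. A format whose symbols may themselves be digits would need a different way to delimit counts, such as fixed-width binary length fields.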
==== Python implementation ====
{{hidden|headerstyle=background:#ccccff; text-align:left;
|Imports
|<syntaxhighlight lang="python">
# itertools.batched requires Python 3.12 or later
from itertools import chain, repeat, batched
</syntaxhighlight>
}}
<syntaxhighlight lang="python">
def rle_decode(iterable, *, length_first=True):
    """
    Each run length must be a single digit, because the input is
    consumed two items at a time.

    >>> "".join(rle_decode("4A3B2C1D2A"))
    'AAAABBBCCDAA'
    >>> "".join(rle_decode("A4B3C2D1A2", length_first=False))
    'AAAABBBCCDAA'
    """
    return chain.from_iterable(
        repeat(b, int(a)) if length_first else repeat(a, int(b))
        for a, b in batched(iterable, 2)
    )
</syntaxhighlight><ref name="more-itertools"/>

==Example==
Consider a screen containing plain black text on a solid white background. There will be many long runs of white [[pixel]]s in the blank space, and many short runs of black pixels within the text. A hypothetical [[scan line]], with B representing a black pixel and W representing white, might read as follows:
: <code> WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW </code>
With a run-length encoding (RLE) data compression algorithm applied to the above hypothetical scan line, it can be rendered as follows:
: <code> 12W1B12W3B24W1B14W </code>
This can be interpreted as a sequence of twelve Ws, one B, twelve Ws, three Bs, etc., and represents the original 67 characters in only 18. While the actual format used for the storage of images is generally binary rather than [[ASCII]] characters like this, the principle remains the same. Even binary data files can be compressed with this method; file format specifications often dictate repeated bytes in files as padding space. However, newer compression methods such as [[DEFLATE]] often use [[LZ77]]-based algorithms, a generalization of run-length encoding that can take advantage of runs of strings of characters (such as <code>BWWBWWBWWBWW</code>).

Run-length encoding can be expressed in multiple ways to accommodate data properties as well as additional compression algorithms.
For instance, one popular method encodes run lengths for runs of two or more characters only, using an "escape" symbol to identify runs, or using the character itself as the escape, so that any time a character appears twice it denotes a run. On the previous example, this would give the following:
: <code>WW12BWW12BB3WW24BWW14</code>
This would be interpreted as a run of twelve Ws, a B, a run of twelve Ws, a run of three Bs, etc. In data where runs are less frequent, this can significantly improve the compression rate.

One other matter is the application of additional compression algorithms. Even with the runs extracted, the frequencies of different characters may be large, allowing for further compression; however, if the run lengths are written in the file in the locations where the runs occurred, the presence of these numbers interrupts the normal flow and makes it harder to compress. To overcome this, some run-length encoders separate the data and escape symbols from the run lengths, so that the two can be handled independently. For the example data, this would result in two outputs, the string "<code>WWBWWBBWWBWW</code>" and the numbers (<code>12,12,3,24,14</code>).

== Variants ==
* Sequential RLE: This method processes data one line at a time, scanning from left to right. It is commonly employed in image compression. Other variations of this technique include scanning the data vertically, diagonally, or in blocks.
* Lossy RLE: In this variation, some bits are intentionally discarded during compression (often by setting one or two of the least significant bits of each pixel to 0). This leads to higher compression rates while minimally impacting the visual quality of the image.
* Adaptive RLE: Uses different encoding schemes depending on the length of runs to optimize compression ratios. For example, short runs might use a different encoding format than long runs.

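As a rough illustration of the lossy variant listed above, the sketch below clears the low-order bits of each sample before collecting runs, so that nearly equal neighbouring values merge into a single run. The function name <code>lossy_rle</code>, the <code>drop_bits</code> parameter, the 8-bit sample range, and the list-of-pairs output are all assumptions made for this example; they are not taken from any particular file format.

```python
from itertools import groupby

def lossy_rle(samples, drop_bits=2):
    """Return (value, run_length) pairs after zeroing the low bits.

    Hypothetical helper: assumes 8-bit unsigned samples (0..255) and
    clears the `drop_bits` lowest bits of each one before grouping.
    """
    mask = 0xFF & ~((1 << drop_bits) - 1)    # drop_bits=2 gives 0b11111100
    quantized = (s & mask for s in samples)  # discard the low-order bits
    # Group equal neighbours and count the items in each group.
    return [(value, sum(1 for _ in run)) for value, run in groupby(quantized)]
```

With the samples <code>[200, 201, 203, 202, 17, 16, 18, 19]</code>, the default setting yields two runs, <code>[(200, 4), (16, 4)]</code>, whereas <code>drop_bits=0</code> keeps every value distinct and yields eight runs of length one. The decoded data then differs from the input by at most 3 per sample, which is the loss this variant trades for longer runs.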
== See also ==
* [[Kolakoski sequence]]
* [[Look-and-say sequence]]
* [[Comparison of graphics file formats]]
* [[Golomb coding]]
* [[Burrows–Wheeler transform]]
* [[Recursive indexing]]
* [[Run-length limited]]
* [[Bitmap index]]
* [[Forsyth–Edwards Notation]], which uses run-length-encoding for empty spaces in chess positions.
* [[DEFLATE]]
* [[Convolution]]
* [[Huffman coding]]
* [[Arithmetic coding]]

== References ==
{{reflist|refs=
<ref name="robinson">{{cite journal |author1-last=Robinson |author1-first=A. H. |author2-last=Cherry |author2-first=C. |title=Results of a prototype television bandwidth compression scheme |journal=[[Proceedings of the IEEE]] |publisher=[[IEEE]] |volume=55 |number=3 |date=1967 |pages=356–364 |doi=10.1109/PROC.1967.5493}}</ref>
<ref name="transactor">{{cite journal |author-link=Christopher Dunn (computer enthusiast) |journal=[[The Transactor]] |date=1987 |volume=7 |number=6 |publisher=[[Transactor Publishing]] |author-last=Dunn |author-first=Christopher |title=Smile! You're on RLE! |pages=16–18 |url=http://csbruce.com/cbm/transactor/pdfs/trans_v7_i06.pdf |access-date=2015-12-06}}</ref>
<ref name="itu">{{cite book |title=Recommendation T.45 (02/00): Run-length colour encoding |date=2000 |publisher=[[International Telecommunication Union]] |url=http://www.itu.int/rec/T-REC-T.45 |access-date=2015-12-06}}</ref>
}}

== External links ==
* [http://rosettacode.org/wiki/Run-length_encoding Run-length encoding implemented in different programming languages] (on [[Rosetta Code]])
* [https://gitlab.com/bztsrc/rle Single Header Run-Length Encoding Library] smallest possible implementation (about 20 SLoC) in ANSI C. FOSS, compatible with [[Truevision TGA]], supports 8, 16, 24 and 32 bit elements too.

{{Compression Methods}}
{{Compression formats}}

[[Category:Lossless compression algorithms]]
[[Category:Data compression]]