Uuencoding
{{Short description|Form of binary-to-text encoding}}
{{distinguish|text=[[Percent-encoding|URL encoding]]}}
'''uuencoding''' is a form of [[binary-to-text encoding]] that originated in the [[Unix]] programs '''uuencode''' and '''uudecode''', written by [[Mary Ann Horton]] at the [[University of California, Berkeley]] in 1980<ref>{{Cite web |last=Horton |first=Mark |title=UUENCODE(1C) UNIX Programmer's Manual |url=https://www.tuhs.org/cgi-bin/utree.pl?file=4BSD/usr/man/cat1/uuencode.1c |access-date=2020-11-10 |website=The Unix Heritage Society |language=en-US}}</ref> for [[code|encoding]] [[Binary numeral system|binary]] data for transmission in [[email]] systems.

The name "uuencoding" is derived from [[Unix-to-Unix Copy]], i.e. "Unix-to-Unix encoding": a safe encoding for transferring arbitrary files from one Unix system to another, without any guarantee that the intervening links are all Unix systems. Since an email message might be forwarded through or to computers with different [[character set]]s, pass through transports that are not [[8-bit clean]], or be handled by programs that are not 8-bit clean, forwarding a binary file via email might corrupt it. Because the data is encoded into a character subset common to most character sets, the encoded form was unlikely to be "translated" or corrupted, and would thus arrive intact and unchanged at the destination. The program '''uudecode''' reverses the effect of '''uuencode''', recreating the original binary file exactly. uuencode/uudecode became popular for sending binary (and especially compressed) files by email and for posting to [[Usenet]] newsgroups. It has now been largely replaced by [[MIME]] and [[yEnc]]. With MIME, files that might have been uuencoded are instead transferred with [[Base64]] encoding.

== Encoded format ==
A uuencoded file starts with a header line of the form:

 begin <mode> <file><newline>

<code><mode></code> is the file's [[File permissions#Numeric notation|Unix file permissions]] as three octal digits (e.g. 644, 744). This is typically only significant to [[Unix-like]] operating systems.

<code><file></code> is the file name to be used when recreating the binary data.

<code><newline></code> signifies a [[newline]] character, used to terminate each line.

Each data line uses the format:

 <length character><formatted characters><newline>

<code><length character></code> is a character indicating the number of data bytes which have been encoded on that line. This is an [[ASCII]] character determined by adding 32 to the actual byte count, with the sole exception of a grave accent "`" (ASCII code 96) signifying zero bytes. All data lines, except possibly the last (when the data length is not divisible by 45), encode 45 bytes of data (60 characters after encoding). Therefore, the vast majority of length characters are "M" (32 + 45 = ASCII code 77).

<code><formatted characters></code> are encoded characters. See {{section link||Formatting mechanism}} for more details on the actual implementation.

The file ends with two lines:

 `<newline>
 end<newline>

The second-to-last line is itself a length character, with the grave accent signifying zero bytes.

As a complete file, the uuencoded output for a plain text file named cat.txt containing only the characters ''Cat'' would be

 begin 644 cat.txt
 #0V%T
 `
 end

The begin line is a standard uuencode header; the "#" indicates that its line encodes three bytes; the last two lines appear at the end of all uuencoded files.

== Formatting mechanism ==
The mechanism of <code>uuencoding</code> repeats the following for every 3 bytes, encoding them into 4 printable characters, each character representing a [[radix-64]] [[numerical digit]]:
# Start with 3 [[bytes]] from the source, 24 [[bit]]s in total.
# Split into four 6-bit groupings, each representing a value in the range 0 to 63: bits (00-05), (06-11), (12-17) and (18-23).
# Add 32 to each of the values. The possible results therefore lie between 32 (" " space) and 95 ("_" [[underline]]). 96 ("`" [[grave accent]]) as the "special character" is a logical extension of this range. Although the space character is documented as the encoding for the value 0, implementations such as GNU sharutils<ref>{{Cite web |title=uuencode.c source |url=https://fossies.org/dox/sharutils-4.15.2/uuencode_8c_source.html#l00085 |access-date=2021-06-05 |website=fossies.org |language=en-US}}</ref> actually use the grave accent character to encode zeros in the body of the file as well, never using space.
# Output the ASCII equivalent of these numbers.

If the source length is not divisible by 3, the last 4-character section encodes padding bytes added to complete the final 3-byte group. These padding bytes are not counted in the line's <code><length character></code>, so the decoder does not append unwanted bytes to the file.

<code>uudecoding</code> is the reverse of the above: subtract 32 from each character's ASCII code ([[modulo]] 64 to account for the grave accent usage) to get a 6-bit value, concatenate four 6-bit groups to get 24 bits, then output 3 bytes.

The encoding process is demonstrated by this table, which shows the derivation of the above encoding for "Cat".

{| class="wikitable"
| style="border-right: 2px solid black" | Original characters
! colspan=8 style="border-right: 2px solid green" | <code>C</code>
! colspan=8 style="border-right: 2px solid green" | <code>a</code>
! colspan=8 style="border-right: 2px solid black" | <code>t</code>
|-
| style="border-right: 2px solid black" | Original ASCII, decimal
! colspan=8 style="border-right: 2px solid green" | 67
! colspan=8 style="border-right: 2px solid green" | 97
! colspan=8 style="border-right: 2px solid black" | 116
|-
| style="border-right: 2px solid black" | ASCII, binary
| 0
| 1
| 0
| 0
| 0
| style="border-right: 2px solid red" | 0
| 1
| style="border-right: 2px solid green" | 1
| 0
| 1
| 1
| style="border-right: 2px solid red" | 0
| 0
| 0
| 0
| style="border-right: 2px solid green" | 1
| 0
| style="border-right: 2px solid red" | 1
| 1
| 1
| 0
| 1
| 0
| style="border-right: 2px solid black" | 0
|-
| style="border-right: 2px solid black" | New decimal values
! colspan=6 style="border-right: 2px solid red" | 16
! colspan=6 style="border-right: 2px solid red" | 54
! colspan=6 style="border-right: 2px solid red" | 5
! colspan=6 style="border-right: 2px solid black" | 52
|-
| style="border-right: 2px solid black" | +32
! colspan=6 style="border-right: 2px solid red" | 48
! colspan=6 style="border-right: 2px solid red" | 86
! colspan=6 style="border-right: 2px solid red" | 37
! colspan=6 style="border-right: 2px solid black" | 84
|-
| style="border-right: 2px solid black" | Uuencoded characters
! colspan=6 style="border-right: 2px solid red" | <code>0</code>
! colspan=6 style="border-right: 2px solid red" | <code>V</code>
! colspan=6 style="border-right: 2px solid red" | <code>%</code>
! colspan=6 style="border-right: 2px solid black" | <code>T</code>
|}

== uuencode table ==
The following table shows the decimal value of each 6-bit field obtained during the conversion process, together with the corresponding ASCII output code and character.

Note that some encoders might produce space (code 32) instead of grave accent ("`", code 96), while some decoders might refuse to decode data containing space.

{|class="wikitable" style="text-align:center"
!scope="col"| bits!!scope="col"| ASCII<br />code!!scope="col"| ASCII<br />char
|rowspan="17"|
!scope="col"| bits!!scope="col"| ASCII<br />code!!scope="col"| ASCII<br />char
|rowspan="17"|
!scope="col"| bits!!scope="col"| ASCII<br />code!!scope="col"| ASCII<br />char
|rowspan="17"|
!scope="col"| bits!!scope="col"| ASCII<br />code!!scope="col"| ASCII<br />char
|-
| 00 || 96 || <code>`</code> || 16 || 48 || <code>0</code> || 32 || 64 || <code>@</code> || 48 || 80 || <code>P</code>
|-
| 01 || 33 || <code>!</code> || 17 || 49 || <code>1</code> || 33 || 65 || <code>A</code> || 49 || 81 || <code>Q</code>
|-
| 02 || 34 || <code>"</code> || 18 || 50 || <code>2</code> || 34 || 66 || <code>B</code> || 50 || 82 || <code>R</code>
|-
| 03 || 35 || <code>#</code> || 19 || 51 || <code>3</code> || 35 || 67 || <code>C</code> || 51 || 83 || <code>S</code>
|-
| 04 || 36 || <code>$</code> || 20 || 52 || <code>4</code> || 36 || 68 || <code>D</code> || 52 || 84 || <code>T</code>
|-
| 05 || 37 || <code>%</code> || 21 || 53 || <code>5</code> || 37 || 69 || <code>E</code> || 53 || 85 || <code>U</code>
|-
| 06 || 38 || <code>&</code> || 22 || 54 || <code>6</code> || 38 || 70 || <code>F</code> || 54 || 86 || <code>V</code>
|-
| 07 || 39 || <code>'</code> || 23 || 55 || <code>7</code> || 39 || 71 || <code>G</code> || 55 || 87 || <code>W</code>
|-
| 08 || 40 || <code>(</code> || 24 || 56 || <code>8</code> || 40 || 72 || <code>H</code> || 56 || 88 || <code>X</code>
|-
| 09 || 41 || <code>)</code> || 25 || 57 || <code>9</code> || 41 || 73 || <code>I</code> || 57 || 89 || <code>Y</code>
|-
| 10 || 42 || <code>*</code> || 26 || 58 || <code>:</code> || 42 || 74 || <code>J</code> || 58 || 90 || <code>Z</code>
|-
| 11 || 43 || <code>+</code> || 27 || 59 || <code>;</code> || 43 || 75 || <code>K</code> || 59 || 91 || <code>[</code>
|-
| 12 || 44 || <code>,</code> || 28 || 60 || <code><</code> || 44 || 76 || <code>L</code> || 60 || 92 || <code>\</code>
|-
| 13 || 45 || <code>-</code> || 29 || 61 || <code>=</code> || 45 || 77 || <code>M</code> || 61 || 93 || <code>]</code>
|-
| 14 || 46 || <code>.</code> || 30 || 62 || <code>></code> || 46 || 78 || <code>N</code> || 62 || 94 || <code>^</code>
|-
| 15 || 47 || <code>/</code> || 31 || 63 || <code>?</code> || 47 || 79 || <code>O</code> || 63 || 95 || <code>_</code>
|}

== Example ==
The following is an example of uuencoding a one-line text file. In this example, '''%0D''' is the byte representation for [[carriage return]], and '''%0A''' is the byte representation for [[line feed]].

;file
 File Name = wikipedia-url.txt
 File Contents = <nowiki>http://www.wikipedia.org%0D%0A</nowiki>
;uuencoding
 begin 644 wikipedia-url.txt
 ::'1T<#HO+W=W=RYW:6MI<&5D:6$N;W)G#0H`
 `
 end

== Forks (file, resource) ==
Unix traditionally has a single [[fork (filesystem)|fork]] where file data is stored. However, some file systems support multiple forks associated with a single file. For example, classic Mac OS [[Hierarchical File System (Apple)|Hierarchical File System]] (HFS) supported a data fork and a ''[[resource fork]]''. Mac OS [[HFS Plus|HFS+]] supports multiple forks, as does Microsoft Windows [[NTFS]] with its ''alternate data streams''. Most uuencoding tools handle only the data in the primary data fork, which can result in a loss of information when encoding and decoding (for example, Windows NTFS file comments are kept in a different fork). Some tools (like the classic Mac OS application [[UUTool]]) solved the problem by concatenating the different forks into one file and differentiating them by file name.

== Relation to xxencode, Base64, and Ascii85 ==
{{Main|xxencoding|Base64|Ascii85}}
Despite its limited range of characters, uuencoded data is sometimes corrupted on passage through certain computers using non-ASCII character sets such as [[EBCDIC]]. One attempt to solve the problem was the xxencode format, which used only alphanumeric characters and the plus and minus symbols.
More common today is the Base64 format, which, like xxencode, uses a mostly [[alphanumeric]] character set rather than ASCII 32–95. All three formats use 6 bits (64 different characters) to represent their input data.

Base64 can also be generated by the uuencode program and is similar in format, except for the actual character translation:

The header is changed to

 begin-base64 <mode> <file>

the trailer becomes

 {{=}}{{=}}{{=}}{{=}}

and lines between are encoded with characters chosen from

 ABCDEFGHIJKLMNOP
 QRSTUVWXYZabcdef
 ghijklmnopqrstuv
 wxyz0123456789+/

Another alternative is [[Ascii85]], which encodes four bytes of binary data as five ASCII characters. Ascii85 is used in [[PostScript]] and [[PDF]] formats.

== Disadvantages ==
uuencoding turns every 3 input bytes into 4 output characters and also adds the begin/end lines, the file name, and [[delimiters]]. This adds at least 33% data overhead compared to the source alone, though the overhead can be at least partly offset by compressing the file before uuencoding it.

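The format, the 3-to-4 expansion, and the resulting overhead described above can be illustrated with a minimal Python sketch. The function names <code>uuencode_line</code>, <code>uudecode_line</code>, and <code>uuencode</code> are hypothetical helpers chosen for this illustration and are not part of any library. The sketch assumes the grave-accent convention for zero values (as attributed to GNU sharutils above) and is not meant as a reference implementation.

```python
def _char(value: int) -> str:
    """Map a 6-bit value (or a byte count) to its uuencode character."""
    # Assumption: zero is written as a grave accent; all other values add 32.
    return "`" if value == 0 else chr(value + 32)


def uuencode_line(chunk: bytes) -> str:
    """Encode up to 45 bytes as one data line (without the newline)."""
    if len(chunk) > 45:
        raise ValueError("a data line carries at most 45 bytes")
    padded = chunk + b"\x00" * (-len(chunk) % 3)  # zero-pad to a multiple of 3
    out = [_char(len(chunk))]  # the length character counts real bytes only
    for i in range(0, len(padded), 3):
        group = int.from_bytes(padded[i:i + 3], "big")  # 24 bits
        out.extend(_char((group >> shift) & 0x3F) for shift in (18, 12, 6, 0))
    return "".join(out)


def uudecode_line(line: str) -> bytes:
    """Decode one data line; (code - 32) mod 64 maps space and ` to zero."""
    values = [(ord(c) - 32) & 0x3F for c in line.rstrip("\r\n")]
    if not values:
        return b""
    length, digits = values[0], values[1:]
    digits += [0] * (-len(digits) % 4)  # tolerate a truncated final group
    out = bytearray()
    for i in range(0, len(digits), 4):
        a, b, c, d = digits[i:i + 4]
        out += ((a << 18) | (b << 12) | (c << 6) | d).to_bytes(3, "big")
    return bytes(out[:length])  # drop the padding bytes


def uuencode(data: bytes, name: str, mode: int = 0o644) -> str:
    """Build a complete uuencoded file as text."""
    lines = [f"begin {mode:03o} {name}"]
    lines += [uuencode_line(data[i:i + 45]) for i in range(0, len(data), 45)]
    lines += ["`", "end"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the cat.txt example from the "Encoded format" section.
    print(uuencode(b"Cat", "cat.txt"), end="")
    # Size ratio for an arbitrary 2560-byte payload (always above 4/3).
    payload = bytes(range(256)) * 10
    print(len(uuencode(payload, "demo.bin")) / len(payload))
```

Because every data line also carries a length character and a newline, and the file carries a header and trailer, the measured ratio is always somewhat above the 4/3 that the 3-to-4 expansion alone would give.
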
== Support in languages ==
=== Python ===
The [[Python (programming language)|Python]] language supports uuencoding using the codecs module with the codec "uu":

For Python 2 ''(deprecated/sunset as of January 1st 2020)'':
<syntaxhighlight lang="console" highlight="3">
$ python -c 'print "Cat".encode("uu")'
begin 666 <data>
#0V%T
 
end

$</syntaxhighlight>

For Python 3 ''where the codecs module needs to be imported and used directly'':
<syntaxhighlight lang="console" highlight="2">
$ python3 -c "from codecs import encode;print(encode(b'Cat', 'uu'))"
b'begin 666 <data>\n#0V%T\n \nend\n'
$</syntaxhighlight>

To decode, pass the whole file:
<syntaxhighlight lang="console" highlight="2">
$ python3 -c "from codecs import decode;print(decode(b'begin 666 <data>\n#0V%T\n \nend\n', 'uu'))"
b'Cat'
</syntaxhighlight>

=== Perl ===
The [[Perl]] language supports uuencoding natively using the pack() and unpack() operators with the format string "u":
<syntaxhighlight lang="console">
$ perl -e 'print pack("u","Cat")'
#0V%T
</syntaxhighlight>

Decoding uuencoded data with unpack works the same way, using the same format string:
<syntaxhighlight lang="console">
$ perl -e 'print unpack("u","#0V%T")'
Cat
</syntaxhighlight>

To produce well-formed uuencoded files, you need to use modules,<ref>{{Cite web |title=PerlPowerTools source |url=https://metacpan.org/dist/PerlPowerTools |access-date=2024-02-12 |website=metacpan.org |language=en-US}}</ref> or a little more code:<ref>{{Cite web |title=uuencode.pl source |url=http://main.linuxfocus.org/~guido/scripts/uuencode_pl.html |access-date=2024-02-12 |website=main.linuxfocus.org |language=en-US}}</ref>

==== Encode (oneliner) ====
<syntaxhighlight lang="console">
$ perl -ple 'BEGIN{use File::Basename;$/=undef;$sn=basename($ARGV[0]);} $_= "begin 600 $sn\n".(pack "u", $_)."`\nend" if $_' /some/file/to_encode.gz
</syntaxhighlight>

==== Encode/Decode (proper Perl scripts) ====
https://metacpan.org/dist/PerlPowerTools/view/bin/uuencode
https://metacpan.org/dist/PerlPowerTools/view/bin/uudecode

== See also ==
* [[Binary-to-text encoding]] for a comparison of various encoding algorithms

== References ==
{{Reflist}}

== External links ==
{{link farm|section|date=March 2020}}
* [https://pubs.opengroup.org/onlinepubs/9699919799/utilities/uuencode.html ''uuencode'' entry] in POSIX.1-2008
* [https://www.gnu.org/software/sharutils GNU-sharutils] – open source suite of shar/unshar/uuencode/uudecode utilities
* [http://www.fpx.de/fp/Software/UUDeview/ UUDeview] – open-source program to encode/decode Base64, BinHex, uuencode, xxencode, etc. for Unix/Windows/DOS
* [http://www.bastet.com/ UUENCODE-UUDECODE] – open-source program to encode/decode created by Clem "Grandad" Dye
* [http://www.stuartcheshire.org/StUU.html StUU] – open source fast UUDecoder for Macintosh by [[Stuart Cheshire]]
* [http://www.webutils.pl/UUencode UUENCODE-UUDECODE] – free on-line UUEncoder and UUDecoder
* [https://github.com/biagioT/java-uudecoder Java UUDecoder] – open source Java library for decoding uuencoded (mail) attachments
* [https://www.nxp.com/docs/en/application-note/AN11229.pdf AN11229] – NXP application note: UUencoding for UART ISP

{{lowercase}}
{{Data exchange}}

[[Category:Email]]
[[Category:Usenet]]
[[Category:Binary-to-text encoding formats]]
[[Category:Unix SUS2008 utilities]]