==Description==
UTF-7 was first proposed as an experimental protocol in RFC 1642, ''A Mail-Safe Transformation Format of Unicode''. That [[Request for Comments|RFC]] was made obsolete by RFC 2152, an informational RFC that never became a standard. As RFC 2152 itself states, the RFC "does not specify an Internet standard of any kind". Despite this, the IANA's list of charsets cites RFC 2152 as the definition of UTF-7. Nor is UTF-7 part of the Unicode Standard: ''The Unicode Standard 5.0'' lists only UTF-8, UTF-16 and UTF-32. A modified version, specified in RFC 2060, is also sometimes identified as UTF-7.

Some characters can be represented directly as single ASCII bytes. The first group, known as "direct characters", contains the 62 alphanumeric characters and 9 symbols: <code>' ( ) , - . / : ?</code>. The direct characters are safe to include literally. The second group, known as "optional direct characters", contains all other printable characters in the range {{U+|0020}}–U+007E except <code>~ \ +</code> and space (the characters {{code|\}} and {{code|~}} are excluded because they are redefined in "variants of ASCII" such as [[JIS-Roman]]). Using the optional direct characters reduces size and improves human readability, but it also increases the chance of breakage by, for example, badly designed mail gateways, and it may require extra escaping when used in encoded words for header fields.

Space, tab, carriage return and line feed may also be represented directly as single ASCII bytes. However, if the encoded text is to be used in e-mail, care is needed to ensure that these characters are used in ways that do not require further content transfer encoding to be suitable for e-mail.

The plus sign (<code>+</code>) ''may'' be encoded as <code>+-</code>. All other characters must be encoded in [[UTF-16]] (hence U+10000 and higher are encoded as surrogate pairs) and then in [[Base64#UTF-7|modified Base64]].
The start of these blocks of modified Base64-encoded UTF-16 is indicated by a <code>+</code> sign. The end is indicated by any character not in the modified Base64 set. If the character after the modified Base64 is a <code>-</code> (ASCII [[hyphen-minus]]) then it is consumed by the decoder and decoding resumes with the next character. Otherwise decoding resumes with the character after the Base64.
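
The rules above can be illustrated with a minimal Python sketch. It is not a reference implementation: the function names <code>utf7_encode</code> and <code>utf7_decode</code> are hypothetical, the encoder deliberately avoids the optional direct characters, always writes the plus sign as <code>+-</code>, and always terminates a Base64 block with an explicit <code>-</code> (permitted, though not always required), and the decoder assumes well-formed input.

```python
import base64
import string

# "Direct characters": 62 alphanumerics plus the nine symbols ' ( ) , - . / : ?
DIRECT = set(string.ascii_letters + string.digits + "'(),-./:?")
# Space, tab, carriage return and line feed may also be written literally.
LITERAL = DIRECT | set(" \t\r\n")
# Characters that may appear inside a modified Base64 block.
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def utf7_encode(text: str) -> str:
    """Encode text as UTF-7 (sketch: optional direct characters are not used)."""
    out = []
    pending = []  # characters waiting to be written as one Base64 block

    def flush() -> None:
        if pending:
            raw = "".join(pending).encode("utf-16-be")  # surrogate pairs above U+FFFF
            b64 = base64.b64encode(raw).decode("ascii").rstrip("=")  # no '=' padding
            out.append("+" + b64 + "-")  # assumption: always emit the '-' terminator
            pending.clear()

    for ch in text:
        if ch == "+":
            flush()
            out.append("+-")
        elif ch in LITERAL:
            flush()
            out.append(ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


def utf7_decode(data: str) -> str:
    """Decode well-formed UTF-7 text (sketch: minimal error handling)."""
    out = []
    i = 0
    while i < len(data):
        if data[i] != "+":
            out.append(data[i])
            i += 1
            continue
        # Scan the run of Base64 characters that follows the '+'.
        j = i + 1
        while j < len(data) and data[j] in B64_ALPHABET:
            j += 1
        b64 = data[i + 1:j]
        if not b64:
            if j < len(data) and data[j] == "-":
                out.append("+")  # "+-" stands for a literal plus sign
                i = j + 1
                continue
            raise ValueError("'+' not followed by Base64 data or '-'")
        padded = b64 + "=" * (-len(b64) % 4)  # restore the stripped padding
        out.append(base64.b64decode(padded).decode("utf-16-be"))
        if j < len(data) and data[j] == "-":
            j += 1  # a terminating '-' is consumed by the decoder
        i = j  # any other terminator is processed as ordinary text
    return "".join(out)


print(utf7_decode("Hi Mom -+Jjo--!"))  # example string from RFC 2152
```

Python's standard library also ships a <code>utf-7</code> codec (<code>bytes.decode("utf-7")</code>), which can be used to cross-check the sketch's output.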