
==Description==
UTF-7 was first proposed as an experimental protocol in RFC 1642, ''A Mail-Safe Transformation Format of Unicode''. This [[Request for Comments|RFC]] was made obsolete by RFC 2152, an informational RFC that never became a standard. As RFC 2152 itself states, it "does not specify an Internet standard of any kind". Despite this, RFC 2152 is cited as the definition of UTF-7 in the IANA's list of charsets. Nor is UTF-7 part of the Unicode Standard: ''The Unicode Standard 5.0'' lists only UTF-8, UTF-16 and UTF-32.
A modified version, specified in RFC 2060 (the IMAP4rev1 specification), is also sometimes identified as UTF-7.

Some characters can be represented directly as single ASCII bytes. The first group, known as "direct characters", contains the 62 alphanumeric characters and 9 symbols: <code>' ( ) , - . / : ?</code>. The direct characters are safe to include literally. The other main group, known as "optional direct characters", contains all other printable characters in the range {{U+|0020}}&ndash;U+007E except <code>~ \ +</code> and space (the characters {{code|\}} and {{code|~}} are excluded because they are redefined in "variants of ASCII" such as [[JIS-Roman]]). Using the optional direct characters reduces size and improves human readability, but it also increases the risk of breakage by, for example, badly designed mail gateways, and may require extra escaping when the text is used in encoded words in header fields.
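RFC 2152 calls these two groups Set D and Set O. As an illustration, both can be built as sets straight from the definitions above; the Python names <code>SET_D</code>, <code>SET_O</code> and <code>char_class</code> are hypothetical and not part of any library. The sketch only classifies characters; space, tab, CR and LF are deliberately left out, since they are covered separately.

```python
import string

# Set D ("direct characters"): 62 alphanumerics plus 9 symbols.
SET_D = frozenset(string.ascii_letters + string.digits + "'(),-./:?")

# Set O ("optional direct characters"): every other printable character in
# U+0021..U+007E (space is excluded), minus '~', '\\' and '+'.
SET_O = frozenset(chr(c) for c in range(0x21, 0x7F)) - SET_D - {"~", "\\", "+"}


def char_class(ch: str):
    """Return "D", "O", or None for characters in neither group."""
    if ch in SET_D:
        return "D"
    if ch in SET_O:
        return "O"
    return None


print(len(SET_D), len(SET_O))
print("".join(sorted(SET_O)))
```

Counting confirms the description: 94 printable non-space ASCII characters, minus 71 direct characters and the three excluded ones, leaves 20 optional direct characters.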

Space, tab, carriage return and line feed may also be represented directly as single ASCII bytes. However, if the encoded text is intended for e-mail, care is needed to use these characters only in ways that do not require a further content transfer encoding. The plus sign (<code>+</code>) ''may'' be encoded as <code>+-</code>.

Other characters must be encoded in [[UTF-16]] (so characters at U+10000 and above are encoded as two surrogates) and then in [[Base64#UTF-7|modified Base64]]. The start of each block of modified Base64-encoded UTF-16 is marked by a <code>+</code> sign, and its end by any character not in the modified Base64 set. If the character after the modified Base64 is a <code>-</code> (ASCII [[hyphen-minus]]), the decoder consumes it and resumes decoding with the next character; otherwise, decoding resumes with the character immediately after the Base64.
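The rules above fit in a short encoder and decoder. The Python sketch below is one possible implementation, not a reference one. It assumes an encoder that writes Set D, Set O, space, tab, CR and LF literally, uses <code>+-</code> for a plus sign outside a Base64 block, and (as a choice of this sketch) always closes a Base64 block at the end of the input with <code>-</code>. Modified Base64 is taken to be the standard Base64 alphabet without <code>=</code> padding. The names <code>utf7_encode</code>, <code>utf7_decode</code>, <code>DIRECT</code>, <code>B64_CHARS</code> and <code>_shift</code> are hypothetical, and the decoder performs only minimal validation.

```python
import base64
from typing import List, Optional

# Characters this sketch writes literally: Set D, Set O, and space, tab, CR, LF.
# '+', '\\', '~' and all non-ASCII characters go into modified Base64 instead.
DIRECT = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:?"
    '!"#$%&*;<=>@[]^_`{|}'
    " \t\r\n"
)
# Modified Base64 uses the standard Base64 alphabet but never the '=' pad.
B64_CHARS = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def _shift(run: List[str], next_char: Optional[str]) -> str:
    """Encode a run as '+' followed by modified Base64 of its UTF-16BE bytes."""
    data = "".join(run).encode("utf-16-be")  # U+10000 and above become surrogate pairs
    # Stripping '=' leaves the final sextet zero-filled, as required.
    b64 = base64.b64encode(data).decode("ascii").rstrip("=")
    # Write '-' when the next character is Base64 or '-' itself, and at end of input.
    if next_char is None or next_char in B64_CHARS or next_char == "-":
        return "+" + b64 + "-"
    return "+" + b64


def utf7_encode(text: str) -> bytes:
    out: List[str] = []
    run: List[str] = []  # pending characters that need Base64
    for ch in text:
        if ch in DIRECT:
            if run:
                out.append(_shift(run, ch))
                run = []
            out.append(ch)
        elif ch == "+" and not run:
            out.append("+-")  # escape for a literal plus sign
        else:
            run.append(ch)
    if run:
        out.append(_shift(run, None))
    return "".join(out).encode("ascii")


def utf7_decode(data: bytes) -> str:
    s = data.decode("ascii")
    out: List[str] = []
    i = 0
    while i < len(s):
        if s[i] != "+":
            out.append(s[i])  # any other ASCII character stands for itself
            i += 1
            continue
        # '+' opens a block that runs until the first non-Base64 character.
        j = i + 1
        while j < len(s) and s[j] in B64_CHARS:
            j += 1
        b64 = s[i + 1 : j]
        if b64:
            raw = base64.b64decode(b64 + "=" * (-len(b64) % 4))
            # Keep only whole 16-bit code units; leftover bits are padding.
            raw = raw[: (len(b64) * 6 // 16) * 2]
            out.append(raw.decode("utf-16-be"))
        elif j < len(s) and s[j] == "-":
            out.append("+")  # "+-" decodes to a literal '+'
        else:
            raise ValueError(f"'+' at index {i} is not followed by Base64 or '-'")
        # A '-' directly after the block is consumed; any other character is kept.
        if j < len(s) and s[j] == "-":
            j += 1
        i = j
    return "".join(out)
```

For example, <code>utf7_encode("£1")</code> gives <code>b"+AKM-1"</code>: U+00A3 is the UTF-16 byte pair 0x00 0xA3, which modified Base64 writes as <code>AKM</code>, and the explicit <code>-</code> is needed because <code>1</code> is itself a Base64 character. Likewise <code>"Hi Mom -☺-!"</code> becomes <code>b"Hi Mom -+Jjo--!"</code>, where the first <code>-</code> after <code>Jjo</code> ends the block and the second is the literal hyphen-minus.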