== Byte-order encoding schemes ==
UTF-16 and UCS-2 produce a sequence of 16-bit code units. Since most communication and storage protocols are defined for bytes, and each unit thus takes two 8-bit bytes, the order of the bytes may depend on the [[endianness]] (byte order) of the computer architecture.
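
As a small illustration, the following Python sketch serialises one 16-bit code unit in both byte orders with the standard <code>struct</code> module. The value 0x20AC (U+20AC, the euro sign) is only an example chosen for this sketch.

```python
import struct

code_unit = 0x20AC  # U+20AC EURO SIGN, a single 16-bit code unit (example value)

# ">H" = big-endian unsigned 16-bit, "<H" = little-endian unsigned 16-bit
big_endian = struct.pack(">H", code_unit)     # most significant byte first
little_endian = struct.pack("<H", code_unit)  # least significant byte first

# Python's explicit-order UTF-16 codecs produce the same two-byte sequences.
assert big_endian == "\u20ac".encode("utf-16-be")
assert little_endian == "\u20ac".encode("utf-16-le")
```

The same two bytes appear in both results, only in opposite order, which is why a decoder needs to know (or detect) the byte order.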

To assist in recognizing the byte order of code units, '''UTF-16''' allows a [[byte order mark]] (BOM), a code point with the value U+FEFF, to precede the first actual coded value.{{efn|UTF-8 encoding produces byte values strictly less than 0xFE, so either byte in the BOM sequence also identifies the encoding as UTF-16 (assuming that UTF-32 is not expected).}} (U+FEFF is the invisible [[zero-width non-breaking space]]/ZWNBSP character.){{efn|Use of U+FEFF as the character ZWNBSP instead of as a BOM has been deprecated in favor of U+2060 (WORD JOINER); see [https://www.unicode.org/faq/utf_bom.html#BOM Byte Order Mark (BOM) FAQ] at Unicode.org. But if an application interprets an initial BOM as a character, the ZWNBSP character is invisible, so the impact is minimal.}} If the endian architecture of the decoder matches that of the encoder, the decoder detects the 0xFEFF value, but an opposite-endian decoder interprets the BOM as the [[{{Proper name|noncharacter}}]] value U+FFFE reserved for this purpose. This incorrect result provides a hint to perform byte-swapping for the remaining values.
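
The BOM check described above can be sketched in Python as follows. The function names <code>detect_bom</code> and <code>decode_utf16</code> are hypothetical helpers written for this illustration; the BOM constants come from the standard <code>codecs</code> module. The sketch assumes that UTF-32 input is not expected, since the UTF-32 little-endian BOM begins with the same two bytes.

```python
import codecs

def detect_bom(data: bytes):
    """Return (byte_order, payload); byte_order is 'big', 'little', or None."""
    if data[:2] == codecs.BOM_UTF16_BE:   # bytes FE FF
        return "big", data[2:]
    if data[:2] == codecs.BOM_UTF16_LE:   # bytes FF FE
        return "little", data[2:]
    return None, data                     # no BOM present

def decode_utf16(data: bytes, default: str = "big") -> str:
    """Decode UTF-16 bytes, using the BOM if present, else the given default."""
    order, payload = detect_bom(data)
    codec = "utf-16-be" if (order or default) == "big" else "utf-16-le"
    return payload.decode(codec)

# The same text, written by a little-endian and by a big-endian encoder.
assert decode_utf16(b"\xff\xfeh\x00i\x00") == "hi"
assert decode_utf16(b"\xfe\xff\x00h\x00i") == "hi"
```

Comparing the raw bytes against both serialisations of U+FEFF is equivalent to reading a 16-bit unit and testing for 0xFEFF or the byte-swapped 0xFFFE.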

If the BOM is missing, RFC 2781 recommends{{efn|{{IETF RFC|2781}} section 4.3 says that if there is no BOM, "the text SHOULD be interpreted as being big-endian." According to section 1.2, the meaning of the term "SHOULD" is governed by {{IETF RFC|2119}}. In that document, section 3 says "... there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course".}} that big-endian (BE) encoding be assumed. In practice, due to Windows using little-endian (LE) order by default, many applications assume little-endian encoding. Endianness can also be detected heuristically by counting null bytes, on the assumption that characters less than U+0100 are very common: if more bytes at even offsets (counting from 0) than at odd offsets are null, the text is big-endian; if more bytes at odd offsets are null, it is little-endian.
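
A minimal Python sketch of the null-byte heuristic follows. The function name <code>guess_byte_order</code> is hypothetical, and falling back to big-endian on a tie is an assumption of this sketch, made to match the RFC 2781 default; an application targeting Windows data might prefer the opposite fallback.

```python
def guess_byte_order(data: bytes) -> str:
    """Guess UTF-16 byte order of BOM-less data by counting null bytes.

    Assumes characters below U+0100 are common, so the high byte of most
    code units is 0x00.
    """
    even_nulls = data[0::2].count(0)  # null bytes at offsets 0, 2, 4, ...
    odd_nulls = data[1::2].count(0)   # null bytes at offsets 1, 3, 5, ...
    if even_nulls > odd_nulls:
        return "big"      # high (null) byte comes first in each unit
    if odd_nulls > even_nulls:
        return "little"   # high (null) byte comes second in each unit
    return "big"          # tie: assumed fallback to the RFC 2781 default

assert guess_byte_order("hello".encode("utf-16-be")) == "big"
assert guess_byte_order("hello".encode("utf-16-le")) == "little"
```

The heuristic gives no useful signal for text that contains few characters below U+0100, in which case the result is simply the fallback.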

The standard also allows the byte order to be stated explicitly by specifying '''UTF-16BE''' or '''UTF-16LE''' as the encoding type.  When the byte order is specified explicitly this way, a BOM is specifically ''not'' supposed to be prepended to the text, and a U+FEFF at the beginning should be handled as a ZWNBSP character. Most applications ignore a BOM in all cases despite this rule.
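
Python's standard codecs are one implementation in which this distinction can be observed; the sketch below is illustrative only, and other libraries may behave differently. The input bytes are an example chosen for this sketch.

```python
import codecs

data = b"\xfe\xff\x00A"  # bytes FE FF followed by U+0041 in big-endian order

# "utf-16": the leading U+FEFF is treated as a BOM and consumed.
with_bom_detection = data.decode("utf-16")

# "utf-16-be": byte order is explicit, so U+FEFF is kept as a character.
explicit_order = data.decode("utf-16-be")

assert with_bom_detection == "A"
assert explicit_order == "\ufeffA"

# When encoding, only the unmarked "utf-16" name prepends a BOM.
assert "A".encode("utf-16-be") == b"\x00A"
assert "A".encode("utf-16")[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
```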

For [[Internet]] protocols, [[Internet Assigned Numbers Authority|IANA]] has approved "UTF-16", "UTF-16BE", and "UTF-16LE" as the names for these encodings (the names are case insensitive). The aliases '''UTF_16''' or '''UTF16''' may be meaningful in some programming languages or software applications, but they are not standard names in Internet protocols.
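
As one example of a programming language accepting such aliases, the Python sketch below looks up several spellings with the standard <code>codecs.lookup</code> function. It shows only how Python resolves the names and says nothing about their validity in Internet protocols.

```python
import codecs

# IANA-registered names in mixed case, plus two non-standard aliases.
names = ["UTF-16", "utf-16be", "Utf-16LE", "UTF_16", "UTF16"]

# Map each spelling to the canonical codec name Python resolves it to.
resolved = {name: codecs.lookup(name).name for name in names}

# The aliases resolve to the same codec as the registered name "UTF-16".
assert resolved["UTF_16"] == resolved["UTF-16"]
assert resolved["UTF16"] == resolved["UTF-16"]

# The explicit-order names resolve to distinct codecs.
assert resolved["utf-16be"] != resolved["Utf-16LE"]
```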

Similar designations, '''UCS-2BE''' and '''UCS-2LE''', are used to indicate the byte order of '''UCS-2''' text.