== History ==
In the late 1980s, work began on developing a uniform encoding for a "Universal Character Set" ([[Universal Coded Character Set|UCS]]) that would replace earlier language-specific encodings with one coordinated system. The goal was to include all required characters from most of the world's languages, as well as symbols from technical domains such as science, mathematics, and music. The original idea was to replace the typical 256-character encodings, which required 1 byte per character, with an encoding using 65,536 (2<sup>16</sup>) values, which would require 2 bytes (16 bits) per character. Two groups worked on this in parallel, [[ISO/IEC JTC 1/SC 2]] and the [[Unicode Consortium]], the latter representing mostly manufacturers of computing equipment. The two groups attempted to synchronize their character assignments so that the developing encodings would be mutually compatible. The early 2-byte encoding was originally called "Unicode", but is now called "UCS-2".<ref name="unicode-6_0" /><ref name="ucs-2-utf-16-differences" /><ref name="mysql_UCS-2">{{cite web|url=https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-ucs2.html|title=MySQL :: MySQL 5.7 Reference Manual :: 10.1.9.4 The ucs2 Character Set (UCS-2 Unicode Encoding)|website=dev.mysql.com}}</ref>

When it became increasingly clear that 2<sup>16</sup> characters would not suffice,<ref name="unicode.org/faq">{{cite web|title=What is UTF-16?|url=https://www.unicode.org/faq/utf_bom.html#utf16-1|website=The Unicode Consortium|publisher=Unicode, Inc.|access-date=29 March 2018}}</ref> [[IEEE]] introduced a larger 31-bit space and an encoding ([[UCS-4]]) that would require 4 bytes per character. This was resisted by the [[Unicode Consortium]], both because 4 bytes per character wasted a lot of memory and disk space, and because some manufacturers were already heavily invested in 2-byte-per-character technology.
The UTF-16 encoding scheme was developed as a compromise and introduced with version 2.0 of the Unicode standard in July 1996.<ref>{{cite web |url=https://www.unicode.org/faq//utf_bom.html|title=Questions about encoding forms |access-date=2010-11-12}}</ref> It is fully specified in RFC 2781, published in 2000 by the [[IETF]].<ref>ISO/IEC 10646:2014 "Information technology – Universal Coded Character Set (UCS)" sections 9 and 10.</ref><ref>{{cite book |title=The Unicode Standard version 7.0 |date=2014 |chapter-url=https://www.unicode.org/versions/Unicode7.0.0/ch02.pdf#G11153 |chapter=Chapter 2 General Structure |at=2.5 Encoding Forms}}</ref>

UTF-16 is specified in the latest versions of both the international standard [[ISO/IEC 10646]] and the Unicode Standard. "UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either 10646 or the Unicode Standard."<ref name="unicode-6_0" /><ref name="ucs-2-utf-16-differences" />

UTF-16 will never be extended to support a larger number of code points or to support the code points that were replaced by surrogates, as this would violate the Unicode Stability Policy with respect to general category or surrogate code points.<ref>{{cite web|url=https://unicode.org/policies/stability_policy.html|title=Unicode Character Encoding Stability Policies|website=unicode.org}}</ref> (Any scheme that remains a [[self-synchronizing code]] would require allocating at least one [[Plane (Unicode)#Basic Multilingual Plane|Basic Multilingual Plane]] (BMP) code point to start a sequence. Changing the purpose of a code point is disallowed.)