== History ==
{{See also|Universal Coded Character Set#History}}

The [[International Organization for Standardization]] (ISO) set out to compose a universal multi-byte character set in 1989. The draft ISO 10646 standard contained a non-required [[Addendum|annex]] called [[UTF-1]] that provided a byte stream encoding of its [[32-bit computing|32-bit]] code points. This encoding was unsatisfactory on performance grounds, among other problems. Probably the biggest problem was that it did not clearly separate ASCII from non-ASCII: new UTF-1 tools would be backward compatible with ASCII-encoded text, but UTF-1-encoded text could confuse existing code expecting ASCII (or [[extended ASCII]]), because it could contain continuation bytes in the range {{mono|0x21}}–{{mono|0x7E}} that meant something else in ASCII, e.g., {{mono|0x2F}} for <code>/</code>, the [[Unix]] [[Path (computing)|path]] directory separator.

In July 1992, the [[X/Open]] committee XoJIG was looking for a better encoding. Dave Prosser of [[Unix System Laboratories]] submitted a proposal for one that had faster implementation characteristics and introduced the improvement that 7-bit ASCII characters would ''only'' represent themselves; multi-byte sequences would include only bytes with the high bit set. The name ''File System Safe UCS Transformation Format'' (''FSS-UTF'')<ref>{{cite web|url=https://www.unicode.org/L2/Historical/wg20-n193-fss-utf.pdf|title=File System Safe UCS — Transformation Format (FSS-UTF) - X/Open Preliminary Specification|website=unicode.org}}</ref> and most of the text of this proposal were later preserved in the final specification.<ref name="FSS-UTF">{{cite journal |title=Appendix F. FSS-UTF / File System Safe UCS Transformation format |journal=The Unicode Standard 1.1 |url=https://www.unicode.org/versions/Unicode1.1.0/appF.pdf |access-date=2016-06-07 |url-status=live |archive-url=https://web.archive.org/web/20160607215950/https://www.unicode.org/versions/Unicode1.1.0/appF.pdf |archive-date=2016-06-07}}</ref><ref name="Whistler_2001">{{cite web |title=FSS-UTF, UTF-2, UTF-8, and UTF-16 |author-first=Kenneth |author-last=Whistler |date=2001-06-12 |url=https://unicode.org/mail-arch/unicode-ml/y2001-m06/0318.html |access-date=2006-06-07 |url-status=live |archive-url=https://web.archive.org/web/20160607220249/https://unicode.org/mail-arch/unicode-ml/y2001-m06/0318.html |archive-date=2016-06-07 }}</ref><ref name="pikeviacambridge">{{cite web |url=https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt |title=UTF-8 history |author-first=Rob |author-last=Pike |author-link=Rob Pike |date=2003-04-30 |access-date=2012-09-07}}</ref>

In August 1992, this proposal was circulated by an [[IBM]] X/Open representative to interested parties. A modification by [[Ken Thompson]] of the [[Plan 9 from Bell Labs|Plan 9 operating system]] group at [[Bell Labs]] made it [[Self-synchronizing code|self-synchronizing]], letting a reader start anywhere and immediately detect character boundaries, at the cost of being somewhat less bit-efficient than the previous proposal. It also abandoned the use of biases that prevented [[#overlong encodings|overlong encodings]].<ref name=pikeviacambridge/><ref>At that time subtraction was slower than bit logic on many computers, and speed was considered necessary for acceptance.{{citation needed|date=October 2024}}</ref> Thompson's design was outlined on September 2, 1992, on a [[placemat]] in a New Jersey diner with [[Rob Pike]]. In the following days, Pike and Thompson implemented it and updated [[Plan 9 from Bell Labs|Plan 9]] to use it throughout,<ref>{{cite book |chapter-url=https://www.cl.cam.ac.uk/~mgk25/ucs/UTF-8-Plan9-paper.pdf |chapter=Hello World or Καλημέρα κόσμε or こんにちは 世界 |title=Proceedings of the Winter 1993 USENIX Conference |first1=Rob |last1=Pike |first2=Ken |last2=Thompson |year=1993}}</ref> and then communicated their success back to X/Open, which accepted it as the specification for FSS-UTF.<ref name=pikeviacambridge/>

UTF-8 was first officially presented at the [[USENIX]] conference in [[San Diego]], from January 25 to 29, 1993.<ref>{{cite web|url=https://www.usenix.org/legacy/publications/library/proceedings/sd93/| title=USENIX Winter 1993 Conference Proceedings|website=usenix.org}}</ref>

In January 1998, the [[Internet Engineering Task Force]] adopted UTF-8 for future internet standards work in its Policy on Character Sets and Languages, RFC 2277 ([[Request for Comments#Best Current Practice|<abbr title="Best Current Practice">BCP</abbr>]] 18), replacing [[Single Byte Character Set]]s such as [[ISO/IEC 8859-1|Latin-1]] in older RFCs.<ref name="rfc2277">{{cite IETF |rfc=2277 |bcp=18 |title=IETF Policy on Character Sets and Languages |date=January 1998 |last1=Alvestrand |first1=Harald T. |author-link=Harald Alvestrand |publisher=[[Internet Engineering Task Force|IETF]]}}</ref>

In November 2003, UTF-8 was restricted by {{IETF RFC|3629}} to match the constraints of the [[UTF-16]] character encoding. Explicitly prohibiting the code points that correspond to the high and low surrogate characters removed <!-- 2*2^10/(2^16-2^11) --> more than 3% of the three-byte sequences, and ending at {{tt|U+10FFFF}} removed <!-- (2^21-(2^16+2^20))/(2^21-2^16) --> more than 48% of the four-byte sequences and all five- and six-byte sequences.<ref>{{cite web |author-last=Pike |author-first=Rob |author-link=Rob Pike |date=2012-09-06 |title=UTF-8 turned 20 years old yesterday |url=https://plus.google.com/u/0/101960720994009339267/posts/Rz1udTvtiMg |url-status=dead |archive-url=https://commandcenter.blogspot.com/2020/01/utf-8-turned-20-years-old-in-2012.html |archive-date=2020-01-26 |access-date=2012-09-07}}</ref>
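
The percentages quoted for the RFC 3629 restriction can be checked with a short calculation. The following Python sketch is illustrative only: the constant and function names are assumptions made for this example, and the ranges are the code point ranges conventionally assigned to three- and four-byte sequences in the original design.

```python
# Illustrative check of the RFC 3629 figures; all names are hypothetical.

# Code points that need three bytes: U+0800 through U+FFFF.
THREE_BYTE = range(0x0800, 0x10000)
# Code points a four-byte sequence could express before the restriction:
# U+10000 through U+1FFFFF.
FOUR_BYTE = range(0x10000, 0x200000)
# UTF-16 surrogate code points, prohibited in UTF-8: U+D800 through U+DFFF.
SURROGATES = range(0xD800, 0xE000)
# Highest code point allowed after the restriction.
MAX_CODE_POINT = 0x10FFFF


def removed_fractions():
    """Return the fractions of three- and four-byte sequences removed."""
    three = len(SURROGATES) / len(THREE_BYTE)
    # Four-byte code points above U+10FFFF are no longer valid.
    four_removed = FOUR_BYTE.stop - (MAX_CODE_POINT + 1)
    four = four_removed / len(FOUR_BYTE)
    return three, four


if __name__ == "__main__":
    three, four = removed_fractions()
    print(f"three-byte sequences removed: {three:.2%}")
    print(f"four-byte sequences removed:  {four:.2%}")
```

Under these assumptions the surrogate block accounts for 2,048 of 63,488 three-byte code points, and the cut-off at U+10FFFF removes 983,040 of 2,031,616 four-byte code points, matching the "more than 3%" and "more than 48%" figures above.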