== Standards ==
The official name for the encoding is {{code|UTF-8}}, the spelling used in all Unicode Consortium documents. The [[hyphen-minus]] is required and no spaces are allowed. Some other names used are:
* Most standards are case-insensitive, so the lowercase <code>utf-8</code> is often used.{{citation needed|date=March 2023}}
* Web standards (which include [[Cascading Style Sheets|CSS]], [[HTML]], [[XML]], and [[HTTP headers]]) also allow {{code|utf8}} and many other aliases<!-- e.g. "unicode20utf8" for UTF-8, likely not useful to list any or all, just stating "many"-->.<ref>{{cite web|url=https://encoding.spec.whatwg.org/#names-and-labels|title=Encoding Standard § 4.2. Names and labels|publisher=[[WHATWG]]|access-date=2018-04-29}}</ref>
* The [[Internet Assigned Numbers Authority]] officially lists {{code|csUTF8}} as the only alias,<ref name="IANA_2013_CS">{{cite web |publisher=[[Internet Assigned Numbers Authority]] |url=https://www.iana.org/assignments/character-sets |title=Character Sets |date=2013-01-23 |access-date=2013-02-08}}</ref> which is rarely used.
* In some locales {{code|UTF-8N}} means UTF-8 ''without'' a [[byte order mark|byte-order mark]] (BOM), and in this case {{code|UTF-8}} ''may'' imply there ''is'' a BOM.<ref>{{cite web |url=https://suika.fam.cx/~wakaba/wiki/sw/n/BOM |title=BOM | work = suikawiki |archive-url=https://web.archive.org/web/20090117052232/https://suika.fam.cx/~wakaba/wiki/sw/n/BOM |archive-date=2009-01-17 |language=ja}}</ref><ref>{{cite web |author-last=Davis |author-first=Mark |author-link=Mark Davis (Unicode) |title=Forms of Unicode |publisher=[[IBM]] |url=https://www-128.ibm.com/developerworks/library/utfencodingforms/index.html |access-date=2013-09-18 |archive-url=https://web.archive.org/web/20050506211548/https://www-128.ibm.com/developerworks/library/utfencodingforms/index.html |archive-date=2005-05-06}}</ref>
* In [[Windows]], UTF-8 is [[Windows code page|codepage]] <code>65001</code><ref>{{Cite web |url=https://www.dostips.com/forum/viewtopic.php?t=5357 |title=UTF-8 codepage 65001 in Windows 7 - part I |author=Liviu |quote=Previously under XP (and, unverified, but probably Vista, too) for loops simply did not work while codepage 65001 was active |language=en-gb |date=2014-02-07 |access-date=2018-01-30}}</ref> with the symbolic name <code>CP_UTF8</code> in source code.
* In [[MySQL]], UTF-8 is called <code>utf8mb4</code>,<ref>{{Cite web |title=MySQL :: MySQL 8.0 Reference Manual :: 10.9.1 The utf8mb4 Character Set (4-Byte UTF-8 Unicode Encoding) |url=https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb4.html |work=MySQL 8.0 Reference Manual |publisher=[[Oracle Corporation]] |access-date=2023-03-14}}</ref> while {{code|utf8}} and {{code|utf8mb3}} refer to the obsolete [[CESU-8]] variant.<ref name="mysql3-utf8mb3">{{Cite web |title=MySQL :: MySQL 8.0 Reference Manual :: 10.9.2 The utf8mb3 Character Set (3-Byte UTF-8 Unicode Encoding) |url=https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html |work=MySQL 8.0 Reference Manual |publisher=[[Oracle Corporation]] |access-date=2023-02-24}}</ref>
* In [[Oracle Database]] (since version 9.0), <code>AL32UTF8</code><ref>{{Cite web |title=Database Globalization Support Guide |url=https://docs.oracle.com/cd/E11882_01/server.112/e10729/ch6unicode.htm |access-date=2023-03-16 |website=docs.oracle.com |language=en}}</ref> means UTF-8, while {{code|UTF-8}} means CESU-8.
* In HP [[Printer Command Language|PCL]], the Symbol-ID for UTF-8 is <code>18N</code>.<ref>{{Cite web|url=https://pclhelp.com/pcl-symbol-sets/ |archive-url=https://web.archive.org/web/20150219212843/http://pclhelp.com/pcl-symbol-sets/|url-status=dead|archive-date=2015-02-19|title=HP PCL Symbol Sets {{!}} Printer Control Language (PCL & PXL) Support Blog|date=2015-02-19|access-date=2018-01-30}}</ref>

There are several current definitions of UTF-8 in various standards documents:
* {{IETF RFC|3629|link=no}} / STD 63 (2003), which establishes UTF-8 as a standard internet protocol element
* {{IETF RFC|5198|link=no}} defines UTF-8 [[Unicode equivalence|NFC]] for Network Interchange (2008)
* ISO/IEC 10646:2020/Amd 1:2023<!-- §9.1 (2023? or 2020)--><ref>[https://www.iso.org/standard/83362.html ISO/IEC 10646].</ref>
* ''The Unicode Standard, Version 16.0.0'' (2024)<ref>''[https://www.unicode.org/versions/Unicode16.0.0/ The Unicode Standard, Version 16.0]'' [https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G31703 §3.9 D92, §3.10 D95], 2024.</ref>

They supersede the definitions given in the following obsolete works:
* ''The Unicode Standard, Version 2.0'', Appendix A (1996)
* ISO/IEC 10646-1:1993 Amendment 2 / Annex R (1996)
* {{IETF RFC|2044|link=no}} (1996)
* {{IETF RFC|2279|link=no}} (1998)
* ''The Unicode Standard, Version 3.0'', §2.3 (2000) plus Corrigendum #1: UTF-8 Shortest Form (2000)
* ''Unicode Standard Annex #27: Unicode 3.1'' (2001)<ref>[https://www.unicode.org/reports/tr27/tr27-3.html ''Unicode Standard Annex #27: Unicode 3.1''], 2001.</ref>
* <!-- Is there a reason to single out 5.0 and 6.0, but not e.g. 15? Skip all after 3.0, since only then encoding of UTF-8 changed? -->''The Unicode Standard, Version 5.0'' (2006)<ref>[https://www.unicode.org/versions/Unicode5.0.0/ ''The Unicode Standard, Version 5.0''] [https://www.unicode.org/versions/Unicode5.0.0/ch03.pdf §3.9–§3.10 ch. 3], 2006.</ref>
* ''The Unicode Standard, Version 6.0'' (2010)<ref>[https://www.unicode.org/versions/Unicode6.0.0/ ''The Unicode Standard, Version 6.0''] [https://www.unicode.org/versions/Unicode6.0.0/ch03.pdf §3.9 D92, §3.10 D95], 2010.</ref>

They are all the same in their general mechanics; the main differences concern the allowed range of code point values and the safe handling of invalid input.
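
The kinds of input on which the definitions differ can be illustrated with a short Python sketch. It uses Python's built-in strict <code>utf-8</code> decoder as a stand-in for a decoder that follows the current definitions; the helper name <code>is_valid_utf8</code> and the sample byte strings are chosen here for illustration and do not come from any of the standards listed above.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes.

    Hypothetical helper for illustration; it only reports what the
    built-in decoder does, not what any particular standard mandates.
    """
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Illustrative inputs: two well-formed sequences, then four sequences
# of the kind that the definitions treat differently.
samples = {
    "U+20AC as three bytes": b"\xe2\x82\xac",
    "U+10FFFF, the highest code point": b"\xf4\x8f\xbf\xbf",
    "overlong two-byte form of U+0000": b"\xc0\x80",
    "surrogate U+D800 encoded directly": b"\xed\xa0\x80",
    "value just above U+10FFFF": b"\xf4\x90\x80\x80",
    "five-byte form as in RFC 2279": b"\xf8\x88\x80\x80\x80",
}

for label, raw in samples.items():
    verdict = "accepted" if is_valid_utf8(raw) else "rejected"
    print(f"{label}: {verdict}")
```

In this sketch only the first two samples are accepted. The other four are rejected for the reasons named above: a non-shortest form, a surrogate code point, a value outside the allowed range, and a sequence longer than four bytes.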