{{Short description|ASCII-compatible variable-width encoding of Unicode}} {{Infobox character encoding | name = UTF-8 | mime = | alias = | image = | caption = | standard = [https://www.unicode.org/versions/latest/ Unicode Standard] | status = | classification = [[Unicode Transformation Format]], [[extended ASCII]], [[variable-width encoding|variable-length encoding]] | encodes = [[ISO/IEC 10646]] ([[Unicode]]) | extends = [[ASCII]] | prev = [[UTF-1]] | next = }} '''UTF-8''' is a [[character encoding]] standard used for electronic communication. Defined by the [[Unicode]] Standard, the name is derived from ''Unicode Transformation Format{{snd}} 8-bit''.<ref>{{Cite book |title=The Unicode Standard |edition=6.0 |chapter=Chapter 2. General Structure |publisher=[[The Unicode Consortium]] |location=Mountain View, California, US |isbn=978-1-936213-01-6 |chapter-url=https://www.unicode.org/versions/Unicode6.0.0/}}</ref> Almost every webpage is stored in UTF-8. UTF-8 supports all 1,112,064<ref>{{cite book |title=The Unicode Standard |publisher=[[The Unicode Consortium]] |isbn=978-1-936213-01-6 |edition=6.0 |location=Mountain View, California, US |at=3.9 Unicode Encoding Forms |chapter=Conformance |quote=Each encoding form maps the Unicode code points U+0000..U+D7FF and U+E000..U+10FFFF |chapter-url=https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G7404}}</ref> valid Unicode [[Code point#In_Unicode|code points]] using a [[variable-width encoding]] of one to four one-[[byte]] (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. It was designed for [[backward compatibility]] with [[ASCII]]: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that a UTF-8-encoded file using only those characters is identical to an ASCII file. 
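The following is a minimal Python sketch of the variable-width scheme described above. It assumes the standard UTF-8 bit layout: one byte for U+0000..U+007F, two bytes up to U+07FF, three bytes up to U+FFFF, and four bytes up to U+10FFFF. The function name `utf8_encode` is hypothetical and used only for illustration. The sketch rejects the surrogate range U+D800..U+DFFF, which is why 1,112,064 code points remain valid, and it cross-checks its output against Python's built-in codec. `bytes.hex` with a separator argument requires Python 3.8 or later.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as 1 to 4 UTF-8 bytes (hypothetical helper, illustrative sketch)."""
    # Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not encodable.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:
        # 7 bits -> 1 byte: 0xxxxxxx (identical to the ASCII byte)
        return bytes([cp])
    if cp < 0x800:
        # 11 bits -> 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 16 bits -> 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits -> 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    for ch in ["A", "é", "€", "😀"]:
        encoded = utf8_encode(ord(ch))
        # Cross-check against Python's built-in UTF-8 codec.
        assert encoded == ch.encode("utf-8")
        # Prints e.g. "U+20AC -> e2 82 ac (3 byte(s))"
        print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")
```

Every code point below U+0080 becomes a single byte equal to its own value, which is the ASCII backward-compatibility property described above. Higher code points use progressively longer sequences.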
Most software designed for any [[extended ASCII]] can read and write UTF-8, which results in fewer internationalization issues than with any alternative text encoding.<ref name="Microsoft GDK" /><ref name="whatwg" /> UTF-8 is the dominant encoding for all countries and languages on the internet<!-- on the web, but more generally: e-mail, JSON, and likely e.g. XML too -->, with a global average use of 99%<!-- rounded up -->. It is used in most standards, where it is often the only allowed encoding, and it is supported by all modern operating systems and programming languages.
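One property behind this compatibility is that every byte of a multi-byte UTF-8 sequence has its high bit set, so byte values 0x00..0x7F only ever represent ASCII characters. The minimal Python sketch below illustrates that property. The helper name `multibyte_bytes_are_non_ascii` and the sample path are hypothetical. A byte-oriented routine that splits on the ASCII byte `/` yields pieces that each still decode as valid UTF-8.

```python
def multibyte_bytes_are_non_ascii(text: str) -> bool:
    """Return True if every byte of every multi-byte UTF-8 sequence in text is >= 0x80 (hypothetical helper)."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


if __name__ == "__main__":
    path = "données/日本語/résumé.txt"  # hypothetical sample text
    raw = path.encode("utf-8")

    # An ASCII-only, byte-oriented operation: split on the byte 0x2F ('/').
    parts = raw.split(b"/")

    # Each piece still decodes cleanly, because 0x2F never occurs inside a multi-byte sequence.
    print([p.decode("utf-8") for p in parts])
    print(multibyte_bytes_are_non_ascii(path))
```

The same reasoning applies to other ASCII delimiters such as newlines, commas, or NUL terminators. This is one reason that byte-oriented software written for extended ASCII often handles UTF-8 text without modification.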