{{Short description|ASCII-compatible variable-width encoding of Unicode}}
{{Infobox character encoding
| name = UTF-8
| mime = 
| alias = 
| image = 
| caption = 
| standard = [https://www.unicode.org/versions/latest/ Unicode Standard]
| status = 
| classification = [[Unicode Transformation Format]], [[extended ASCII]], [[variable-width encoding|variable-length encoding]]
| encodes = [[ISO/IEC 10646]] ([[Unicode]])
| extends = [[ASCII]]
| prev = [[UTF-1]]
| next = 
}}

'''UTF-8''' is a [[character encoding]] standard used for electronic communication. It is defined by the [[Unicode]] Standard, and its name is derived from ''Unicode Transformation Format{{snd}} 8-bit''.<ref>{{Cite book |title=The Unicode Standard |edition=6.0 |chapter=Chapter 2. General Structure |publisher=[[The Unicode Consortium]] |location=Mountain View, California, US |isbn=978-1-936213-01-6 |chapter-url=https://www.unicode.org/versions/Unicode6.0.0/}}</ref> Almost every webpage is stored in UTF-8.

UTF-8 supports all 1,112,064<ref>{{cite book |title=The Unicode Standard |publisher=[[The Unicode Consortium]] |isbn=978-1-936213-01-6 |edition=6.0 |location=Mountain View, California, US |at=3.9 Unicode Encoding Forms |chapter=Conformance |quote=Each encoding form maps the Unicode code points U+0000..U+D7FF and U+E000..U+10FFFF |chapter-url=https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G7404}}</ref> valid Unicode [[Code point#In_Unicode|code points]] using a [[variable-width encoding]] of one to four one-[[byte]] (8-bit) code units.
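As an illustration, the following Python sketch encodes a single code point by hand, choosing one to four bytes according to its value. The function name <code>utf8_encode</code> is hypothetical and only for illustration; the sketch assumes its input is a plain integer code point and checks itself against Python's built-in <code>str.encode("utf-8")</code>. It also reproduces the count of 1,112,064 from the two valid ranges U+0000..U+D7FF and U+E000..U+10FFFF, which exclude the 2,048 surrogate code points.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one Unicode scalar value as UTF-8 bytes."""
    # Reject values outside the code space and the surrogate range U+D800..U+DFFF.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a valid Unicode scalar value")
    if cp < 0x80:  # 1 byte: 0xxxxxxx (same as ASCII)
        return bytes([cp])
    if cp < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Number of encodable code points: U+0000..U+D7FF plus U+E000..U+10FFFF.
VALID_COUNT = (0xD7FF - 0x0000 + 1) + (0x10FFFF - 0xE000 + 1)

if __name__ == "__main__":
    for ch in "A\u00e9\u20ac\U0001F600":  # A, é, €, and an emoji
        encoded = utf8_encode(ord(ch))
        # Cross-check the hand encoder against Python's built-in codec.
        assert encoded == ch.encode("utf-8")
        print(f"U+{ord(ch):04X}", encoded.hex(" "), len(encoded))
    print(VALID_COUNT)
```

Run as a script, this prints lengths of 1, 2, 3 and 4 bytes for the four sample characters (for example, € U+20AC becomes <code>e2 82 ac</code>), followed by 1112064.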

Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. UTF-8 was designed for [[backward compatibility]] with [[ASCII]]: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that a UTF-8-encoded file using only those characters is identical to an ASCII file. Most software designed for any [[extended ASCII]] can read and write UTF-8, and this compatibility results in fewer internationalization issues than any alternative text encoding.<ref name="Microsoft GDK" /><ref name="whatwg" />
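The ASCII compatibility described above can be checked directly with Python's built-in codecs. In this minimal sketch, the helper name <code>same_as_ascii</code> and the sample strings are hypothetical. It shows that ASCII-only text produces byte-for-byte identical output in ASCII and UTF-8, and that every byte of a multi-byte UTF-8 sequence has its high bit set (value 0x80 or above), so such bytes can never be mistaken for ASCII characters.

```python
def same_as_ascii(text: str) -> bool:
    """Hypothetical helper: True if text encodes identically in ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains at least one character outside ASCII (U+0000..U+007F).
        return False


def multibyte_bytes_are_high(text: str) -> bool:
    """True if every byte of every non-ASCII character's encoding is >= 0x80."""
    return all(
        b >= 0x80
        for ch in text
        if ord(ch) >= 0x80
        for b in ch.encode("utf-8")
    )


if __name__ == "__main__":
    print(same_as_ascii("Hello, world!"))
    print(same_as_ascii("na\u00efve \u20ac5"))
    print(multibyte_bytes_are_high("na\u00efve \u20ac5 \U0001F600"))
```

Run as a script, this prints <code>True</code>, <code>False</code> and <code>True</code>: the first string is pure ASCII, while the second contains ï and €, which need two and three bytes respectively.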

UTF-8 is the dominant encoding for all countries and languages on the internet<!-- on the web, but more generally: e-mail, JSON, and likely e.g. XML too -->, with a global average use of 99% <!-- rounded up -->. It is used in most standards, often as the only allowed encoding, and is supported by all modern operating systems and programming languages.