{{pp|small=yes}} {{Short description|Character encoding standard}} {{Use Oxford spelling|date=September 2022}} {{CS1 config|mode=cs1}} {{Use dmy dates|date=March 2023|cs1-dates=y}} {{Infobox character encoding | name = Unicode | alias = {{hlist|[[Universal Coded Character Set]] (UCS)|ISO/IEC 10646}} | image = New Unicode logo.svg | caption = Logo of the [[Unicode Consortium]] | standard = Unicode Standard | lang = 168 scripts ''([[Script (Unicode)#List of scripts in Unicode|list]])'' | encodings = {{hlist|class=inline|[[UTF-8]]|[[UTF-16]]|[[GB 18030|GB18030]]}}{{hr}}{{hlist|class=inline||[[UTF-32]]|[[Binary Ordered Compression for Unicode|BOCU]]|[[Standard Compression Scheme for Unicode|SCSU]]|[[UTF-EBCDIC]]}} (uncommon){{hr}}{{hlist|class=inline|[[UTF-7]]|[[UTF-1]]}} (obsolete) | prev = [[ISO/IEC 8859]], among others | extra = {{hlist | {{official website|1=https://home.unicode.org/|name=Official website}} | {{official website|1=https://www.unicode.org/main.html|name=Technical website}}}} }} {{Contains special characters|special=uncommon Unicode characters}} '''Unicode''', formally '''''The Unicode Standard''''',{{refn|group="note"|1=Sometimes abbreviated as '''TUS'''.<ref>{{Cite web|date=27 March 2002 |title=Unicode Technical Report #28: Unicode 3.2 |url=https://www.unicode.org/reports/tr28/tr28-3.html#errata |access-date=23 June 2022 |website=Unicode Consortium}}</ref><ref>{{Cite web |last=Jenkins |first=John H. |date=26 August 2021 |title=Unicode Standard Annex #45: U-source Ideographs |url=https://www.unicode.org/reports/tr45/tr45-25.html |access-date=23 June 2022 |website=Unicode Consortium |at=§2.2 The Source Field}}</ref>}} is a [[character encoding]] standard maintained by the [[Unicode Consortium]] designed to support the use of text in all of the world's [[writing system]]s that can be digitized. 
Version 16.0 of the standard{{efn-ua|name=standard-latest}} defines {{val|154998}} [[Character (computing)|characters]] and 168 [[script (Unicode)|scripts]]<ref>{{multiref |<!-- Graphic + Format count is used here -->{{Cite web|url=https://www.unicode.org/versions/stats/charcountv16_0.html|title=Unicode Character Count V16.0 |date=10 September 2024 |publisher=The Unicode Consortium}} | {{Cite web|title=Unicode 16.0 Versioned Charts Index|url=https://www.unicode.org/charts/PDF/Unicode-16.0/ |publisher=The Unicode Consortium |date=10 September 2024}} | {{Cite web |title=Supported Scripts |url=https://www.unicode.org/standard/supported.html |access-date=11 September 2024 |date=10 September 2024 |publisher=The Unicode Consortium}} }}</ref> used in various ordinary, literary, academic, and technical contexts. Many common characters, including numerals, punctuation, and other symbols, are unified within the standard and are not treated as specific to any given writing system. Unicode encodes 3790 [[emoji]], with the continued development thereof conducted by the Consortium as a part of the standard.<ref>{{Cite web |title=Emoji Counts, v16.0 |url=https://www.unicode.org/emoji/charts-16.0/emoji-counts.html |access-date=10 September 2024 |publisher=The Unicode Consortium}}</ref> Moreover, the widespread adoption of Unicode was in large part responsible for the initial popularization of emoji outside of Japan. Unicode is ultimately capable of encoding more than 1.1 million characters. Unicode has largely supplanted the previous environment of a myriad of incompatible [[character sets]], each used within different locales and on different computer architectures. Unicode is used to encode the vast majority of text on the Internet, including most [[web pages]], and relevant Unicode support has become a common consideration in contemporary software development. 
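The figure of "more than 1.1 million" follows from the size of the Unicode codespace, which runs from U+0000 to U+10FFFF. The following Python sketch works through that arithmetic. The variable names are illustrative and do not come from the standard. It assumes the usual division of the codespace into 17 planes of 65,536 code points each, with the 2,048 surrogate code points (U+D800 to U+DFFF) reserved for UTF-16 and not available for character assignment.

```python
import sys

# Illustrative names; the numeric constants reflect the layout of the codespace.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points in each plane

# Every code point from U+0000 through U+10FFFF.
total_code_points = PLANES * CODE_POINTS_PER_PLANE

# Surrogates (U+D800..U+DFFF) are reserved for UTF-16 and never encode characters.
surrogates = 0xDFFF - 0xD800 + 1

# Code points that can, in principle, be assigned to characters.
scalar_values = total_code_points - surrogates

print(f"{total_code_points:,}")  # 1,114,112
print(f"{scalar_values:,}")      # 1,112,064

# Python's own upper limit for chr() matches the top of the codespace.
print(sys.maxunicode == total_code_points - 1)  # True
```

By this count, the {{val|154998}} characters defined in version 16.0 occupy roughly 14% of the assignable codespace.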
The Unicode [[character repertoire]] is synchronized with [[Universal Coded Character Set|ISO/IEC 10646]], the two being code-for-code identical. However, ''The Unicode Standard'' is more than just a repertoire within which characters are assigned. To aid developers and designers, the standard also provides charts and reference data, as well as annexes that explain concepts germane to various scripts and provide guidance for their implementation. Topics covered by these annexes include [[Unicode equivalence#Normalization|character normalization]], [[Combining character|character composition]] and decomposition, [[Unicode collation algorithm|collation]], and [[Bidirectional text#Unicode bidi support|directionality]].<ref>{{Cite web |title=The Unicode Standard: A Technical Introduction |url=https://www.unicode.org/standard/principles.html |date=22 August 2019 |access-date=11 September 2024}}</ref> Unicode text is processed and stored as binary data [[comparison of Unicode encodings|using one of several encodings]], which define how to translate the standard's abstracted codes for characters into sequences of bytes. ''The Unicode Standard'' itself defines three encodings: [[UTF-8]], [[UTF-16]], and [[UTF-32]], though several others exist. Of these, UTF-8 is the most widely used by a large margin, in part due to its backward compatibility with [[ASCII]].
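As a rough illustration of how the three encodings turn the same code points into different byte sequences, the following Python sketch uses only the built-in <code>str.encode</code> method. The sample characters and the helper name <code>encoded_lengths</code> are chosen for this example and are not part of the standard. The little-endian codec variants are used so that no byte order mark is added to the output.

```python
# Hypothetical helper for this example: byte length of one character per encoding.
def encoded_lengths(ch: str) -> dict[str, int]:
    # The "-le" codecs omit the byte order mark, so lengths reflect the character alone.
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}

# Sample characters: U+0041, U+00E9, U+20AC, U+1F600.
for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))

# UTF-8 encodes pure ASCII text to exactly the same bytes as ASCII does.
print("Unicode".encode("utf-8") == "Unicode".encode("ascii"))  # True
```

In this sketch, UTF-8 uses between one and four bytes per character, UTF-16 uses two or four, and UTF-32 always uses four. The final comparison shows the ASCII compatibility mentioned above: text made up only of ASCII characters is byte-for-byte identical in UTF-8.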