Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Unicode
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Codespace and code points === ''The Unicode Standard'' defines a ''codespace'':<ref name="Glossary">{{Cite web |title=Glossary of Unicode Terms |url=https://unicode.org/glossary/ |access-date=16 March 2010}}</ref> a sequence of integers called ''[[code point]]s''<ref name=":0">{{Cite book |url=https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-2/#G25564 |title=The Unicode Standard Version 16.0 β Core Specification |year=2024 |chapter=2.4 Code Points and Characters}}</ref> in the range from 0 to {{val|1114111}}, notated according to the standard as {{tt|U+0000}}β{{tt|U+10FFFF}}.<ref>{{Cite book |url=https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G2212 |title=The Unicode Standard, Version 16.0 |year=2024 |chapter=3.4 Characters and Encoding}}</ref> The codespace is a systematic, architecture-independent representation of ''The Unicode Standard''; actual text is processed as binary data via one of several Unicode encodings, such as [[UTF-8]]. In this normative notation, the two-character prefix <code>U+</code> always precedes a written code point,<ref>{{Cite mailing list |url=https://unicode.org/mail-arch/unicode-ml/y2005-m11/0060.html |title=Re: Origin of the U+nnnn notation |date=8 November 2005 |mailing-list=Unicode Mail List Archive}}</ref> and the code points themselves are written as [[hexadecimal]] numbers. At least four hexadecimal digits are always written, with [[leading zero]]s prepended as needed. For example, the code point {{unichar|F7|Division sign}} is padded with two leading zeros, but {{unichar|13254|Egyptian hieroglyph O004}} ([[File:Hiero O4.png|class=skin-invert-image|text-bottom|15px]]) is not padded.<ref>{{Cite web |date=September 2024 |title=Appendix A: Notational Conventions |url=https://www.unicode.org/versions/Unicode16.0.0/core-spec/appendix-a/ |website=The Unicode Standard |publisher=Unicode Consortium}}</ref> There are a total of {{val|1112064}} valid code points within the codespace.<ref>{{cite book |title=The Unicode Standard |publisher=[[The Unicode Consortium]] |isbn=978-1-936213-01-6 |edition=6.0 |location=Mountain View, California, US |at=3.9 Unicode Encoding Forms |chapter=Conformance |quote=Each encoding form maps the Unicode code points U+0000..U+D7FF and U+E000..U+10FFFF |chapter-url=https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G7404}}</ref> This number arises from the limitations of the [[UTF-16]] character encoding, which can encode the 2<sup>16</sup> code points in the range {{tt|U+0000}} through {{tt|U+FFFF}} except for the 2<sup>11</sup> code points in the range {{tt|U+D800}} through {{tt|U+DFFF}}, which are used as surrogate pairs to encode the 2<sup>20</sup> code points in the range {{tt|U+10000}} through {{tt|U+10FFFF}}.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)