{{Short description|Named range of Unicode code points}}
{{for|the specific group of square characters in the Unicode typeset|Block Elements}}
A '''Unicode block''' is one of several contiguous ranges of numeric character codes ([[code point]]s) of the [[Unicode]] character set that are defined by the [[Unicode Consortium]] for administrative and documentation purposes. Typically, proposals such as the addition of new characters are discussed and evaluated by considering the relevant block or blocks as a whole. Each block is generally, but not always, meant to supply characters used by one or more specific languages, or in some general application area such as [[mathematics]], [[surveying]], decorative [[typesetting]], social forums, etc.

== Design and implementation ==
Unicode blocks are identified by unique [[English language|English]] names, which use only ASCII characters and usually describe the nature of the symbols, such as "Tibetan" or "Supplemental Arrows-A". When block names are compared, uppercase and lowercase letters are treated as equivalent, and any whitespace, hyphens, and underscores are ignored; thus the last name is equivalent to "supplemental_arrows_a" and "SUPPLEMENTALARROWSA".<ref name=uniblocks>{{Cite web|url=https://www.unicode.org/Public/UNIDATA/Blocks.txt|title=Unicode Blocks data file, Unicode version 15.1|publisher=Unicode Consortium|access-date=2023-09-12}}</ref>

Blocks are [[intersection (set theory)|pairwise disjoint]]; that is, they do not overlap. The starting code point and the size (number of code points) of each block are always multiples of 16; therefore, in [[hexadecimal]] notation, the starting (smallest) point is U+''xxx''0 and the ending (largest) point is U+''yyy''F, where ''xxx'' and ''yyy'' are three or more hexadecimal digits.
(These constraints are intended to simplify the display of glyphs in Unicode Consortium documents, as tables with 16 rows labeled with the last hexadecimal digit of the code point.<ref name=uniblocks/>) The size of a block may range from a minimum of 16 to a maximum of 65,536 code points. Every assigned code point has a character property called "Block", whose value is a character string naming the unique block that owns that point.<ref>{{Cite web |title=Glossary |url=https://www.unicode.org/glossary/#B |access-date=2022-08-07 |website=www.unicode.org}}</ref> However, a block may also contain unassigned code points, usually reserved for future additions of characters that "logically" should belong to that block. Code points not belonging to any of the named blocks, e.g. in the unassigned [[Plane (Unicode)|planes]] 4–13, have the value block="No_Block".<ref name=uniblocks/>

Membership in a particular Unicode block does not guarantee that a character has the properties one would expect of the characters the block contains or is intended to contain. The identity of any character is determined by its properties as stated in the Unicode Character Database. For example, the 32 contiguous noncharacter code points U+FDD0..U+FDEF share none of the properties common to the other characters in the [[Arabic Presentation Forms-A]] block: they are not Arabic-script characters, nor are they "right-to-left noncharacters". They were designated there as filler for the block, since it had been agreed that no further Arabic compatibility characters would be encoded.<ref>{{Cite web |title=Private-Use Characters, Noncharacters & Sentinels FAQ |url=https://www.unicode.org/faq/private_use.html |access-date=2023-07-24 | website=www.unicode.org}}</ref>

== Other classifications ==
Each code point also has a property called "[[General Category (Unicode)|General Category]]", which attempts to describe the role of the corresponding symbol in the languages or applications for whose sake it was included in the system. Examples of General Categories are "Lu" (uppercase letter), "Nd" (decimal digit), "Pi" (initial quotation mark punctuation), and "Mn" (nonspacing mark, e.g. a diacritic applied to the preceding character). This division is completely independent of blocks: the code points with a given General Category generally span many blocks, and they need not be consecutive, even within a single block.<ref name=uniprops>{{Cite web|url=http://www.unicode.org/versions/Unicode14.0.0/ch04.pdf#G124142|title=Unicode Core Specification, Chapter 4: Character Properties|access-date=2021-09-15}}</ref>

Each code point also has a [[Scripts in Unicode|script property]], specifying which [[writing system]] it is intended for, or whether it is intended for multiple writing systems. This property, too, is independent of block.

In descriptions of the Unicode system, a block may be subdivided into more specific subgroups, such as the "[[Chess symbols in Unicode|Chess symbols]]" in the [[Miscellaneous Symbols]] block (not to be confused with the separate [[Chess Symbols]] block). Those subgroups are not "blocks" in the technical sense used by the Unicode Consortium, and are named only for the convenience of users.
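
The block rules described in this article (loose comparison of block names, the multiple-of-16 alignment, the "No_Block" default, and the independence of General Category from block) can be illustrated with a short Python sketch. It is not an official implementation: the <code>BLOCKS</code> table is a hypothetical, hand-copied subset of ranges (a real application would read the full Blocks.txt data file), and the helper names <code>normalize_block_name</code>, <code>is_well_aligned</code> and <code>block_of</code> are illustrative assumptions. Only <code>unicodedata.category</code>, from Python's standard library, is a real API; the standard library offers no block lookup of its own.

```python
import unicodedata

# Hypothetical, hand-copied subset of block ranges (start, end, name).
# A real lookup would parse the complete Blocks.txt data file instead.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0F00, 0x0FFF, "Tibetan"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x27F0, 0x27FF, "Supplemental Arrows-A"),
    (0xFB50, 0xFDFF, "Arabic Presentation Forms-A"),
]


def normalize_block_name(name: str) -> str:
    """Comparison key for block names: lowercased, with whitespace,
    hyphens and underscores removed."""
    return "".join(
        ch for ch in name.lower() if not ch.isspace() and ch not in "-_"
    )


def is_well_aligned(start: int, end: int) -> bool:
    """True if the range starts on a multiple of 16 and its size is a
    multiple of 16, i.e. it runs from U+xxx0 to U+yyyF."""
    size = end - start + 1
    return start % 16 == 0 and size % 16 == 0


def block_of(cp: int) -> str:
    """Block name for a code point, or "No_Block" if no range in the
    table contains it. With this partial table, "No_Block" is also
    returned for code points in real blocks that were simply left out."""
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "No_Block"


if __name__ == "__main__":
    # Different spellings of the same block name give the same key.
    print(normalize_block_name("Supplemental Arrows-A"))  # supplementalarrowsa
    print(normalize_block_name("supplemental_arrows_a"))  # supplementalarrowsa

    # Every range in the table obeys the multiple-of-16 rule.
    print(all(is_well_aligned(s, e) for s, e, _ in BLOCKS))  # True

    print(block_of(0x0F40))   # Tibetan (U+0F40 TIBETAN LETTER KA)
    print(block_of(0x40000))  # No_Block (plane 4 contains no blocks)

    # Block membership says nothing about other properties: U+FB50 is a
    # letter (Lo), while the noncharacter U+FDD0 in the same block is Cn.
    for cp in (0xFB50, 0xFDD0):
        print(f"U+{cp:04X}", block_of(cp), unicodedata.category(chr(cp)))

    # General Category also varies within one block (Basic Latin here).
    print(sorted({unicodedata.category(chr(cp)) for cp in (0x41, 0x61, 0x30, 0x21)}))
    # ['Ll', 'Lu', 'Nd', 'Po']
```

Because the lookup table is partial, a "No_Block" result from this sketch only means that the code point lies outside the listed ranges; the authoritative Block property comes from the Unicode Character Database.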
== List of blocks ==
Unicode {{Unicode version|version=16.0}} defines 338 blocks:<ref name=uniblocks/>
* 164 in plane 0, the Basic Multilingual Plane (in table below: {{slink||BMP}})
* 161 in plane 1, the Supplementary Multilingual Plane ({{slink||SMP}})
* 7 in plane 2, the Supplementary Ideographic Plane ({{slink||SIP}})
* 2 in plane 3, the Tertiary Ideographic Plane ({{slink||TIP}})
* 2 in plane 14 (E in [[hexadecimal]]), the Supplementary Special-purpose Plane ({{slink||SSP}})
* One each in planes 15 (F<sub>hex</sub>) and 16 (10<sub>hex</sub>), called Supplementary Private Use Area-A and -B ({{slink||PUA-A}})

{{Unicode blocks|state=uncollapsed}}

== {{anchor|Deleted blocks}}Moved blocks ==
The Unicode Stability Policy requires that a character, once assigned, may not be moved or removed, although it may be deprecated. This applies to Unicode 2.0 and all subsequent versions. Prior to this, the following former blocks were moved:
{|class="wikitable collapsible" style="width:100%; margin:0;"
|+Former Unicode blocks from before Unicode 2.0
|-
!width="10%" |Block range
!width="15%" |Historical<br/>block name
!width="10%" |Version when added
!width="10%" |Version when removed
!width="10%" |Range now occupied by
!width="15%" |Superseded by block
!width="10%" |Code points
!width="10%" |Assigned characters
!width="10%" |[[Script (Unicode)|Scripts]]
|-
| U+1000..U+105F
| [[Tibetan (obsolete Unicode block)|Tibetan]]<ref name="unicode1.0blocks">{{cite web |url=https://www.unicode.org/versions/Unicode1.0.0/CodeCharts2.pdf |work=The Unicode Standard |version=Version 1.0 |title=3.8: Block-by-Block Charts |publisher=[[Unicode Consortium]]}}</ref>
| 1.0.0
| 1.0.1
| [[Myanmar (Unicode block)|Myanmar]]
| [[Tibetan (Unicode block)|Tibetan]]
| 96
| 71
| [[Tibetan script|Tibetan]]
|-
| U+3400..U+3D2D
| [[Hangul (obsolete Unicode block)|Hangul]]<ref name="unicode1.1blocks">{{cite web |url=https://www.unicode.org/versions/Unicode1.1.0/appE.pdf |title=Appendix E: Block Names |work=The Unicode Standard |version=Version 1.1 |publisher=[[Unicode Consortium]]}}</ref>
| 1.0.0
| 2.0
| rowspan=2|[[CJK Unified Ideographs Extension A]]
| rowspan=3|[[Hangul Syllables]]
| 2350
| 2350
| rowspan="3" | [[Hangul]]
|-
| U+3D2E..U+44B7
| [[Hangul (obsolete Unicode block)|Hangul Supplementary-A]]<ref name="unicode1.1blocks"/>
| rowspan=2 | 1.1
| rowspan=2 | 2.0
| 1930
| 1930
|-
| U+44B8..U+4DFF
| [[Hangul (obsolete Unicode block)|Hangul Supplementary-B]]<ref name="unicode1.1blocks"/>
| [[CJK Unified Ideographs Extension A]] and [[Yijing Hexagram Symbols]]
| 2376
| 2376
|}

== References ==
{{reflist}}

== External links ==
* {{oweb|https://www.unicode.org/}} of the Unicode Consortium {{in lang|en}}

{{Unicode character mapping tables}}
{{Unicode navigation}}
{{MathematicalSymbolsNotationLanguage}}

[[Category:Unicode blocks| ]]