== Origin and development ==
Unicode was originally designed to transcend the limitations of every text encoding designed up to that point: each encoding was used in its own context, with no particular expectation of compatibility with any other. Indeed, any two encodings chosen were often totally unworkable when used together, with text encoded in one [[mojibake|interpreted as garbage characters]] by the other. Most encodings had only been designed to facilitate interoperation between a handful of scripts (often primarily between a given script and [[Latin character]]s), not between a large number of scripts, and not with all of the supported scripts treated in a consistent manner.

The philosophy that underpins Unicode seeks to encode the underlying characters ([[grapheme]]s and grapheme-like units) rather than graphical distinctions considered mere variant [[glyph]]s thereof, which are instead best handled by the [[typeface]], through the use of [[markup (computing)|markup]], or by some other means. In particularly complex cases, such as [[Han unification|the treatment of orthographical variants in Han characters]], there is considerable disagreement regarding which differences justify their own encodings, and which are only graphical variants of other characters.

At the most abstract level, Unicode assigns a unique number called a {{em|[[code point]]}} to each character. Many issues of visual representation, including size, shape, and style, are intended to be left to the discretion of the software actually rendering the text, such as a [[web browser]] or [[word processor]]. However, partially with the intent of encouraging rapid adoption, this originally simple model has become somewhat more elaborate over time, and various pragmatic concessions have been made over the course of the standard's development.
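As a rough illustration of the code point model described above, the following Python sketch uses the built-in <code>ord</code> and <code>chr</code> functions to move between characters and their code points. The helper name <code>u_notation</code> is hypothetical and introduced only for this example.

```python
def u_notation(ch):
    """Format a single character's code point in the conventional U+XXXX form."""
    return f"U+{ord(ch):04X}"


# Each character maps to exactly one code point, independent of font, size, or style.
for ch in ("A", "\u00e9", "\u20ac", "\u4e2d"):
    print(ch, u_notation(ch))

# chr() is the inverse mapping: code point -> character.
euro = chr(0x20AC)
print(euro == "\u20ac")
```

How the resulting character is drawn is left to the rendering software; the code point itself carries no information about typeface or style.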
The first 256 code points mirror the [[ISO/IEC 8859-1]] standard, with the intent of trivializing the conversion of text already written in Western European scripts. To preserve the distinctions made by different legacy encodings, thereby allowing for conversion between them and Unicode without any loss of information, many [[duplicate characters in Unicode|characters nearly identical to others]], in both appearance and intended function, were given distinct code points. For example, the [[Halfwidth and Fullwidth Forms (Unicode block)|Halfwidth and Fullwidth Forms]] block encompasses a full semantic duplicate of the Latin alphabet, because legacy [[CJK characters|CJK encodings]] contained both "fullwidth" (matching the width of CJK characters) and "halfwidth" (matching ordinary Latin script) characters.

The Unicode Bulldog Award is given to people deemed to be influential in Unicode's development, with recipients including [[Tatsuo Kobayashi]], Thomas Milo, Roozbeh Pournader, [[Ken Lunde]], and [[Michael Everson]].<ref>{{Cite web|url=https://www.unicode.org/acknowledgements/bulldog.html|title=Unicode Bulldog Award|website=Unicode |url-status=live |archive-url=https://web.archive.org/web/20231111130143/http://www.unicode.org/acknowledgements/bulldog.html |archive-date= Nov 11, 2023 }}</ref>

=== {{anchor|Unicode 88}}History ===
The origins of Unicode can be traced back to the 1980s, to a group of individuals with connections to [[Xerox]]'s [[Xerox Character Code Standard|Character Code Standard]] (XCCS).<ref name="unicode-88" /> In 1987, Xerox employee [[Joe Becker (Unicode)|Joe Becker]], along with [[Apple Inc.|Apple]] employees [[Lee Collins (Unicode)|Lee Collins]] and [[Mark Davis (Unicode)|Mark Davis]], started investigating the practicalities of creating a universal character set.<ref>{{Cite web |title=Summary Narrative |url=https://www.unicode.org/history/summary.html |website=Unicode |date=August 31, 2006 |access-date=15 March 2010}}</ref> With additional
input from Peter Fenwick and [[Dave Opstad]],<ref name="unicode-88" /> Becker published a draft proposal for an "international/multilingual text character encoding system" in August 1988, tentatively called Unicode. He explained that "the name 'Unicode' is intended to suggest a unique, unified, universal encoding".<ref name="unicode-88">{{Cite web |last=Becker |first=Joseph D. |author-link=Joseph D. Becker |date=10 September 1998 |title=Unicode 88 |url=https://unicode.org/history/unicode88.pdf |url-status=live |archive-url=https://web.archive.org/web/20161125224409/https://unicode.org/history/unicode88.pdf |archive-date=25 November 2016 |access-date=25 October 2016 |publisher=[[Unicode Consortium]] |quote=In 1978, the initial proposal for a set of "Universal Signs" was made by [[Bob Belleville]] at [[Xerox PARC]]. Many persons contributed ideas to the development of a new encoding design. Beginning in 1980, these efforts evolved into the [[Xerox Character Code Standard]] (XCCS) by the present author, a multilingual encoding that has been maintained by Xerox as an internal corporate standard since 1982, through the efforts of Ed Smura, Ron Pellar, and others.<br />Unicode arose as the result of eight years of working experience with XCCS. Its fundamental differences from XCCS were proposed by Peter Fenwick and Dave Opstad (pure 16-bit codes) and by [[Lee Collins (Unicode)|Lee Collins]] (ideographic character unification). Unicode retains the many features of XCCS whose utility has been proved over the years in an international line of communication multilingual system products. |orig-year=1988-08-29}}</ref> In this document, entitled ''Unicode 88'', Becker outlined a scheme using [[16-bit computing|16-bit]] characters:<ref name="unicode-88" /> <blockquote> Unicode is intended to address the need for a workable, reliable world text encoding.
Unicode could be roughly described as "wide-body [[ASCII]]" that has been stretched to 16 bits to encompass the characters of all the world's living languages. In a properly engineered design, 16 bits per character are more than sufficient for this purpose. </blockquote> This design decision was made based on the assumption that only scripts and characters in "modern" use would require encoding:<ref name="unicode-88" /> <blockquote> Unicode gives higher priority to ensuring utility for the future than to preserving past antiquities. Unicode aims in the first instance at the characters published in the modern text (e.g. in the union of all newspapers and magazines printed in the world in 1988), whose number is undoubtedly far below 2<sup>14</sup> = 16,384. Beyond those modern-use characters, all others may be defined to be obsolete or rare; these are better candidates for private use registration than for congesting the public list of generally useful Unicode. </blockquote> In early 1989, the Unicode working group expanded to include Ken Whistler and Mike Kernaghan of Metaphor, Karen Smith-Yoshimura and Joan Aliprand of [[Research Libraries Group]], and Glenn Wright of [[Sun Microsystems]]. In 1990, Michel Suignard and Asmus Freytag of [[Microsoft]] and [[NeXT]]'s Rick McGowan had also joined the group. By the end of 1990, most of the work of remapping existing standards had been completed, and a final review draft of Unicode was ready. The [[Unicode Consortium]] was incorporated in California on 3 January 1991,<ref>{{Cite web |title=History of Unicode Release and Publication Dates |url=https://unicode.org/history/publicationdates.html |access-date=20 March 2023 |website=Unicode}}</ref> and the first volume of ''The Unicode Standard'' was published that October. The second volume, now adding Han ideographs, was published in June 1992. In 1996, a surrogate character mechanism was implemented in Unicode 2.0, so that Unicode was no longer restricted to 16 bits. 
This increased the Unicode codespace to over a million code points, which allowed for the encoding of many historic scripts, such as [[Egyptian hieroglyphs]], and thousands of rarely used or obsolete characters that had not been anticipated for inclusion in the standard. Among these characters are various rarely used [[CJK characters]], many of them used mainly in proper names, which makes them far more necessary for a universal encoding than the original Unicode architecture envisioned.<ref name="unicoderevisited">{{Cite web |last=Searle |first=Stephen J |title=Unicode Revisited |url=http://tronweb.super-nova.co.jp/unicoderevisited.html |access-date=18 January 2013}}</ref>

Version 1.0 of Microsoft's TrueType specification, published in 1992, used the name "Apple Unicode" instead of "Unicode" for the Platform ID in the naming table.

=== Unicode Consortium ===
{{Main|Unicode Consortium}}
The Unicode Consortium is a non-profit organization that coordinates Unicode's development. Full members include most of the main computer software and hardware companies (and few others) with any interest in text-processing standards, including [[Adobe Inc.|Adobe]], [[Apple Inc.|Apple]], [[Google]], [[IBM]], [[Meta Platforms|Meta]] (previously Facebook), [[Microsoft]], [[Netflix]], and [[SAP]].<ref name="members">{{Cite web |title=The Unicode Consortium Members |url=https://unicode.org/consortium/members.html |access-date=12 February 2024}}</ref> Over the years several countries or government agencies have been members of the Unicode Consortium.<ref name="members" />

The Consortium has the ambitious goal of eventually replacing existing character encoding schemes with Unicode and its standard Unicode Transformation Format (UTF) schemes, as many of the existing schemes are limited in size and scope and are incompatible with [[multilingualism|multilingual]] environments.
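To make the UTF schemes and the surrogate mechanism introduced with Unicode 2.0 more concrete, the following Python sketch computes the size of the extended codespace and splits a code point beyond the original 16-bit range into a UTF-16 surrogate pair. The function name <code>to_surrogate_pair</code> is hypothetical; the arithmetic follows the standard UTF-16 definition, and the sample character U+13000 is taken from the Egyptian Hieroglyphs block.

```python
# 17 planes of 65,536 code points each: U+0000 through U+10FFFF.
CODESPACE_SIZE = 17 * 0x10000


def to_surrogate_pair(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF are encoded with surrogates")
    offset = cp - 0x10000             # a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


hieroglyph = chr(0x13000)  # EGYPTIAN HIEROGLYPH A001
high, low = to_surrogate_pair(ord(hieroglyph))
print(f"codespace size: {CODESPACE_SIZE:,}")
print(f"surrogate pair: {high:04X} {low:04X}")

# The same code point serialized by three UTF schemes (big-endian, no byte order mark).
for scheme in ("utf-8", "utf-16-be", "utf-32-be"):
    print(scheme, hieroglyph.encode(scheme).hex())
```

All three schemes encode the same code point; they differ only in how many code units are used and how wide each unit is.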
=== Scripts covered ===
{{Main|Script (Unicode)}}
[[File:Unicode sample.png|class=skin-invert-image|thumb|right|200px|Many modern applications can render a substantial subset of the many [[scripts in Unicode]], as demonstrated by this screenshot from the [[OpenOffice.org]] application.]]<!-- screenshot fair use rationale: this screenshot is used specifically to illustrate the Unicode-related capabilities of modern desktop applications and the breadth of supported Unicode scripts -->
Unicode currently covers most major [[writing system]]s in use today.<ref>{{Cite book |last=Otung |first=Ifiok |url=https://books.google.com/books?id=4OMXEAAAQBAJ&q=unicode+covers+almost+all+characters |title=Communication Engineering Principles |date=2021-01-28 |publisher=John Wiley & Sons |isbn=978-1-119-27407-0 |language=en|page=12}}</ref><ref>{{Cite web |title=Unicode FAQ |url=https://home.unicode.org/basic-info/faq/ |access-date=2 April 2020}}</ref>

{{As of|2024}}, a total of 168 [[Script (Unicode)|scripts]]<ref>{{Cite web |title=Supported Scripts |url=https://www.unicode.org/standard/supported.html |access-date=16 September 2022 |website=Unicode}}</ref> are included in the latest version of Unicode (covering [[alphabet]]s, [[abugida]]s and [[syllabary|syllabaries]]), although there are still scripts that are not yet encoded, particularly those mainly used in historical, liturgical, and academic contexts. Further additions of characters to the already encoded scripts, as well as symbols, in particular for mathematics and [[musical notation|music]] (in the form of notes and rhythmic symbols), also occur.

The Unicode Roadmap Committee ([[Michael Everson]], Rick McGowan, Ken Whistler, V.S.
Umamaheswaran)<ref>{{Cite web |title=Roadmap to the BMP |url=https://www.unicode.org/roadmaps/bmp/ |access-date=30 July 2018 |publisher=[[Unicode Consortium]]}}</ref> maintains the list of scripts that are candidates or potential candidates for encoding and their tentative code block assignments on the Unicode Roadmap<ref>{{Cite web|url=https://www.unicode.org/roadmaps/|title=Roadmaps to Unicode|website=Unicode |url-status=live |archive-url= https://web.archive.org/web/20231208091250/http://www.unicode.org/roadmaps/ |archive-date= Dec 8, 2023 }}</ref> page of the [[Unicode Consortium]] website. For some scripts on the Roadmap, such as [[Jurchen script|Jurchen]] and [[Khitan large script]], encoding proposals have been made and they are working their way through the approval process. For other scripts, such as [[Numidian language|Numidian]] and [[Rongorongo]], no proposal has yet been made, and they await agreement on character repertoire and other details from the user communities involved.

Some modern invented scripts which have not yet been included in Unicode (e.g., [[Tengwar]]) or which do not qualify for inclusion in Unicode due to lack of real-world use (e.g., [[Klingon scripts|Klingon]]) are listed in the [[ConScript Unicode Registry]], along with unofficial but widely used [[Unicode private use area|private use area]] code assignments.

There is also a [[Medieval Unicode Font Initiative]] focused on special Latin medieval characters. Some of these proposals have already been included in Unicode.
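Registries such as the ConScript Unicode Registry place their unofficial assignments in the private use areas, which Unicode reserves for such agreements. As a minimal sketch, the following Python code checks whether a code point falls in one of the three private use ranges and cross-checks the result against the <code>Co</code> (private use) general category reported by Python's <code>unicodedata</code> module. The names <code>PRIVATE_USE_RANGES</code> and <code>is_private_use</code> are hypothetical.

```python
import unicodedata

# The BMP private use area plus the private use areas of planes 15 and 16.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def is_private_use(cp):
    """Return True if the code point lies in one of the private use ranges."""
    return any(first <= cp <= last for first, last in PRIVATE_USE_RANGES)


for cp in (0x0041, 0xE000, 0xF8FF, 0xF0000, 0x10FFFD):
    category = unicodedata.category(chr(cp))
    print(f"U+{cp:04X}", is_private_use(cp), category)
```

Unicode assigns no meaning to these code points, so text that uses them is only interpretable between parties that share the same private agreement.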
=== {{anchor|Script Encoding Initiative}} Script Encoding Initiative ===
The Script Encoding Initiative,<ref>{{Cite web|url=https://linguistics.berkeley.edu/sei/|title=script encoding initiative|website=Berkeley Linguistics |url-status=live |archive-url=https://web.archive.org/web/20230325131114/https://linguistics.berkeley.edu/sei/ |archive-date= Mar 25, 2023 }}</ref> a project run by Deborah Anderson at the [[University of California, Berkeley]], was founded in 2002 with the goal of funding proposals for scripts not yet encoded in the standard. The project has become a major source of proposed additions to the standard in recent years.<ref>{{Cite web |title=About The Script Encoding Initiative |url=https://www.unicode.org/pending/about-sei.html |access-date=4 June 2012 |publisher=The Unicode Consortium}}</ref>

=== Versions ===
The Unicode Consortium, together with the ISO, has developed a shared [[character encoding|repertoire]] following the initial publication of ''The Unicode Standard'': Unicode and the ISO's [[Universal Coded Character Set]] (UCS) use identical character names and code points. However, the Unicode versions do differ from their ISO equivalents in two significant ways.

While the UCS is a simple character map, Unicode specifies the rules, algorithms, and properties necessary to achieve interoperability between different platforms and languages. Thus, ''The Unicode Standard'' includes more information, covering in-depth topics such as bitwise encoding, [[Unicode collation algorithm|collation]], and rendering. It also provides a comprehensive catalog of character properties, including those needed for supporting [[bidirectional text]], as well as visual charts and reference data sets to aid implementers.
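The character properties mentioned above can be inspected programmatically. As an illustrative sketch, Python's standard <code>unicodedata</code> module exposes a subset of them, including the character name, general category, and bidirectional class; the sample characters below are chosen only for demonstration.

```python
import unicodedata

# Latin capital letter, ASCII digit, Hebrew letter alef, Arabic letter alef.
samples = ["A", "1", "\u05d0", "\u0627"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),       # general category, e.g. Lu = uppercase letter
        unicodedata.bidirectional(ch),  # bidirectional class, e.g. R = right-to-left
    )

# Version of the Unicode Character Database bundled with this Python build.
print(unicodedata.unidata_version)
```

The bidirectional classes shown here (<code>L</code>, <code>EN</code>, <code>R</code>, <code>AL</code>) are inputs to the bidirectional algorithm, which a UCS-style character map alone does not specify.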
Previously, ''The Unicode Standard'' was sold as a print volume containing the complete core specification, standard annexes,{{refn|group="note"|1="A Unicode Standard Annex (UAX) forms an integral part of ''The Unicode Standard'', but is published as a separate document."[https://www.unicode.org/reports/tr31/tr31-5.html]}} and code charts. However, version 5.0, published in 2006, was the last version printed this way. Starting with version 5.2, only the core specification, published as a print-on-demand paperback, may be purchased.<ref name="version6.1PoD">{{Cite web |title=Unicode 6.1 Paperback Available |url=https://www.unicode.org/mail-arch/unicode-ml/y2012-m05/0240.html |access-date=30 May 2012 |website=announcements_at_unicode.org}}</ref> The full text, on the other hand, is published as a free PDF on the Unicode website. A practical reason for this publication method highlights the second significant difference between the UCS and Unicode—the frequency with which updated versions are released and new characters added. ''The Unicode Standard'' has regularly released annual expanded versions, occasionally with more than one version released in a calendar year and with rare cases where the scheduled release had to be postponed. For instance, in April 2020, a month after version 13.0 was published, the Unicode Consortium announced they had changed the intended release date for version 14.0, pushing it back six months to September 2021 due to the [[COVID-19 pandemic]]. Unicode 16.0, the latest version, was released on 10 September 2024. 
It added 5,185 characters and seven new scripts: [[Garay alphabet|Garay]], [[Khema script|Gurung Khema]], [[Kirat Rai]], [[Ol Onal]], [[Sunuwar alphabet|Sunuwar]], [[Todhri alphabet|Todhri]], and [[Tigalari script|Tulu-Tigalari]].<ref>{{Cite web |title=Unicode 16.0.0 |url=https://www.unicode.org/versions/Unicode16.0.0/|access-date=13 September 2024 |website=Unicode}}</ref> Thus far, the following versions of ''The Unicode Standard'' have been published. Update versions, which do not include any changes to character repertoire, are signified by the third number (e.g., "version 4.0.1") and are omitted in the table below.<ref>{{Cite web |title=Enumerated Versions of The Unicode Standard |url=https://www.unicode.org/versions/enumeratedversions.html |access-date=21 June 2016}}</ref> {{sticky header}} {| class="wikitable sortable sticky-header-multi" style="font-size:95%; width:100%; text-align:center" |+ Unicode version history and notable changes to characters and scripts |- ! scope="col" rowspan="2" | Version ! scope="col" rowspan="2" | Date ! scope="col" rowspan="2" class="unsortable" | Publication<br />(book, text) ! scope="col" rowspan="2" | [[Universal Coded Character Set|UCS]] edition ! colspan="2" | Total ! scope="col" rowspan="2" style="width:44%" class="unsortable" | Details |- ! scope="col" | Scripts ! scope="col" | Characters{{efn|The total number of graphic and format characters, excluding [[Unicode private use area|private use characters]], [[Unicode control characters|control characters]], [[noncharacter]]s, and [[surrogate code points]].|group=tablenote}} |- id="1.0.0" | {{Unicode version|version=1.0.0}}<ref name="v1.0.0">{{multiref|1={{Cite Unicode|1.0.0}}|2={{cite web|url=https://www.unicode.org/Public/reconstructed/1.0.0/UnicodeData.txt|title=1.0.0/UnicodeData.txt (reconstructed) |date=2004|access-date=2010-03-16}} }}</ref> | {{dts|{{Unicode version/version-to-date|version=1.0.0|format=month}}}} | {{ISBN|0-201-56788-1}} (vol. 1) | rowspan="2" !
{{n/a}} | 24 | {{val|7,129}} | style="text-align:left" | Initial scripts covered: [[Arabic script|Arabic]], [[Armenian alphabet|Armenian]], [[Bengali alphabet|Bengali]], [[Bopomofo]], [[Cyrillic script|Cyrillic]], [[Devanagari]], [[Georgian alphabet|Georgian]], [[Greek alphabet|Greek and Coptic]], [[Gujarati script|Gujarati]], [[Gurmukhi script|Gurmukhi]], [[Hangul]], [[Hebrew alphabet|Hebrew]], [[Hiragana]], [[Kannada script|Kannada]], [[Katakana]], [[Lao script|Lao]], [[Latin script|Latin]], [[Malayalam script|Malayalam]], [[Odia script|Odia]], [[Tamil script|Tamil]], [[Telugu script|Telugu]], [[Thai script|Thai]], and [[Tibetan script|Tibetan]]<!-- --> |- id="1.0.1" | {{Unicode version|version=1.0.1}}<ref name="v1.0.1">{{multiref|{{Cite Unicode|1.0.1}}|{{cite web|title=Unicode Data 1.0.1|url=https://www.unicode.org/Public/reconstructed/1.0.1/UnicodeData.txt|access-date=2010-03-16}} }}</ref> | {{dts|{{Unicode version/version-to-date|version=1.0.1|format=month}}}} | {{ISBN|link=no|0-201-60845-6}} (vol. 2) | 25 | {{val|28,327}}{{su|p={{val|+21,204}}|b={{val|−6}}}} | style="text-align:left" | The initial 20,902 [[CJK Unified Ideographs]]<!-- --> |- id="1.1" | {{Unicode version|version=1.1}}<ref name="v1.1">{{multiref|1={{Cite Unicode|1.1.5}}|2={{cite web|title=Unicode Data 1995|url=https://www.unicode.org/Public/1.1-Update/UnicodeData-1.1.5.txt|access-date=2010-03-16}}}}</ref> | {{dts|{{Unicode version/version-to-date|version=1.1|format=month}}}} | {{n/a}} | rowspan="3" | [[Universal Coded Character Set|ISO/IEC 10646]]-1:1993 {{efn|{{cslist|semi=yes|2.0 added Amendments 5, 6, and 7|2.1 added two characters from Amendment 18.}}|group=tablenote}} | 24 | {{val|34,168}}{{su|p={{val|+5,963}}|b={{val|−9}}}} | style="text-align:left" | 33 reclassified as control characters. 
4,306 [[Hangul]] syllables, [[Tibetan script|Tibetan]] removed<!-- --> |- id="2.0" | {{Unicode version|version=2.0}}<ref name="v2.0.0">{{multiref|1={{Cite Unicode|2.0}}|2={{cite web|title=Unicode Data-2.0.14|url=https://www.unicode.org/Public/2.0-Update/UnicodeData-2.0.14.txt|access-date=2010-03-16}}}}</ref> | {{dts|{{Unicode version/version-to-date|version=2.0|format=month}}}} | {{ISBN|link=no|0-201-48345-9}} | rowspan="2" | 25 | {{val|38,885}}{{su|p={{val|+11,373}}|b={{val|−6,656}}}} | style="text-align:left" | Original set of Hangul syllables removed, new set of 11,172 Hangul syllables added at new location, Tibetan added back in a new location and with a different character repertoire, Surrogate character mechanism defined, Plane 15 and Plane 16 [[Unicode private use area|private use area]] allocated<!-- --> |- id="2.1" | {{Unicode version|version=2.1}}<ref name="2.1.0">{{multiref|1={{Cite Unicode|2.1.2}}|2={{cite web|title=Unicode Data-2.1.2|url=https://www.unicode.org/Public/2.1-Update/UnicodeData-2.1.2.txt|access-date=2010-03-16}}}}</ref> | {{dts|{{Unicode version/version-to-date|version=2.1|format=month}}}} | {{n/a}} | {{val|38,887}}{{su|p={{val|+2}}}} | style="text-align:left" | [[Euro sign|{{unichar|20AC|EURO SIGN}}]], [[Specials (Unicode block)|{{unichar|FFFC|OBJECT REPLACEMENT CHARACTER}}]]<ref name="2.1.0" /><!-- --> |- id="3.0" | {{Unicode version|version=3.0}}<ref name="3.0.0">{{multiref|1={{Cite Unicode|3.0}}|2={{cite web|title=Unicode Data-3.0.0|url=https://www.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.txt|access-date=2023-10-02}}}}</ref> | {{dts|{{Unicode version/version-to-date|version=3.0|format=month}}}} | {{ISBN|link=no|0-201-61633-5}} | ISO/IEC 10646-1:2000 | 38 | {{val|49,194}}{{su|p={{val|+10,307}}}} | style="text-align:left" | [[Cherokee syllabary|Cherokee]], [[Geʽez script|Geʽez]], [[Khmer script|Khmer]], [[Mongolian script|Mongolian]], [[Burmese alphabet|Burmese]], [[Ogham]], [[runes]], [[Sinhala script|Sinhala]], [[Syriac 
alphabet|Syriac]], [[Thaana]], [[Canadian Aboriginal syllabics]], and [[Yi script|Yi Syllables]], [[Braille]] patterns<!-- --> |- id="3.1" | {{Unicode version|version=3.1}}<ref name="3.1.0">{{multiref|1={{Cite Unicode|3.1.0}}|2={{cite web|title=Unicode Data-3.1.0|url=https://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt|access-date=2023-10-02}}}}</ref> | {{dts|{{Unicode version/version-to-date|version=3.1|format=month}}}} | rowspan="2" ! {{n/a}} | rowspan="2" | ISO/IEC 10646-1:2000{{efn|3.2 added Amendment 1.|group=tablenote}}{{hr}}ISO/IEC 10646-2:2001 | 41 | {{val|94,140}}{{su|p={{val|+44,946}}}} | style="text-align:left" | [[Deseret alphabet|Deseret]], [[Gothic alphabet|Gothic]] and [[Old Italic alphabet|Old Italic]], sets of symbols for Western and [[Byzantine music]], 42,711 additional CJK Unified Ideographs<!-- --> |- id="3.2" | {{Unicode version|version=3.2}}<ref name="3.2.0">{{multiref|1={{Cite Unicode|3.2.0}}|2={{cite web|title=Unicode Data-3.2.0|url=https://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.txt|access-date=2023-10-02}}}}</ref> | {{dts|{{Unicode version/version-to-date|version=3.2|format=month}}}} | 45 | {{val|95,156}}{{su|p={{val|+1,016}}}} | style="text-align:left" | [[Philippines|Philippine]] scripts ([[Buhid script|Buhid]], [[Hanunoo script|Hanunoo]], [[Baybayin|Tagalog]], and [[Tagbanwa script|Tagbanwa]])<!-- --> |- id="4.0" | {{Unicode version|version=4.0}}<ref name="4.0.0">{{multiref|1={{Cite Unicode|4.0.0}}|2={{cite web|title=Unicode Data-4.0.0|url=https://www.unicode.org/Public/4.0-Update/UnicodeData-4.0.0.txt|access-date=2023-10-02}}}}</ref> | {{dts|{{Unicode version/version-to-date|version=4.0|format=month}}}} | {{ISBN|link=no|0-321-18578-1}} | rowspan="5" | ISO/IEC 10646:2003 {{efn|{{cslist|semi=yes|4.1 added Amendment 1|5.0 added Amendment 2 as well as four characters from Amendment 3|5.1 added Amendment 4|5.2 added Amendments 5 and 6}}|group=tablenote}} | 52 | {{val|96,382}}{{su|p={{val|+1,226}}}} | 
style="text-align:left" | [[Cypriot syllabary]], [[Limbu script|Limbu]], [[Linear B]], [[Osmanya script|Osmanya]], [[Shavian alphabet|Shavian]], [[Tai Nüa language#Writing system|Tai Le]], and [[Ugaritic alphabet|Ugaritic]], [[Hexagram (I Ching)|Hexagram symbols]]<!-- --> |- id="4.1" | {{Unicode version|version=4.1}}<ref>{{multiref|1={{Cite Unicode|4.1.0}}|2={{cite web|url=https://www.unicode.org/Public/4.1.0/ucd/UnicodeData.txt|title=Unicode Data-4.1.0|access-date=2010-03-16}} }}</ref> | {{dts|{{Unicode version/version-to-date|version=4.1|format=month}}}} | {{n/a}} | 59 | {{val|97,655}}{{su|p={{val|+1,273}}}} | style="text-align:left" | [[Lontara script|Buginese]], [[Glagolitic script|Glagolitic]], [[Kharosthi]], [[New Tai Lue alphabet|New Tai Lue]], [[Old Persian cuneiform|Old Persian]], [[Sylheti Nagri]], and [[Tifinagh]], [[Coptic alphabet|Coptic]] disunified from Greek, ancient [[Unicode numerals#Ancient Greek numerals|Greek numbers]] and [[Musical notation#Ancient Greece|musical symbols]], first named character sequences were introduced.<ref>{{cite web|url=https://www.unicode.org/Public/4.1.0/ucd/NamedSequences.txt |date=2005 |website=Unicode |title=Named Sequences-4.1.0|access-date=2010-03-16}}</ref><!-- --> |- id="5.0" | {{Unicode version|version=5.0}}<ref>{{Cite Unicode|5}}</ref> | {{dts|{{Unicode version/version-to-date|version=5.0|format=month}}}} | {{ISBN|link=no|0-321-48091-0}} | 64 | {{val|99,024}}{{su|p={{val|+1,369}}}} | style="text-align:left" | [[Balinese script|Balinese]], [[cuneiform]], [[N'Ko script|N'Ko]], [[ʼPhags-pa script|ʼPhags-pa]], [[Phoenician alphabet|Phoenician]]<ref>{{cite web |title=Unicode Data 5.0.0 |url=https://www.unicode.org/Public/5.0.0/ucd/UnicodeData.txt |access-date=2010-03-17}}</ref><!-- --> |- id="5.1" | {{Unicode version|version=5.1}}<ref>{{multiref|{{Cite Unicode|5.1}}|{{cite web|title=Unicode Data 5.1.0|url=https://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt|access-date=2010-03-17}} }}</ref> | {{dts|{{Unicode 
version/version-to-date|version=5.1|format=month}}}} | {{n/a}} | 75 | {{val|100,648}}{{su|p={{val|+1,624}}}} | style="text-align:left" | [[Carian alphabets|Carian]], [[Cham script|Cham]], [[Kayah Li alphabet|Kayah Li]], [[Lepcha script|Lepcha]], [[Lycian script|Lycian]], [[Lydian script|Lydian]], [[Ol Chiki script|Ol Chiki]], [[Rejang alphabet|Rejang]], [[Saurashtra script|Saurashtra]], [[Sundanese script|Sundanese]], and [[Vai syllabary|Vai]], sets of symbols for the [[Phaistos Disc]], [[Mahjong]] tiles, [[Dominoes|Domino tiles]], additions to Burmese, [[Scribal abbreviation]]s, [[capital ẞ|{{unichar|1E9E|LATIN CAPITAL LETTER SHARP S}}]]<!-- --> |- id="5.2" | {{Unicode version|version=5.2}}<ref>{{multiref|{{Cite Unicode|5.2}}|{{cite web|title=Unicode Data 5.2.0|url=https://www.unicode.org/Public/5.2.0/ucd/UnicodeData.txt|access-date=2010-03-17}} }}</ref> | {{dts|{{Unicode version/version-to-date|version=5.2|format=month}}}} | {{ISBN|link=no|978-1-936213-00-9}} | 90 | {{val|107,296}}{{su|p={{val|+6,648}}}} | style="text-align:left" | [[Avestan alphabet|Avestan]], [[Bamum script|Bamum]], [[Gardiner's sign list]] of [[Egyptian hieroglyphs]], [[Imperial Aramaic]], [[Inscriptional Pahlavi]], [[Inscriptional Parthian]], [[Javanese script|Javanese]], [[Kaithi]], [[Fraser script|Lisu]], [[Meitei script|Meetei Mayek]], [[Ancient South Arabian script|Old South Arabian]], [[Old Turkic script|Old Turkic]], [[Samaritan script|Samaritan]], [[Tai Tham script|Tai Tham]] and [[Tai Viet script|Tai Viet]], additional CJK Unified Ideographs, Jamo for Old Hangul, [[Vedic Sanskrit]]<!-- --> |- id="6.0" | {{Unicode version|version=6.0}}<ref>{{multiref|{{Cite Unicode|6.1}}|{{cite web|title=Unicode Data 6.0.0|url=https://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt|access-date=2010-10-11}} }}</ref> | {{dts|{{Unicode version/version-to-date|version=6.0|format=month}}}} | {{ISBN|link=no|978-1-936213-01-6}} | ISO/IEC 10646:2010 {{efn|Plus the [[Indian rupee sign]]|group=tablenote}} | 93 
| {{val|109,384}}{{su|p={{val|+2,088}}}} | style="text-align:left" | [[Batak script|Batak]], [[Brahmi script|Brahmi]], [[Mandaic alphabet|Mandaic]], [[playing card]] symbols, transport and map symbols, [[alchemical symbol]]s, [[emoticons]] and emoji,<ref>{{Cite web |title=Unicode 6.0 Emoji List |url=https://emojipedia.org/unicode-6.0/|access-date=2022-09-21|website=emojipedia.org}}</ref> additional CJK Unified Ideographs<!-- --> |- id="6.1" | {{Unicode version|version=6.1}}<ref>{{multiref|{{Cite Unicode|6.1}}|{{cite web|title=Unicode Data 6.1.0|url=https://www.unicode.org/Public/6.1.0/ucd/UnicodeData.txt|access-date=2012-01-31}} }}</ref> | {{dts|{{Unicode version/version-to-date|version=6.1|format=month}}}} | {{ISBN|link=no|978-1-936213-02-3}} | rowspan="4" | ISO/IEC 10646:2012 {{efn|{{cslist|semi=yes|6.2 added the [[Turkish lira sign]]|6.3 added five additional characters|7.0 added Amendments 1 and 2 as well as the [[ruble sign]]}}|group=tablenote}} | rowspan="3" | 100 | {{val|110,116}}{{su|p={{val|+732}}}} | style="text-align:left" | [[Chakma script|Chakma]], [[Meroitic script|Meroitic cursive]], [[Meroitic script|Meroitic hieroglyphs]], [[Pollard script|Miao]], [[Sharada script|Sharada]], [[Sorang Sompeng script|Sora Sompeng]], and [[Takri script|Takri]]<!-- --> |- id="6.2" | {{Unicode version|version=6.2}}<ref>{{multiref|{{Cite Unicode|6.2}}|{{cite web|title=Unicode Data 6.2.0|url=https://www.unicode.org/Public/6.2.0/ucd/UnicodeData.txt|access-date=2012-09-26}} }}</ref> | {{dts|{{Unicode version/version-to-date|version=6.2|format=month}}}} | {{ISBN|link=no|978-1-936213-07-8}} | {{val|110,117}}{{su|p={{val|+1}}}} | style="text-align:left" | {{unichar|20BA|TURKISH LIRA SIGN}}<!-- --> |- id="6.3" | {{Unicode version|version=6.3}}<ref>{{multiref|{{Cite Unicode|6.3}}|{{cite web|title=Unicode Data 6.3.0|url=https://www.unicode.org/Public/6.3.0/ucd/UnicodeData.txt|access-date=2013-09-30}} }}</ref> | {{dts|{{Unicode version/version-to-date|version=6.3|format=month}}}} 
| {{ISBN|link=no|978-1-936213-08-5}}
| {{val|110,122}}{{su|p={{val|+5}}}}
| style="text-align:left" | 5 bidirectional formatting characters<!-- -->
|- id="7.0"
| {{Unicode version|version=7.0}}<ref>{{multiref|{{Cite Unicode|7}}|{{cite web|title=Unicode Data 7.0.0|url=https://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt|access-date=2014-06-15}} }}</ref>
| {{dts|{{Unicode version/version-to-date|version=7.0|format=month}}}}
| {{ISBN|link=no|978-1-936213-09-2}}
| 123
| {{val|112,956}}{{su|p={{val|+2,834}}}}
| style="text-align:left" | [[Bassa Vah script|Bassa Vah]], [[Caucasian Albanian script|Caucasian Albanian]], [[Duployan shorthand|Duployan]], [[Elbasan script|Elbasan]], [[Grantha script|Grantha]], [[Khojki script|Khojki]], [[Khudabadi script|Khudawadi]], [[Linear A]], [[Mahajani]], [[Manichaean script|Manichaean]], [[Mende Kikakui script|Mende Kikakui]], [[Modi script|Modi]], [[Mro script|Mro]], [[Nabataean script|Nabataean]], [[Ancient North Arabian|Old North Arabian]], [[Old Permic script|Old Permic]], [[Pahawh Hmong]], [[Palmyrene alphabet|Palmyrene]], [[Pau Cin Hau script|Pau Cin Hau]], [[Psalter Pahlavi]], [[Siddhaṃ script|Siddham]], [[Tirhuta script|Tirhuta]], [[Warang Citi]], and [[dingbat]]s<!-- -->
|- id="8.0"
| {{Unicode version|version=8.0}}<ref>{{multiref|{{Cite Unicode|8}}|{{cite web|title=Unicode Data 8.0.0|url=https://www.unicode.org/Public/8.0.0/ucd/UnicodeData.txt|access-date=2015-06-17}} }}</ref>
| {{dts|{{Unicode version/version-to-date|version=8.0|format=month}}}}
| {{ISBN|link=no|978-1-936213-10-8}}
| rowspan="2" | ISO/IEC 10646:2014 {{efn|<!--{{cslist|semi=yes|-->Plus Amendment 1, as well as the [[Georgian lari|Lari sign]], nine CJK unified ideographs, and 41 emoji;<ref>{{Cite Unicode|8}}</ref><br>9.0 added Amendment 2, as well as Adlam, Newa, Japanese TV symbols, and 74 emoji and symbols.<ref>{{Cite Unicode|9}}</ref>}}
| 129
| {{val|120,672}}{{su|p={{val|+7,716}}}}
| style="text-align:left" | [[Ahom script|Ahom]], [[Anatolian hieroglyphs]], [[Hatran alphabet|Hatran]], [[Multani script|Multani]], [[Old Hungarian alphabet|Old Hungarian]], [[SignWriting]], additional CJK Unified Ideographs, lowercase letters for Cherokee, 5 emoji [[Fitzpatrick scale|skin tone modifiers]]<!-- -->
|- id="9.0"
| {{Unicode version|version=9.0}}<ref>{{multiref|{{Cite Unicode|9}}|{{cite web|title=Unicode Data 9.0.0|url=https://www.unicode.org/Public/9.0.0/ucd/UnicodeData.txt|access-date=2016-06-21}} }}</ref>
| {{dts|{{Unicode version/version-to-date|version=9.0|format=month}}}}
| {{ISBN|link=no|978-1-936213-13-9}}
| 135
| {{val|128,172}}{{su|p={{val|+7,500}}}}
| style="text-align:left" | [[Adlam script|Adlam]], [[Bhaiksuki script|Bhaiksuki]], [[Zhang-Zhung language#Scripts|Marchen]], [[Pracalit script|Newa]], [[Osage script|Osage]], [[Tangut script|Tangut]], 72 emoji<ref name=laobo>{{cite web|first=Martim|last=Lobao|url=https://www.androidpolice.com/2016/06/07/two-emoji-werent-approved-unicode-9-google-added-android-anyway/ |title=These Are The Two Emoji That Weren't Approved For Unicode 9 But Which Google Added To Android Anyway|website=Android Police|date=7 June 2016|access-date=4 September 2016}}</ref><!-- -->
|- id="10.0"
| {{Unicode version|version=10.0}}<ref name="v10">{{Cite Unicode|10}}</ref>
| {{dts|{{Unicode version/version-to-date|version=10.0|format=month}}}}
| {{ISBN|link=no|978-1-936213-16-0}}
| rowspan="4" | ISO/IEC 10646:2017 {{efn|{{cslist|semi=yes|Plus 56 emoji, 285 [[hentaigana]] characters, and 3 Zanabazar Square characters|11.0 added 46 Mtavruli Georgian capital letters, 5 CJK unified ideographs, and 66 emoji|12.0 added 62 additional characters.}}|group=tablenote}}
| 139
| {{val|136,690}}{{su|p={{val|+8,518}}}}
| style="text-align:left" | [[Zanabazar square script|Zanabazar Square]], [[Soyombo script|Soyombo]], [[Masaram Gondi script|Masaram Gondi]], [[Nüshu]], [[hentaigana]], 7,494 CJK Unified Ideographs, 56 emoji, [[Bitcoin|{{unichar|20BF}}]]<!-- -->
|- id="11.0"
| {{Unicode version|version=11.0}}<ref>{{Cite Unicode|11}}</ref>
| {{dts|{{Unicode version/version-to-date|version=11.0|format=month}}}}
| {{ISBN|link=no|978-1-936213-19-1}}
| 146
| {{val|137,374}}{{su|p={{val|+684}}}}
| style="text-align:left" | [[Dogri script|Dogra]], [[Georgian scripts#Mkhedruli|Georgian Mtavruli]] capital letters, [[Gunjala Gondi script|Gunjala Gondi]], [[Hanifi Rohingya script|Hanifi Rohingya]], [[Indic Siyaq Numbers]], [[Makassarese language|Makasar]], [[Medefaidrin]], [[Sogdian alphabet|Old Sogdian and Sogdian]], [[Maya numerals]], 5 CJK Unified Ideographs, symbols for [[xiangqi]] and [[Star (classification)|star ratings]], 145 emoji<!-- -->
|- id="12.0"
| {{Unicode version|version=12.0}}<ref>{{Cite Unicode|12}}</ref>
| {{dts|{{Unicode version/version-to-date|version=12.0|format=month}}}}
| {{ISBN|link=no|978-1-936213-22-1}}
| rowspan="2" | 150
| {{val|137,928}}{{su|p={{val|+554}}}}
| style="text-align:left" | [[Elymaic]], [[Nandinagari]], [[Nyiakeng Puachue Hmong]], [[Wancho script|Wancho]], [[Pollard script|Miao script]], hiragana and katakana small letters, Tamil historic fractions and symbols, Lao letters for [[Pali]], Latin letters for Egyptological and Ugaritic transliteration, hieroglyph format controls, 61 emoji<!-- -->
|- id="12.1"
| {{Unicode version|version=12.1}}<ref>{{Cite web|url=https://blog.unicode.org/2019/05/unicode-12-1-en.html|title=Unicode Version 12.1 released in support of the Reiwa Era|website=The Unicode Blog |access-date=2019-05-07}}</ref>
| {{dts|{{Unicode version/version-to-date|version=12.1|format=month}}}}
| {{ISBN|link=no|978-1-936213-25-2}}
| {{val|137,929}}{{su|p={{val|+1}}}}
| style="text-align:left" | [[Reiwa era|{{unichar|32FF|SQUARE ERA NAME REIWA}}]]<!-- -->
|- id="13.0"
| {{Unicode version|version=13.0}}<ref name="v13.0.0">{{multiref|1={{Cite Unicode|13}}|2={{Cite web|url=https://blog.unicode.org/2020/03/announcing-unicode-standard-version-130.html|title=Announcing The Unicode Standard, Version 13.0|website=The Unicode Blog|access-date=2020-03-11}}}}</ref>
| {{dts|{{Unicode version/version-to-date|version=13.0|format=month}}}}
| {{ISBN|link=no|978-1-936213-26-9}}
| rowspan="4" | ISO/IEC 10646:2020<ref>{{Cite web|title=The Unicode Standard, Version 13.0– Core Specification Appendix C|url=https://www.unicode.org/versions/Unicode13.0.0/appC.pdf|publisher=Unicode Consortium|access-date=2020-03-11}}</ref>
| 154
| {{val|143,859}}{{su|p={{val|+5,930}}}}
| style="text-align:left" | [[Khwarezmian language#Writing system|Chorasmian]], [[Dhives Akuru]], [[Khitan small script]], [[Kurdish alphabets#Yezidi|Yezidi]], 4,969 CJK ideographs, Arabic script additions used to write [[Hausa language|Hausa]], [[Wolof language|Wolof]], and other African languages, additions used to write [[Hindko]] and [[Punjabi language|Punjabi]] in Pakistan, Bopomofo additions used for Cantonese, Creative Commons license symbols, graphic characters for compatibility with teletext and home computer systems, 55 emoji<!-- -->
|- id="14.0"
| {{Unicode version|version=14.0}}<ref name="v14.0.0">{{multiref|1={{Cite Unicode|14}}|2={{cite web|url=https://blog.unicode.org/2021/09/announcing-unicode-standard-version-140.html|title=Announcing The Unicode Standard, Version 14.0}}}}</ref>
| {{dts|{{Unicode version/version-to-date|version=14.0|format=month}}}}
| {{ISBN|link=no|978-1-936213-29-0}}
| 159
| {{val|144,697}}{{su|p={{val|+838}}}}
| style="text-align:left" | [[Toto language|Toto]], [[Cypro-Minoan syllabary|Cypro-Minoan]], [[Vithkuqi script|Vithkuqi]], [[Old Uyghur alphabet|Old Uyghur]], [[Tangsa language|Tangsa]], extended IPA, Arabic script additions for use in languages across Africa and in Iran, Pakistan, Malaysia, Indonesia, Java, and Bosnia, additions for honorifics and Quranic use, additions to support languages in North America, the Philippines, India, and Mongolia, [[Kyrgyzstani som|{{unichar|20C0|SOM SIGN}}]], [[Znamenny chant|Znamenny]] musical notation, 37 emoji<!-- -->
|- id="15.0"
| {{Unicode version|version=15.0}}<ref name="v15.0.0">{{Cite Unicode|15.0}}</ref>
| {{dts|{{Unicode version/version-to-date|version=15.0|format=month}}}}
| {{ISBN|link=no|978-1-936213-32-0}}
| rowspan="2" | 161
| {{val|149,186}}{{su|p={{val|+4,489}}}}
| style="text-align:left" | [[Kawi script|Kawi]] and [[Mundari Bani|Mundari]], 20 emoji, 4,192 CJK ideographs, control characters for Egyptian hieroglyphs<!-- -->
|- id="15.1"
| {{Unicode version|version=15.1}}<ref name="v15.1.0">{{multiref|1={{Cite Unicode|15.1}}}}</ref>
| {{dts|{{Unicode version/version-to-date|version=15.1|format=month}}}}
| {{ISBN|link=no|978-1-936213-33-7}}
| {{val|149,813}}{{su|p={{val|+627}}}}
| style="text-align:left" | Additional CJK ideographs<!-- -->
|- id="16.0"
| {{Unicode version|version=16.0}}<ref name="v16.0.0">{{Cite Unicode|16.0}}</ref>
| {{dts|{{Unicode version/version-to-date|version=16.0|format=month}}}}
| {{ISBN|link=no|978-1-936213-34-4}}
|
| 168
| {{val|154,998}}{{su|p={{val|+5,185}}}}
| style="text-align:left" | [[Garay alphabet|Garay]], [[Khema script|Gurung Khema]], [[Kirat Rai]], [[Ol Onal]], [[Sunuwar alphabet|Sunuwar]], [[Todhri alphabet|Todhri]], [[Tigalari script|Tulu-Tigalari]]<!-- -->
|}
{{notelist|group=tablenote}}

=== Projected versions ===
The Unicode Consortium normally releases a new version of ''The Unicode Standard'' once a year. Version 17.0, the next major version, is projected to include CJK Unified Ideographs Extension J, comprising 4,301 new unified [[CJK characters]].<ref>{{Cite web|url=https://unicode.org/alloc/Pipeline.html|title=Proposed New Characters: The Pipeline|date=September 10, 2024|website=Unicode|accessdate=September 13, 2024}}</ref><ref>{{Cite web|url=https://emojipedia.org/unicode-16.0|title=Unicode Version 16.0|website=emojipedia.org|accessdate=September 13, 2023}}</ref>