
==== Han unification ====
{{Main|Han unification}}

The [[Ideographic Research Group]] (IRG) is tasked with advising the Consortium and ISO regarding Han unification, or Unihan, especially the further addition of CJK unified and compatibility ideographs to the repertoire. The IRG is composed of experts from each region that has historically used [[Chinese characters]]. Despite the deliberation within the committee, Han unification has consistently been one of the most contested aspects of ''The Unicode Standard'' since the genesis of the project.<ref>[http://tronweb.super-nova.co.jp/characcodehist.html A Brief History of Character Codes], Steven J. Searle, originally written [https://web.archive.org/web/20001216022100/http://tronweb.super-nova.co.jp/characcodehist.html 1999], last updated 2004</ref>

Existing character set standards such as the Japanese [[JIS X 0208]] (encoded by [[Shift JIS]]) defined unification criteria, meaning rules for determining when a [[variant Chinese character]] is to be considered a handwriting/font difference (and thus unified), versus a spelling difference (to be encoded separately). Unicode's character model for CJK characters was based on the unification criteria used by JIS X 0208, as well as those developed by the Association for a Common Chinese Code in China.<ref name="tus-appe">{{cite web |url=https://www.unicode.org/versions/Unicode16.0.0/core-spec/appendix-e/ |title=Appendix E: Han Unification History |work=The Unicode Standard Version 16.0 – Core Specification |publisher=[[Unicode Consortium]] |date=2024}}</ref>

Due to the standard's principle of encoding semantic rather than stylistic variants, Unicode has received criticism for not assigning code points to certain rare and archaic [[kanji]] variants, possibly complicating processing of ancient and uncommon Japanese names. Since it places particular emphasis on Chinese, Japanese and Korean sharing many characters, Han unification is also sometimes perceived as treating the three as the same thing.<ref name="dw2001">{{Cite web |last = Topping |first = Suzanne |date=2013-06-25 |title=The secret life of Unicode |website=[[IBM]] |url=https://www.ibm.com/developerworks/library/u-secret.html |access-date=20 March 2023 |archive-url=https://web.archive.org/web/20130625062705/http://www.ibm.com/developerworks/library/u-secret.html |archive-date=25 June 2013 }}</ref> Regional differences in the expected forms of characters, in terms of typographical conventions and curricula for handwriting, do not always fall along language boundaries: although [[Hong Kong]] and [[Taiwan]] both write [[Chinese languages]] using [[Traditional Chinese]] characters, the preferred forms of characters differ between Hong Kong and Taiwan in some cases.<ref name="irgn2074">{{cite web |url=https://www.unicode.org/irg/docs/n2074-HKCS.pdf |id=[[ISO/IEC JTC 1|ISO/IEC JTC1]]/[[ISO/IEC JTC 1/SC 2|SC2]]/WG2/[[Ideographic Research Group|IRG]] N2074 |last=Lu |first=Qin |title=The Proposed Hong Kong Character Set |date=2015-06-08}}</ref>

Less-frequently-used alternative encodings exist, often predating Unicode, with character models differing from this paradigm, aimed at preserving the various stylistic differences between regional and/or nonstandard character forms. One example is the [[TRON (encoding)|TRON Code]] favored by some users for handling historical Japanese text, though not widely adopted among the Japanese public. Another is the [[CCCII]] encoding adopted by library systems in [[Hong Kong]], [[Taiwan]] and the [[United States]]. These have their own drawbacks in general use, leading to the [[Big5]] encoding (introduced in 1984, four years after CCCII) having become more common than CCCII outside of library systems.<ref name="hanazono">{{cite web |url=http://kura.hanazono.ac.jp/paper/codes.html |archive-url=https://web.archive.org/web/20041012135645/http://kura.hanazono.ac.jp/paper/codes.html |archive-date=2004-10-12 |url-status=dead |title=Chinese character codes: an update |first=Christian |last=Wittern |date=1995-05-01 |publisher=International Research Institute for Zen Buddhism / [[Hanazono University]]}}</ref> Although work at [[Apple Computer|Apple]] based on [[Research Libraries Group]]'s CJK Thesaurus, which was used to maintain the EACC variant of CCCII, was one of the direct predecessors of Unicode's [[Unihan]] set, Unicode adopted the JIS-style unification model.<ref name="tus-appe"/>

The earliest version of Unicode had a repertoire of fewer than 21,000 Han characters, largely limited to those in relatively common modern usage. As of version 16.0, the standard encodes more than 97,000 Han characters, and work is continuing to add thousands more, largely historical and dialectal variant characters used throughout the [[Sinosphere]].

Modern typefaces provide a means to address some of the practical issues in depicting unified Han characters with various regional graphical representations. The 'locl' [[OpenType]] feature, implemented in a font's GSUB table, allows a renderer to select a different glyph for each code point based on the text locale.<ref>{{Cite web |date=18 February 2023 |title=Noto CJK fonts |url=https://github.com/notofonts/noto-cjk/blob/main/Serif/README.md |publisher=Noto Fonts |quote=Select this deployment format if your system supports variable fonts and you prefer to use only one language, but also want full character coverage or the ability to language-tag text to use glyphs that are appropriate for the other languages (this requires an app that supports language tagging and the OpenType 'locl' GSUB feature).}}</ref> [[variation Selectors|Unicode variation sequences]] can also provide in-text annotations for a desired glyph selection; this requires registration of the specific variant in the [[Ideographic Variation Database]].
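
As an illustration of how a variation sequence appears in encoded text, the following minimal Python sketch appends a variation selector from the range U+E0100–U+E01EF to a unified ideograph. The helper names <code>make_ivs</code> and <code>strip_ivs</code> are hypothetical, and the pairing of U+845B with U+E0100 is used only as an example; whether a given sequence is registered, and which glyph it selects, has to be checked against the Ideographic Variation Database and depends on font support.

```python
import unicodedata

# Range of the supplementary variation selectors (VS17 to VS256).
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF


def make_ivs(base: str, selector_index: int) -> str:
    """Append the selector_index-th (0-based) selector from the range to base.

    Hypothetical helper: it only builds the code point sequence and does not
    check whether the sequence is registered in the IVD.
    """
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    selector = IVS_FIRST + selector_index
    if not IVS_FIRST <= selector <= IVS_LAST:
        raise ValueError("selector index out of range")
    return base + chr(selector)


def strip_ivs(text: str) -> str:
    """Remove selectors in the range, leaving only the unified code points."""
    return "".join(ch for ch in text if not IVS_FIRST <= ord(ch) <= IVS_LAST)


base = "\u845B"            # a CJK unified ideograph, chosen as an example
seq = make_ivs(base, 0)    # base character followed by U+E0100

print([f"U+{ord(ch):04X}" for ch in seq])
print(unicodedata.name(seq[1]))
print(strip_ivs(seq) == base)
```

Because the selector is a separate code point following the base character, software that does not support the sequence can ignore it and fall back to the default glyph for the unified ideograph, while comparison or search code may need to strip selectors first, as <code>strip_ivs</code> does here.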