Chinese character encoding
{{Short description|Representation of CJK characters on computers}}
{{more citations needed|date=March 2016}}
In computing, '''Chinese character encodings''' can be used to represent text written in the [[CJK characters|CJK]] languages ([[Chinese language|Chinese]], [[Japanese language|Japanese]] and [[Korean language|Korean]]) and, rarely, obsolete [[Chữ Nôm|Vietnamese]], all of which use [[Chinese character]]s. Several general-purpose [[character encoding]]s accommodate Chinese characters, and some of them were developed specifically for Chinese.

In addition to [[Unicode]] (with the set of [[CJK Unified Ideographs]]), local encoding systems exist. The two primary "legacy" local encoding systems are the Chinese [[Guobiao code|Guobiao]] (or GB, "national standard") system, used in [[mainland China]] and [[Singapore]], and the (mainly) Taiwanese [[Big5]] system, used in [[Taiwan]], [[Hong Kong]] and [[Macau]]. Guobiao is usually displayed using [[Simplified Chinese character|simplified characters]] and Big5 is usually displayed using [[traditional Chinese characters|traditional characters]]. There is, however, no mandated connection between the encoding system and the font used to display the characters; font and encoding are usually tied together for practical reasons.

The issue of which encoding to use can also have political implications, as GB is the official standard of the [[China|People's Republic of China]] and Big5 is a ''[[de facto]]'' standard of [[Taiwan]].

In contrast to the [[Han unification|situation with Japanese]], there has been relatively little overt opposition to Unicode, which solves many of the issues involved with GB and Big5. Unicode is widely regarded as politically neutral, has good support for both simplified and traditional characters, and can be easily converted to and from GB and Big5. Furthermore, Unicode has the advantage of not being limited to Chinese, since it contains character codes for (nearly) every language.
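
As a minimal illustration of conversion through Unicode, the following Python sketch encodes the same two-character string in GB 2312, Big5 and UTF-8, and converts between the two legacy encodings by decoding to Unicode first. The codec names (<code>gb2312</code>, <code>big5</code>, <code>utf-8</code>) are those of Python's standard library; the helper name <code>transcode</code> is hypothetical and chosen only for this example.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another, using Unicode as the pivot."""
    return data.decode(source).encode(target)


# "中文" is written identically in simplified and traditional Chinese,
# so it exists in both legacy character sets.
text = "中文"

gb = text.encode("gb2312")   # EUC-CN form of GB 2312
big5 = text.encode("big5")
utf8 = text.encode("utf-8")

# The same text has a different byte sequence in each encoding.
for name, data in (("gb2312", gb), ("big5", big5), ("utf-8", utf8)):
    print(f"{name:7} {data.hex()}")

# Converting GB 2312 bytes to Big5 via Unicode yields the native Big5 bytes.
converted = transcode(gb, "gb2312", "big5")
print(converted == big5)
```

The byte-level conversion only works for characters present in both character sets; a character missing from the target set raises an error rather than being silently transcribed.
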
== Guobiao ==
{{Main|GB 2312|GBK (character encoding)|GB 18030|Code page 1386}}
The Guobiao (GB) line of character encodings starts with the [[Simplified Chinese]] charset [[GB 2312]], published in 1980. Two encoding schemes existed for GB 2312: the commonly used one-or-two-byte 8-bit [[EUC-CN]] encoding, and a 7-bit encoding called [[HZ (character encoding)|HZ]]<ref>{{IETF RFC|1843}}</ref> for Usenet posts.<ref name="cjkv-info-proc">{{cite book|last1=Lunde|first1=Ken|title=CJKV Information Processing|date=December 2008|publisher=O'Reilly Media, Inc|isbn=978-0-596-51447-1|url=https://books.google.com/books?id=SA92uQqTB-AC|accessdate=11 September 2016}}</ref>{{rp|94}} A traditional variant called [[GB/T 12345]] was published in 1990. In 1993, the EUC-CN form was extended into [[GBK (character encoding)|GBK]] to include ''all'' Unicode 1.1 CJK ideographs, abandoning the ISO-2022 model. As a result, GBK includes [[traditional Chinese characters]] in addition to the simplified ones in GB 2312.<ref>{{Cite web|url=http://developers.sun.com/dev/gadc/technicalpublications/articles/gb18030.html |title=GB18030-2000 – The New Chinese National Standard – GB 18030 |date=2012-08-25 |access-date=2016-10-13 |url-status=bot: unknown |archiveurl=https://web.archive.org/web/20120825155118/http://developers.sun.com/dev/gadc/technicalpublications/articles/gb18030.html |archivedate=2012-08-25 }}</ref> GBK gained popularity through the widespread [[Code page 1386|Code page 936]] implementation found in Microsoft Windows 95.

In 2000, [[GB 18030]] was published as GBK's successor. This new encoding includes a four-byte UTF which encodes all Unicode code points not previously encoded.<ref>[http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml Authoritative mapping table between GB18030-2000 and Unicode]. ICU – International Components for Unicode. 2001-02-21. Accessed 2016-10-13.</ref> In 2005, a revision of [[GB 18030]] was published to contain reference glyphs for scripts used by [[ethnic minorities in China]], as well as glyphs from [[CJK Unified Ideographs]] Extension B, following the update of [[Unicode]].

[[PostScript fonts#Adobe-GB1|Adobe-GB1]] is the corresponding PostScript charset for GB encodings.

== Big5 ==
{{main|Big5|Big5-HKSCS}}
The Big5 family of character encodings starts with the initial definition by the consortium of five companies in Taiwan that developed it.<ref>{{Cite web|url=http://chinesemac.org/pages/character_sets.html|title=[chinese mac] Character Sets|website=chinesemac.org|access-date=2016-10-13}}</ref> It is a [[double-byte character set|double-byte character set (DBCS)]] somewhat similar to [[Shift JIS]], often combined with a single-byte character set (SBCS) such as [[ASCII]]. Quite a few vendor extensions as well as official extensions exist, of which ETEN, [[HKSCS]] (Hong Kong) and Big5-2003 (as a part of [[CNS 11643]] by Taiwan) are the best known.<ref>{{Cite web|url=http://moztw.org/docs/big5/|title=Big5 Variants in Mozilla: Mozilla 系列與 Big5 中文字碼|website=moztw.org|access-date=2016-10-13}}</ref>

[[PostScript fonts#Adobe-CNS1|Adobe-CNS1]] is the PostScript charset corresponding to the Big5 family of encodings.

== Conversion ==
Prior to [[GBK (character encoding)|GBK]], which includes both traditional and simplified characters, conversion between Traditional Chinese and Simplified Chinese charsets was complicated by the need to transcribe text between the two variants of Chinese, as each charset covers many of the other's characters only in its own variant.

The conversion between traditional and simplified Chinese is usually problematic, because the simplification of some traditional forms merged two or more different characters into one simplified form. The traditional-to-simplified (many-to-one) conversion is technically simple.
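
The many-to-one direction can be sketched as a per-character table lookup. The Python example below is a minimal sketch, not a real converter: the four-entry table <code>TRAD_TO_SIMP</code> and both function names are hypothetical, and a practical table would contain thousands of entries. It uses two well-known merges: 發 ("to emit") and 髮 ("hair") both simplify to 发. Inverting the table shows why a simplified character alone does not identify its traditional form.

```python
# Hypothetical, deliberately tiny traditional -> simplified table.
TRAD_TO_SIMP = {
    "發": "发",  # "to emit, to develop"
    "髮": "发",  # "hair" (merged into the same simplified form)
    "頭": "头",  # "head"
    "後": "后",  # "behind, after"
}


def to_simplified(text: str) -> str:
    """Many-to-one conversion: one lookup per character, no context needed."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def reverse_candidates(table: dict) -> dict:
    """Invert the table, collecting every traditional form per simplified form."""
    candidates = {}
    for trad, simp in table.items():
        candidates.setdefault(simp, []).append(trad)
    return candidates


print(to_simplified("頭髮"))  # "hair"
print(to_simplified("發"))

# More than one candidate means the reverse mapping is ambiguous.
for simp, trads in reverse_candidates(TRAD_TO_SIMP).items():
    print(simp, trads)
```
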
The opposite conversion is harder, because information is lost once text has been converted to [[GB 2312]]: mapping simplified glyphs back to traditional glyphs is one-to-many, so some characters will inevitably be the wrong choices in some usages. Simplified-to-traditional conversion therefore often requires usage context or common phrase lists to resolve conflicts. This issue is less of a problem with newer standards such as GBK, GB 18030 and Unicode, which have separate code points for both simplified and traditional characters.{{citation needed|reason=Doesn't it still need conversion?|date=April 2018}}

Another issue is that many of the encoding systems are missing characters. While the missing characters are often literary and not commonly used in ordinary text, this becomes a problem because people's names often contain them. Examples include the Taiwanese politician [[Wang Chien-shien]], whose name contains the character {{transliteration|zh|pinyin|xuān}} ({{lang|zh|煊}}), which is not in some character systems, and former Chinese premier [[Zhu Rongji]], whose {{transliteration|zh|pinyin|róng}} ({{lang|zh|镕}}) character is not in GB 2312.
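
The missing-character problem can be observed directly with the codecs shipped in Python's standard library. The sketch below assumes those codecs follow the respective standards; the helper name <code>encodable</code> is hypothetical. It checks whether the character 镕 from Zhu Rongji's name can be represented in several encodings.

```python
def encodable(ch: str, codec: str) -> bool:
    """Return True if the character can be represented in the given encoding."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# 镕 (U+9555) is absent from GB 2312 but present in its successors.
for codec in ("gb2312", "gbk", "gb18030", "utf-8"):
    print(f"{codec:8} {encodable('镕', codec)}")
```

Software limited to GB 2312 therefore has to substitute or drop such a character, while GBK and later encodings store it as an ordinary code point.
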
The newest GB standard, GB 18030, has the complete character repertoire of Unicode 4.0, including the [[Unihan]] extensions in the [[Supplementary Ideographic Plane]].<ref name="cjkv-info-proc"/>{{rp|105}}

==See also==
* [[Chinese input methods for computers]]
* [[Han unification]]
* [[Four corner method]]
* [[Chinese character information technology]]

== References ==
{{reflist}}

==Further reading==
* {{cite book|last1=Lunde|first1=Ken|title=CJKV Information Processing|edition=2nd|publisher=O'Reilly|date=2009|isbn=9780596514471|chapter=Chinese Character Set Standards—China|chapter-url=https://books.google.com/books?id=SA92uQqTB-AC&pg=PA94}}

== External links ==
* [http://www.mandarintools.com/zhcode.html Chinese Encoding Converter]
* [http://demo.icu-project.org/icu-bin/convexp?s=ALL ICU's Converter Explorer]
* [https://web.archive.org/web/20160303230643/http://cs.nyu.edu/~yusuke/tools/unicode_to_gb2312_or_gbk_table.html Unicode to GB2312 or GBK table]
* [http://www.khngai.com/chinese/charmap/index.php Chinese Character Codes]
* [https://web.archive.org/web/20120825155118/http://developers.sun.com/dev/gadc/technicalpublications/articles/gb18030.html Evolution of GBK and GB2312 into GB18030]
* [http://www.herongyang.com/Unicode/index.html Unicode Tutorials – Herong's Tutorial Examples]

{{CJK_computing}}

[[Category:Korean language]]
[[Category:Chinese character encodings| ]]
[[Category:Encodings of Asian languages]]