Editing
Extended Unix Code
(section)
==EUC-CN==
{{Infobox character encoding
| name = EUC-CN
| image = EUCCN_encoding.svg
| mime = GB2312
| alias = csGB2312, CN-GB{{Ref RFC|1922|section=2.1: CN-GB)}}
| standard = GB 2312 (1980)
| lang = [[Simplified Chinese]], [[English language|English]], [[Russian language|Russian]]
| extends = [[ASCII]]
| extensions = 748, [[GBK (character encoding)|GBK]], {{nowrap|[[GB 18030]]}}, x-mac-chinesesimp
| encodes = [[GB 2312]]
| status =
| prev =
| next = [[GBK (character encoding)|GBK]], {{nowrap|[[GB 18030]]}}
| classification = [[Extended ASCII]], [[variable-width encoding|variable-length encoding]], [[CJK characters|CJK encoding]], EUC
}}
'''EUC-CN'''<ref name="macsimchinese" /> is the usual encoded form of the {{nowrap|[[GB 2312]]}} standard for [[simplified Chinese characters]]. Unlike the case of Japanese [[JIS X 0208]] and [[ISO-2022-JP]], {{nowrap|GB 2312}} is not normally used in a 7-bit {{nowrap|ISO 2022}} code version,{{efn|7-bit ISO 2022 code versions supporting {{nowrap|GB 2312}} include [[ISO-2022-CN]] (with shift codes) and [[ISO/IEC 2022#ISO-2022-JP-2|ISO-2022-JP-2]] (without shift codes), both of which also support other non-ASCII sets.}} although a variant form called [[HZ (character encoding)|HZ]] (which delimits {{nowrap|GB 2312}} text with ASCII sequences) was sometimes used on [[USENET]].

An ASCII character is represented in its usual encoding. A character from {{nowrap|GB 2312}} is represented by two bytes, both from the range 0xA1–0xFE.

===748 code===
An encoding related to EUC-CN is the "748" code used in the WITS typesetting system developed by Beijing's Founder Technology (now obsoleted by its newer FITS typesetting system). The 748 code contains all of {{nowrap|[[GB 2312]]}}, but is not {{nowrap|ISO 2022}}–compliant and therefore not a true EUC code.
(It uses an 8-bit lead byte but distinguishes between a second byte with its most significant bit set and one with its most significant bit cleared, and is, therefore, more similar in structure to [[Big5]] and other non–ISO 2022–compliant [[double-byte character set|DBCS]] encoding systems.) The non-GB2312 portion of the 748 code contains traditional and Hong Kong characters and other glyphs used in newspaper typesetting.

===IBM code pages 1380, 1381, 1382 and 1383===
[[IBM]] code page 1381 ([[CCSID]] 1381) comprises the single-byte [[code page 1115]] (CPGID 1115 as CCSID 1115) and the double-byte code page 1380 (CPGID 1380 as CCSID 1380),<ref>{{cite web |archive-url=https://web.archive.org/web/20160326215337/http://www-01.ibm.com/software/globalization/ccsid/ccsid1381.html |archive-date=2016-03-26 |url=https://www-01.ibm.com/software/globalization/ccsid/ccsid1381.html |url-status=dead |publisher=[[IBM]] |title=S-Ch PC Data mixed (IBM GB) including 1880 UDC, 31 IBM selected characters and 5 SAA SB characters |work=IBM Globalization: Coded character set identifiers}}</ref> which encodes GB 2312 the same way as EUC-CN, but deviates from the EUC structure by extending the lead byte range back to 0x8C, adding 31 IBM-selected characters in 0x8CE0 through 0x8CFE and adding 1880 [[Private Use Areas#Private-use characters in other character sets|user-defined characters]] with lead bytes 0x8D through 0xA0.<ref>{{cite web |url=https://public.dhe.ibm.com/as400/products/clientaccess/win32/files/globalization/S_Chinese_base1993.pdf |title=IBM Simplified Chinese Graphic Character Set |id=C-H 3-3220-130 1993-11 |date=1993 |publisher=[[IBM]]}}</ref>

IBM code page 1383 (CCSID 1383) comprises the single-byte [[ASCII|code page 367]] and the double-byte code page 1382 (CPGID 1382 as CCSID 1382),<ref>{{cite web |archive-url=https://web.archive.org/web/20160328020818/http://www-01.ibm.com/software/globalization/ccsid/ccsid1383.html |archive-date=2016-03-28 |url=https://www-01.ibm.com/software/globalization/ccsid/ccsid1383.html |url-status=dead |publisher=[[IBM]] |title=CCSID 1383: S-Ch EUC G0 set, ASCII G1 set, GB 2312-80 set (1382) |work=IBM Globalization: Coded character set identifiers}}</ref> which differs by conforming to the EUC structure, adding the 31 IBM-selected characters in 0xFEE0 through 0xFEFE instead, and including only 1360 user-defined characters, interspersed in the positions not used by GB 2312.<ref>{{cite web |url=https://public.dhe.ibm.com/as400/products/clientaccess/win32/files/globalization/S_Chinese_EUC.pdf |title=IBM Simplified Chinese Graphic Character Set for Extended UNIX Code (EUC) |id=C-H 3-3220-132 1994-06 |date=1994 |publisher=[[IBM]]}}</ref>

The alternative CCSID 5479<ref>{{cite web |archive-url=https://web.archive.org/web/20160327022059/http://www-01.ibm.com/software/globalization/ccsid/ccsid5479.html |url=https://www-01.ibm.com/software/globalization/ccsid/ccsid5479.html |archive-date=2016-03-27 |url-status=dead |publisher=[[IBM]] |title=CCSID 5479: S-Ch EUC G0 set, ASCII G1 set, GB 2312-80 set (5478) |work=IBM Globalization: Coded character set identifiers}}</ref> is used for the pure EUC-CN code page: it uses CCSID 9574 as its double-byte set, which uses CPGID 1382 but excludes the IBM-selected and user-defined characters.<ref>{{cite web |archive-url=https://web.archive.org/web/20160327042331/http://www-01.ibm.com/software/globalization/ccsid/ccsid9574.html |url=https://www-01.ibm.com/software/globalization/ccsid/ccsid9574.html |archive-date=2016-03-27 |url-status=dead |publisher=[[IBM]] |title=CCSID 9574: S-Ch DBCS PC GB 2312-80 set, excluding 31 IBM selected and 1360 UDC. Also used in T-Ch 2022-CN TCP. |work=IBM Globalization: Coded character set identifiers}}</ref>

===GBK and GB 18030===
{{Main|GBK (character encoding)|GB 18030}}
[[GBK (character encoding)|GBK]] is an extension to {{nowrap|GB 2312}}.
It defines an extended form of the EUC-CN encoding capable of representing a larger array of [[CJK characters]] sourced largely from {{nowrap|[[Unicode]] 1.1}}, including [[traditional Chinese characters]] and characters used only in [[Japanese language|Japanese]]. It is not, however, a true EUC code: to obtain the larger encoding space it requires, it allows ASCII bytes to appear as trail bytes (and [[C0 and C1 control codes#C1|C1 bytes]], not limited to the single shifts, to appear as lead or trail bytes). Variants of GBK are implemented by [[Code page 936 (Microsoft Windows)|Windows code page 936]] (the [[Microsoft Windows]] [[Windows code page|code page]] for simplified Chinese), and by IBM's code page 1386.

The Unicode-based {{nowrap|[[GB 18030]]}} character encoding defines an extension of GBK capable of encoding the entirety of [[Unicode]]. Because this requires an even larger encoding space, {{nowrap|GB 18030}} is a [[variable-width encoding|variable-length encoding]] that may use up to four bytes per character. Being an extension of GBK, it is a superset of EUC-CN but is not itself a true EUC code. Being a Unicode encoding, its repertoire is identical to that of other [[Unicode transformation format]]s such as [[UTF-8]].
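
The structural relationship between EUC-CN, GBK and {{nowrap|GB 18030}} described in this section can be illustrated with a short script. The following is a minimal sketch, assuming Python 3 and treating its built-in <code>gb2312</code>, <code>gbk</code> and <code>gb18030</code> codecs as stand-ins for the three encodings (they are Python's implementations, not the standards themselves). The helper <code>is_euc_cn_shaped</code> is a hypothetical name introduced here; it checks only the byte structure (ASCII bytes, or pairs of bytes in 0xA1–0xFE), not whether a code point is actually assigned.

```python
def is_euc_cn_shaped(data: bytes) -> bool:
    """Return True if data consists only of ASCII bytes and two-byte
    sequences whose bytes both lie in the range 0xA1-0xFE."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII character: one byte, usual encoding.
            i += 1
        elif 0xA1 <= b <= 0xFE:
            # Lead byte: the trail byte must exist and be in 0xA1-0xFE.
            if i + 1 >= len(data) or not 0xA1 <= data[i + 1] <= 0xFE:
                return False
            i += 2
        else:
            # 0x80-0xA0 and 0xFF never start a character in this structure.
            return False
    return True


# A GB 2312 character encodes identically in all three codecs.
zhong = "\u4e2d"  # the character "middle"
euc_bytes = zhong.encode("gb2312")
same_everywhere = euc_bytes == zhong.encode("gbk") == zhong.encode("gb18030")

# 0x81 0x40 is a GBK-only sequence: C1 lead byte, ASCII trail byte.
gbk_only_bytes = b"\x81\x40"
gbk_only_char = gbk_only_bytes.decode("gbk")

# A character outside the Basic Multilingual Plane needs four bytes.
four_byte = "\U0001F600".encode("gb18030")

if __name__ == "__main__":
    print(euc_bytes.hex(), is_euc_cn_shaped(euc_bytes), same_everywhere)
    print(gbk_only_bytes.hex(), is_euc_cn_shaped(gbk_only_bytes))
    print(four_byte.hex(), len(four_byte))
```

The sequence 0x81 0x40 fails the structural check because neither byte lies in 0xA1–0xFE, which is the sense in which GBK is not a true EUC code.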

==={{anchor|MacChineseSimp|x-mac-chinesesimp}}Mac OS Chinese Simplified===
Other EUC-CN variants deviating from the EUC mechanism include the [[classic Mac OS]] Chinese Simplified script (known as Code page 10008 or <code>x-mac-chinesesimp</code>).<ref name="msdnlabels">{{cite web|url=https://msdn.microsoft.com/en-us/library/system.text.encoding.windowscodepage(v=vs.110).aspx |title=Encoding.WindowsCodePage Property – .NET Framework (current version) |work=MSDN |publisher=Microsoft}}</ref> It uses the single bytes 0x80, 0x81, 0x82, 0xA0, 0xFD, 0xFE, and 0xFF for the [[ü|U with umlaut]] (ü), two special font metric characters, the [[non-breaking space]], the [[copyright sign]] (©), the [[trademark sign]] (™) and the ellipsis (…) respectively.<ref name="macsimchinese">{{cite web|url=https://unicode.org/Public/MAPPINGS/VENDORS/APPLE/CHINSIMP.TXT|title=Map (external version) from Mac OS Chinese Simplified encoding to Unicode 3.0 and later.|publisher=[[Apple, Inc]]}}<!-- Note: The comment blurb at the start of the file gives 0xFC and 0xFD as the © and ™ one-byte codes, contradicting its (the blurb's) statement that 0xA1–0xFC are the double-byte lead bytes. The actual (more authoritative) mapping data in the file lists 0xFD and 0xFE as the © and ™ one-byte codes. --></ref> Treating these bytes as single-byte characters rather than as the first byte of a two-byte character differs from both EUC (where, of those, 0xFD and 0xFE are defined as lead bytes) and GBK (where, of those, 0x81, 0x82, 0xFD and 0xFE are defined as lead bytes). This use of 0xA0, 0xFD, 0xFE and 0xFF matches [[MacJapanese|Apple's Shift_JIS variant]].
Besides these changes to the lead byte range, the other distinctive feature of the double-byte portion of Mac OS Chinese Simplified is the inclusion of two extensions to the basic GB 2312-80 set in rows 6 and 8.<ref name="macsimchinese" /> These are considered "standard extensions to GB 2312", and neither is proprietary to Apple: the row 8 extension was taken from [[GB 6345.1]],<ref name="macsimchinese" /> and both extensions are included in [[GB/T 12345]] (the traditional Chinese variant of GB 2312)<ref>{{cite book |title=CJKV Information Processing |chapter=Appendix F: GB/T 12345 |last=Lunde |first=Ken |author-link=Ken Lunde |isbn=9781565922242 |year=1998 |chapter-url=https://resources.oreilly.com/examples/9781565922242/blob/master/AppF/gbt12345.pdf |publisher=[[O'Reilly Media]]}}</ref> and in [[GB 18030]] (the successor to GB 2312).<ref>{{Cite book|url=https://archive.org/details/GB18030-2005|title=GB 18030-2005: Information Technology—Chinese coded character set|last=Standardization Administration of China (SAC)|date=2005-11-18}}</ref>
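
The differing treatment of bytes above 0x7F in EUC-CN, GBK and Mac OS Chinese Simplified can be summarised in code. The following Python sketch is illustrative only: the function <code>classify</code>, its scheme labels and the three role constants are hypothetical names introduced here. It assumes a lead byte range of 0xA1–0xFE for EUC-CN, 0x81–0xFE for GBK, and 0xA1–0xFC for the Mac encoding (the last taken from the Apple mapping file cited above), and it labels every other high byte "unused", which is a simplification.

```python
SINGLE = "single-byte"
LEAD = "lead byte"
UNUSED = "unused"

# Single-byte codes that Mac OS Chinese Simplified adds above 0x7F.
MAC_SINGLE_BYTES = frozenset({0x80, 0x81, 0x82, 0xA0, 0xFD, 0xFE, 0xFF})


def classify(byte: int, scheme: str) -> str:
    """Classify one byte value (0-255) as it would be read at the start
    of a character in the given scheme: 'euc-cn', 'gbk' or 'mac'."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0-255")
    if byte < 0x80:
        return SINGLE  # ASCII in all three schemes
    if scheme == "euc-cn":
        return LEAD if 0xA1 <= byte <= 0xFE else UNUSED
    if scheme == "gbk":
        return LEAD if 0x81 <= byte <= 0xFE else UNUSED
    if scheme == "mac":
        if byte in MAC_SINGLE_BYTES:
            return SINGLE
        return LEAD if 0xA1 <= byte <= 0xFC else UNUSED
    raise ValueError(f"unknown scheme: {scheme!r}")


if __name__ == "__main__":
    # Print how each scheme reads the seven Mac single-byte codes.
    for b in sorted(MAC_SINGLE_BYTES):
        roles = ", ".join(
            f"{s}: {classify(b, s)}" for s in ("euc-cn", "gbk", "mac")
        )
        print(f"0x{b:02X} -> {roles}")
```

For example, <code>classify(0xFD, "mac")</code> yields a single-byte role while <code>classify(0xFD, "euc-cn")</code> and <code>classify(0xFD, "gbk")</code> both yield a lead byte role, which is the conflict described in this section.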