{{Short description|System of East Asian character encodings}}
{{Technical|date=March 2023}}
'''Extended Unix Code''' ('''EUC''') is a multibyte [[character encoding]] system used primarily for [[Japanese language|Japanese]], [[Korean language|Korean]], and [[simplified Chinese characters|simplified Chinese (characters)]].

The most commonly used EUC codes are [[variable-width encoding|variable-length encodings]] with a character belonging to an {{nowrap|[[ISO/IEC 646]]}} compliant coded character set (such as [[ASCII]]) taking one byte, and a character belonging to a 94×94 coded character set (such as {{nowrap|[[GB 2312]]}}) represented in two bytes. The [[EUC-CN]] form of {{nowrap|GB 2312}} and [[EUC-KR]] are examples of such two-byte EUC codes. [[EUC-JP]] includes characters represented by up to three bytes, including an initial {{ctrl|SS2|shift code}}, whereas a single character in [[EUC-TW]] can take up to four bytes.

Modern applications are more likely to use [[UTF-8]], which supports all of the characters of the EUC codes, and more, and is generally more portable, with fewer vendor deviations and errors. EUC is, however, still popular, especially [[EUC-KR]] in South Korea.<!-- North Korea has 100% UTF-8 use on the world-facing web; how much [[KPS 9566]] is used on the country-internal network is a matter of speculation. -->

==Encoding structure==
[[File:Ecma43 versus EUC.svg|thumb|right|Relationship between packed EUC and other 8-bit {{nowrap|ISO 2022}} profiles]]
The structure of EUC is based on the {{nowrap|[[ISO/IEC 2022]]}} standard, which specifies a system of graphical character sets that can be represented with a sequence of the 94 7-bit bytes [[hexadecimal|0x]]21–7E, or alternatively 0xA1–FE if an eighth bit is available. This allows for sets of 94 graphical characters, or 8836 (94<sup>2</sup>) characters, or 830584 (94<sup>3</sup>) characters.
Although initially 0x20 and 0x7F were always the [[space character|space]] and {{ctrl|DEL|delete character}} and 0xA0 and 0xFF were unused, later editions of {{nowrap|ISO/IEC 2022}} allowed the use of the bytes 0xA0 and 0xFF (or 0x20 and 0x7F) within sets under certain circumstances, allowing the inclusion of 96-character sets. The ranges 0x00–1F and 0x80–9F are used for [[C0 and C1 control codes]].

EUC is a family of 8-bit profiles of {{nowrap|ISO/IEC 2022}}, as opposed to 7-bit profiles such as [[ISO-2022-JP]]. As such, only {{nowrap|ISO 2022}} compliant character sets can have EUC forms. Up to four coded character sets (referred to as G0, G1, G2, and G3 or as code sets 0, 1, 2, and 3) can be represented with the EUC scheme. The G0 set is set to an {{nowrap|[[ISO/IEC 646]]}} compliant coded character set such as [[ASCII]], {{nowrap|ISO 646:KR}} ({{nowrap|KS X 1003}}) or {{nowrap|[[JISCII|ISO 646:JP]]}} (the lower half of {{nowrap|JIS X 0201}}) and invoked over GL (i.e. 0x21–0x7E, with the most significant bit cleared).<ref name="cdra" /> If ASCII is used, this makes the code an [[extended ASCII]] encoding; the most common deviation from ASCII is that 0x5C ([[backslash]] in ASCII) is often used to represent a [[yen sign]] in EUC-JP (see below) and a [[won sign]] in EUC-KR. The other code sets are invoked over GR (i.e. with the most significant bit set). Hence, to get the EUC form of a character, the most significant bit of each coding byte is set (equivalent to adding 128 to each 7-bit coding byte, or adding 160 to each number in the [[kuten]] code); this allows the software to easily distinguish whether a particular byte in a [[character string]] belongs to the {{nowrap|ISO 646}} code or the extended code. Characters in code sets 2 and 3 are prefixed with the control codes {{ctrl|SS2}} (0x8E) and {{ctrl|SS3}} (0x8F) respectively, and invoked over GR.
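The arithmetic described above (add 160 to each kuten number, which is the same as setting the high bit of the 7-bit byte) can be illustrated with a minimal Python sketch. The function name <code>kuten_to_euc</code> is a hypothetical helper introduced only for this illustration; the codec names <code>euc_jp</code>, <code>gb2312</code> and <code>euc_kr</code> are those of Python's standard library.

```python
def kuten_to_euc(row: int, cell: int) -> bytes:
    """Return the two-byte code set 1 (GR) form of a 94x94 kuten position.

    Hypothetical helper for illustration; it performs no check that the
    position is actually assigned in any particular character set.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    # The 7-bit ISO 2022 bytes are kuten + 0x20, giving the range 0x21-0x7E.
    seven_bit = (row + 0x20, cell + 0x20)
    # Setting the most significant bit adds 128, so the total offset is 160.
    return bytes(b | 0x80 for b in seven_bit)


euc = kuten_to_euc(16, 1)
print(euc.hex())  # b0a1
# The same byte pair names a different character in each 94x94 set.
print(euc.decode("euc_jp"), euc.decode("gb2312"), euc.decode("euc_kr"))
```

Row 16, cell 1 was chosen because it is the first ideograph or syllable position in JIS X 0208, GB 2312 and KS X 1001 alike, which makes the shared byte layout easy to see.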
Besides the initial shift code, any byte outside of the range 0xA0–0xFF appearing in a character from code sets 1 through 3 is not a valid EUC code.<ref name="cdra" />

The EUC code itself does not make use of the announcement and designation sequences from {{nowrap|ISO 2022}}.<ref name="cdra" /> However, the code specification is equivalent to the following sequence of four {{nowrap|ISO 2022}} announcement sequences, with meanings breaking down as follows.<ref name="cdra">{{cite web |url=https://www.ibm.com/downloads/cas/G01BQVRV#page=157 |pages=157–162 |title=Character Data Representation Architecture (CDRA) |author=IBM |website=[[IBM]] |author-link=IBM}}</ref>

{|class=wikitable
|-
!Individual sequence!!Hexadecimal!!Feature of EUC denoted
|-
|<code>ESC SP C</code>||<code>1B 20 43</code>||ISO-8 (8-bit, G0 in GL, G1 in GR)
|-
|<code>ESC SP Z</code>||<code>1B 20 5A</code>||G2 accessed using SS2
|-
|<code>ESC SP [</code>||<code>1B 20 5B</code>||G3 accessed using SS3
|-
|<code>ESC SP \</code>||<code>1B 20 5C</code>||Single-shifts invoke over GR
|}

===Fixed-length format===
[[File:CsEucFixWidJapanese.svg|right|thumb|Layout of the fixed-length format for Japanese]]
The ISO-2022-based [[variable-width encoding|variable-length encoding]] described above is sometimes referred to as the ''EUC packed format'', which is the encoding format usually labeled as EUC. However, internal processing of EUC data may make use of a fixed-length transformation format called the '''EUC complete two-byte format'''. This represents:<ref name="lunde" />
* Code set 0 as two bytes in the range 0x21–0x7E (except that the first may be 0x00).
* Code set 1 as two bytes in the range 0xA0–0xFF (except that the first may be 0x80).
* Code set 2 as a byte in the range 0x21–0x7E (or 0x00) followed by a byte in the range 0xA0–0xFF.
* Code set 3 as a byte in the range 0xA0–0xFF (or 0x80) followed by a byte in the range 0x21–0x7E.
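The layout listed above can be sketched as a conversion from packed-format EUC-JP, in which code sets 0 and 2 contribute one coding byte and code sets 1 and 3 contribute two. This is a minimal sketch under stated assumptions, not a reference implementation: the function name <code>packed_to_fixed_jp</code> is hypothetical, control characters are treated like any other code set 0 byte, and byte ranges are not validated.

```python
SS2, SS3 = 0x8E, 0x8F  # single-shift bytes that prefix code sets 2 and 3


def packed_to_fixed_jp(data: bytes) -> bytes:
    """Convert packed-format EUC-JP to two bytes per character.

    Hypothetical helper: only truncation is checked, not whether each
    byte lies in the range its code set requires.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        lead = data[i]
        # Packed widths: 1 byte (code set 0), 3 bytes (SS3 + two bytes),
        # otherwise 2 bytes (code set 1, or SS2 + one byte).
        width = 1 if lead < 0x80 else 3 if lead == SS3 else 2
        chunk = data[i:i + width]
        if len(chunk) < width:
            raise ValueError(f"truncated character at offset {i}")
        if width == 1:
            out += bytes((0x00, lead))        # code set 0: pad with 0x00
        elif lead == SS2:
            out += bytes((0x00, chunk[1]))    # code set 2: drop SS2, pad with 0x00
        elif lead == SS3:
            # Code set 3: drop SS3, keep the first high bit, clear the second.
            out += bytes((chunk[1], chunk[2] & 0x7F))
        else:
            out += chunk                      # code set 1: both high bits stay set
        i += width
    return bytes(out)


packed = b"A\xb0\xa1\x8e\xb1\x8f\xb0\xa1"  # one character from each code set
print(packed_to_fixed_jp(packed).hex(" ", 2))  # 0041 b0a1 00b1 b021
```

Because the two high bits of each pair identify the code set, the shift bytes become redundant and every character occupies the same width, which is what makes the format convenient for internal processing.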
Initial bytes of 0x00 and 0x80 are used in cases where the code set uses only one byte. There is also a four-byte fixed-length format.<ref name="lunde" /> These fixed-length encoding formats are suited to internal processing and are not usually encountered in interchange.

EUC-JP is registered with the IANA in both formats, the packed format as "EUC-JP" or "csEUCPkdFmtJapanese" and the fixed width format as "csEUCFixWidJapanese".<ref>{{cite web | url=https://www.iana.org/assignments/character-sets/character-sets.xhtml | publisher=IANA | title=Character Sets}}</ref> Only the packed format is included in the [[WHATWG]] Encoding Standard used by [[HTML5]].<ref>{{cite web | url=https://encoding.spec.whatwg.org/#names-and-labels | title=4.2. Names and labels | publisher=WHATWG | work=Encoding Standard}}</ref>

==EUC-CN==
{{Infobox character encoding
| name = EUC-CN
| image = EUCCN_encoding.svg
| mime = GB2312
| alias = csGB2312, CN-GB{{Ref RFC|1922|section=2.1: CN-GB)}}
| standard = GB 2312 (1980)
| lang = [[Simplified Chinese]], [[English language|English]], [[Russian language|Russian]]
| extends = [[ASCII]]
| extensions = 748, [[GBK (character encoding)|GBK]], {{nowrap|[[GB 18030]]}}, x-mac-chinesesimp
| encodes = [[GB 2312]]
| status =
| prev =
| next = [[GBK (character encoding)|GBK]], {{nowrap|[[GB 18030]]}}
| classification = [[Extended ASCII]], [[variable-width encoding|variable-length encoding]], [[CJK characters|CJK encoding]], EUC
}}
'''EUC-CN'''<ref name="macsimchinese" /> is the usual encoded form of the {{nowrap|[[GB 2312]]}} standard for [[simplified Chinese characters]].
Unlike the case of Japanese [[JIS X 0208]] and [[ISO-2022-JP]], {{nowrap|GB 2312}} is not normally used in a 7-bit {{nowrap|ISO 2022}} code version,{{efn|7-bit ISO 2022 code versions supporting {{nowrap|GB 2312}} include [[ISO-2022-CN]] (with shift codes) and [[ISO/IEC 2022#ISO-2022-JP-2|ISO-2022-JP-2]] (without shift codes), both of which also support other non-ASCII sets.}} although a variant form called [[HZ (character encoding)|HZ]] (which delimits {{nowrap|GB 2312}} text with ASCII sequences) was sometimes used on [[USENET]].

An ASCII character is represented in its usual encoding. A character from {{nowrap|GB 2312}} is represented by two bytes, both from the range 0xA1–0xFE.

===748 code===
An encoding related to EUC-CN is the "748" code used in the WITS typesetting system developed by Beijing's Founder Technology (now obsoleted by its newer FITS typesetting system). The 748 code contains all of {{nowrap|[[GB 2312]]}}, but is not {{nowrap|ISO 2022}}–compliant and therefore not a true EUC code. (It uses an 8-bit lead byte but distinguishes between a second byte with its most significant bit set and one with its most significant bit cleared, and is, therefore, more similar in structure to [[Big5]] and other non–ISO 2022–compliant [[double-byte character set|DBCS]] encoding systems.) The non-GB2312 portion of the 748 code contains traditional and Hong Kong characters and other glyphs used in newspaper typesetting.
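The structural rule that separates EUC-CN from codes such as 748 can be sketched as a splitter that accepts one-byte ASCII and two-byte sequences whose bytes both lie in 0xA1–0xFE, and rejects anything else. The function name <code>split_euc_cn</code> is hypothetical, and the sketch checks structure only; it does not check whether a position is actually assigned in GB 2312.

```python
def split_euc_cn(data: bytes) -> list:
    """Split EUC-CN bytes into per-character byte strings.

    Hypothetical helper: enforces the EUC structure (ASCII, or two bytes
    in 0xA1-0xFE) without consulting the GB 2312 repertoire.
    """
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # Code set 0: a single ASCII byte.
            chars.append(data[i:i + 1])
            i += 1
        elif 0xA1 <= lead <= 0xFE:
            # Code set 1: the trail byte must also have its high bit set.
            if i + 1 >= len(data) or not 0xA1 <= data[i + 1] <= 0xFE:
                raise ValueError(f"invalid trail byte after offset {i}")
            chars.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"byte 0x{lead:02X} is not a valid EUC-CN lead byte")
    return chars


print(split_euc_cn("GB 啊".encode("gb2312")))
# [b'G', b'B', b' ', b'\xb0\xa1']
```

A pair such as <code>b"\xb0\x41"</code>, whose second byte has its most significant bit cleared, raises <code>ValueError</code> here, whereas a code structured like 748 or Big5 would treat such a pair as a distinct character.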
===IBM code pages 1380, 1381, 1382 and 1383===
[[IBM]] code page 1381 ([[CCSID]] 1381) comprises the single-byte [[code page 1115]] (CPGID 1115 as CCSID 1115) and the double-byte code page 1380 (CPGID 1380 as CCSID 1380),<ref>{{cite web |archive-url=https://web.archive.org/web/20160326215337/http://www-01.ibm.com/software/globalization/ccsid/ccsid1381.html |archive-date=2016-03-26 |url=https://www-01.ibm.com/software/globalization/ccsid/ccsid1381.html |url-status=dead |publisher=[[IBM]] |title= S-Ch PC Data mixed (IBM GB) including 1880 UDC, 31 IBM selected characters and 5 SAA SB characters |work=IBM Globalization: Coded character set identifiers}}</ref> which encodes GB 2312 the same way as EUC-CN, but deviates from the EUC structure by extending the lead byte range back to 0x8C, adding 31 IBM-selected characters in 0x8CE0 through 0x8CFE and adding 1880 [[Private Use Areas#Private-use characters in other character sets|user-defined characters]] with lead bytes 0x8D through 0xA0.<ref>{{cite web |url=https://public.dhe.ibm.com/as400/products/clientaccess/win32/files/globalization/S_Chinese_base1993.pdf |title=IBM Simplified Chinese Graphic Character Set |id=C-H 3-3220-130 1993-11 |date=1993 |publisher=[[IBM]]}}</ref>

IBM code page 1383 (CCSID 1383) comprises the single-byte [[ASCII|code page 367]] and the double-byte code page 1382 (CPGID 1382 as CCSID 1382),<ref>{{cite web |archive-url=https://web.archive.org/web/20160328020818/http://www-01.ibm.com/software/globalization/ccsid/ccsid1383.html |archive-date=2016-03-28 |url=https://www-01.ibm.com/software/globalization/ccsid/ccsid1383.html |url-status=dead |publisher=[[IBM]] |title=CCSID 1383: S-Ch EUC G0 set, ASCII G1 set, GB 2312-80 set (1382) |work=IBM Globalization: Coded character set identifiers}}</ref> which differs by conforming to the EUC structure, adding the 31 IBM-selected characters in 0xFEE0 through 0xFEFE instead, and including only 1360 user-defined characters, interspersed in the positions not used
by GB 2312.<ref>{{cite web |url=https://public.dhe.ibm.com/as400/products/clientaccess/win32/files/globalization/S_Chinese_EUC.pdf |title=IBM Simplified Chinese Graphic Character Set for Extended UNIX Code (EUC) |id=C-H 3-3220-132 1994-06 |date=1994 |publisher=[[IBM]]}}</ref> The alternative CCSID 5479<ref>{{cite web |archive-url=https://web.archive.org/web/20160327022059/http://www-01.ibm.com/software/globalization/ccsid/ccsid5479.html |url=https://www-01.ibm.com/software/globalization/ccsid/ccsid5479.html |archive-date=2016-03-27 |url-status=dead |publisher=[[IBM]] |title=CCSID 5479: S-Ch EUC G0 set, ASCII G1 set, GB 2312-80 set (5478) |work=IBM Globalization: Coded character set identifiers}}</ref> is used for the pure EUC-CN code page: it uses CCSID 9574 as its double-byte set, which uses CPGID 1382 but excludes the IBM-selected and user-defined characters.<ref>{{cite web |archive-url=https://web.archive.org/web/20160327042331/http://www-01.ibm.com/software/globalization/ccsid/ccsid9574.html |url=https://www-01.ibm.com/software/globalization/ccsid/ccsid9574.html |archive-date=2016-03-27 |url-status=dead |publisher=[[IBM]] |title=CCSID 9574: S-Ch DBCS PC GB 2312-80 set, excluding 31 IBM selected and 1360 UDC. Also used in T-Ch 2022-CN TCP. |work=IBM Globalization: Coded character set identifiers}}</ref>

===GBK and GB 18030===
{{Main|GBK (character encoding)|GB 18030}}
[[GBK (character encoding)|GBK]] is an extension to {{nowrap|GB 2312}}. It defines an extended form of the EUC-CN encoding capable of representing a larger array of [[CJK characters]] sourced largely from {{nowrap|[[Unicode]] 1.1}}, including [[traditional Chinese characters]] and characters used only in [[Japanese language|Japanese]]. It is not, however, a true EUC code, because ASCII bytes may appear as trail bytes (and [[C0 and C1 control codes#C1|C1 bytes]], not limited to the single shifts, may appear as lead or trail bytes), due to a larger encoding space being required.
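The point about ASCII trail bytes and C1 lead bytes can be observed with Python's standard <code>gbk</code> and <code>gb2312</code> codecs. In this small sketch, <code>is_euc_code_set_1</code> is a hypothetical helper, and the two sample characters are arbitrary choices: one that GB 2312 contains and one that only GBK adds.

```python
def is_euc_code_set_1(char_bytes: bytes) -> bool:
    """True if the bytes form a two-byte sequence entirely within 0xA1-0xFE.

    Hypothetical helper for illustration only.
    """
    return len(char_bytes) == 2 and all(0xA1 <= b <= 0xFE for b in char_bytes)


for ch in "啊丂":  # U+554A is in GB 2312; U+4E02 is a GBK addition
    encoded = ch.encode("gbk")
    print(ch, encoded.hex(), is_euc_code_set_1(encoded))
# 啊 b0a1 True
# 丂 8140 False

# The trail byte of the second character is the ASCII code for "@".
print(b"@" in "丂".encode("gbk"))  # True
```

The second character uses a C1 byte (0x81) as its lead byte and an ASCII byte (0x40) as its trail byte, so software that scans for ASCII bytes without tracking lead bytes can misread GBK text in a way that cannot happen with EUC-CN.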
Variants of GBK are implemented by [[Code page 936 (Microsoft Windows)|Windows code page 936]] (the [[Microsoft Windows]] [[Windows code page|code page]] for simplified Chinese), and by IBM's code page 1386.

The Unicode-based {{nowrap|[[GB 18030]]}} character encoding defines an extension of GBK capable of encoding the entirety of [[Unicode]]. However, Unicode encoded as {{nowrap|GB 18030}} is a [[variable-width encoding|variable-length encoding]] which may use up to four bytes per character, due to an even larger encoding space being required. Being an extension of GBK, it is a superset of EUC-CN but is not itself a true EUC code. Being a Unicode encoding, its repertoire is identical to that of other [[Unicode transformation format]]s such as [[UTF-8]].

==={{anchor|MacChineseSimp|x-mac-chinesesimp}}Mac OS Chinese Simplified===
Other EUC-CN variants deviating from the EUC mechanism include the [[classic Mac OS]] Chinese Simplified script (known as Code page 10008 or <code>x-mac-chinesesimp</code>).<ref name="msdnlabels">{{cite web|url=https://msdn.microsoft.com/en-us/library/system.text.encoding.windowscodepage(v=vs.110).aspx |title=Encoding.WindowsCodePage Property – .NET Framework (current version) |work=MSDN |publisher=Microsoft}}</ref> It uses the bytes 0x80, 0x81, 0x82, 0xA0, 0xFD, 0xFE, and 0xFF for the [[ü|U with umlaut]] (ü), two special font metric characters, the [[non-breaking space]], the [[copyright sign]] (©), the [[trademark sign]] (™) and the ellipsis (...) respectively.<ref name="macsimchinese">{{cite web|url=https://unicode.org/Public/MAPPINGS/VENDORS/APPLE/CHINSIMP.TXT|title=Map (external version) from Mac OS Chinese Simplified encoding to Unicode 3.0 and later.|publisher=[[Apple, Inc]]}}<!-- Note: The comment blurb at the start of the file gives 0xFC and 0xFD as the © and ™ one-byte codes, contradicting its (the blurb's) statement that 0xA1–0xFC are the double-byte lead bytes.
The actual (more authoritative) mapping data in the file lists 0xFD and 0xFE as the © and ™ one-byte codes. --></ref> This differs in what is regarded as a single-byte character versus the first byte of a two-byte character from both EUC (where, of those, 0xFD and 0xFE are defined as lead bytes) and GBK (where, of those, 0x81, 0x82, 0xFD and 0xFE are defined as lead bytes). This use of 0xA0, 0xFD, 0xFE and 0xFF matches [[MacJapanese|Apple's Shift_JIS variant]].

Besides these changes to the lead byte range, the other distinctive feature of the double-byte portion of Mac OS Chinese Simplified is the inclusion of two extensions to the basic GB 2312-80 set in rows 6 and 8.<ref name="macsimchinese" /> These are considered "standard extensions to GB 2312", neither of which is proprietary to Apple: the row 8 extension was taken from [[GB 6345.1]],<ref name="macsimchinese" /> both extensions are included by [[GB/T 12345]] (the traditional Chinese variant of GB 2312),<ref>{{cite book |title=Appendix F: GB/T 12345 |last=Lunde |first=Ken |author-link=Ken Lunde |chapter=CJKV Information Processing |isbn=9781565922242 |year=1998 |url=https://resources.oreilly.com/examples/9781565922242/blob/master/AppF/gbt12345.pdf |publisher=[[O'Reilly Media]]}}</ref> and both extensions are included by [[GB 18030]] (the successor to GB 2312).<ref>{{Cite book|url=https://archive.org/details/GB18030-2005|title=GB 18030-2005: Information Technology – Chinese coded character set|last=Standardization Administration of China (SAC)|date=2005-11-18}}</ref>

==EUC-JP==
{{Infobox character encoding
|name=EUC-JP
|alias=Unixized JIS (UJIS), csEUCPkdFmtJapanese
|mime=EUC-JP
|image=EUC-JP.svg
|caption=
|standard=
|extends=[[ASCII]] or [[JISCII|ISO 646:JP]]
|encodes=[[JIS X 0208]], [[JIS X 0212]], [[JIS X 0201]]
|lang=[[Japanese language|Japanese]], [[English language|English]], [[Russian language|Russian]]
|next=EUC-JISx0213
|classification = [[Extended ASCII|Extended]] [[ISO 646]], [[variable-width encoding|variable-length encoding]], [[CJK characters|CJK encoding]], EUC
}}
{{Infobox character encoding
|name=EUC-JIS-2004
|alias=EUC-JISx0213
|_nomimecode=1 <!-- Was it ever registered for MIME (as opposed to ICONV) though? |mime=<code>EUC-JIS-2004</code> (2004)<br/><code>EUC-JISx0213</code> (2000) -->
|image=EUC-JISx0213.svg
|caption=
|standard=JIS X 0213
|extends=[[ASCII]]
|encodes=[[JIS X 0213]], [[JIS X 0201]] (Kana)
|lang=[[Japanese language|Japanese]], [[Ainu language|Ainu]], [[English language|English]], [[Russian language|Russian]]
|prev=EUC-JP
|classification = [[Extended ASCII]], [[variable-width encoding|variable-length encoding]], [[CJK characters|CJK encoding]], EUC
}}
'''EUC-JP''' is a [[variable-width encoding|variable-length encoding]] used to represent the elements of three [[JIS encoding|Japanese character set standards]], namely {{nowrap|[[JIS X 0208]]}}, {{nowrap|[[JIS X 0212]]}}, and {{nowrap|[[JIS X 0201]]}}. Other names for this encoding include '''Unixized JIS''' (or '''UJIS''') and '''AT&T JIS'''.<ref name="lunde">{{cite book | url=https://books.google.com/books?id=EH1MDAAAQBAJ&q=%22euc+packed+format%22&pg=PA244 | title=CJKV Information Processing: Chinese, Japanese, Korean, and Vietnamese Computing | publisher=O'Reilly | author=Lunde, Ken | year=2008 | isbn=9780596800925 | pages=242–244}}</ref> EUC-JP has been used by 0.1% of all web pages since September 2022,<ref>{{cite web | url=https://w3techs.com/technologies/history_overview/character_encoding | title=Historical trends in the usage of character encodings for websites | publisher=W3Techs}}</ref> while 2.6% of websites written in Japanese use this encoding, the second most popular for Japanese<ref>{{Cite web |title=Distribution of Character Encodings among websites that use Japanese |url=https://w3techs.com/technologies/segmentation/cl-ja-/character_encoding |access-date=2023-11-01 |website=w3techs.com}}</ref> (<!-- i.e. of those "that use Japanese as content language."
-->which is more than for [[Shift JIS]];<!-- might be a fluke, but now by either metric: , though of those sites, i.e. on national Japanese websites,<!- i.e. with ".jp as top level domain."; it used to be "less used", note https://w3techs.com/technologies/details/en-shiftjis ShiftJIS recently took a nose-dive (a statistical fluke?), -> less used than {{nowrap|[[Shift JIS]]}}, --> both are much less used than [[UTF-8]]). It is called '''Code page 954''' by IBM.<ref>{{cite web|title=CCSID 954 information document|archive-url=https://web.archive.org/web/20160327022203/http://www-01.ibm.com/software/globalization/ccsid/ccsid954.html|archive-date=2016-03-27|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid954.html}}</ref><ref>{{Citation|title=International Components for Unicode (ICU), ibm-954_P101-2007.ucm|url=https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/ibm-954_P101-2007.ucm|date=2002-12-03}}</ref> Microsoft has two code page numbers for this encoding (51932 and 20932).

This encoding scheme allows the easy mixing of 7-bit ASCII and 8-bit Japanese without the need for the escape characters employed by [[ISO-2022-JP]], which is based on the same character set standards, and without ASCII bytes appearing as trail bytes (unlike [[Shift JIS]]).

A related and partially compatible encoding, called '''EUC-JISx0213''' or '''EUC-JIS-2004''', encodes {{nowrap|[[JIS X 0201]]}} and {{nowrap|[[JIS X 0213]]}}<ref name="x0213org">{{cite web | url=https://x0213.org/codetable/index.en.html | title=JIS X 0213 Code Mapping Tables | publisher=x0213.org}}</ref> (similarly to {{nowrap|[[Shift JIS-2004|Shift_JISx0213]]}}, its Shift_JIS-based counterpart).
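The contrast drawn above between EUC-JP, Shift JIS and ISO-2022-JP can be seen by encoding one JIS X 0208 character with Python's standard codecs. This is an illustrative sketch; the katakana letter chosen is simply a well-known case in which the Shift JIS trail byte coincides with the ASCII backslash.

```python
text = "ソ"  # U+30BD KATAKANA LETTER SO, JIS X 0208 row 5 cell 29

euc = text.encode("euc_jp")
sjis = text.encode("shift_jis")
iso = text.encode("iso2022_jp")

print(euc.hex())   # a5bd
print(sjis.hex())  # 835c

# EUC-JP: every byte of the character has its high bit set.
print(all(b >= 0x80 for b in euc))  # True
# Shift JIS: the trail byte 0x5C is also the ASCII backslash.
print(b"\\" in sjis)                # True
# ISO-2022-JP: 7-bit bytes only, introduced by an escape sequence.
print(iso.startswith(b"\x1b$B"), all(b < 0x80 for b in iso))  # True True
```

Because no byte of an EUC-JP double-byte character falls in the ASCII range, byte-oriented tools that search for ASCII delimiters such as the backslash do not match the middle of a Japanese character, and no escape state has to be tracked.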
Compared to EUC-CN or EUC-KR, EUC-JP did not become as widely adopted on PC and Macintosh systems in Japan, which used {{nowrap|Shift JIS}} or its extensions ([[Code page 932 (Microsoft Windows)|Windows code page 932]] on [[Microsoft Windows]], and [[MacJapanese]] on [[classic Mac OS]]), although it became heavily used by [[Unix]] or Unix-like [[operating system]]s (except for [[HP-UX]]). Therefore, whether Japanese websites use EUC-JP or Shift_JIS often depends on what OS the author uses.

Characters are encoded as follows:
* As an EUC/[[ISO 2022]] compliant encoding, the [[C0 and C1 control codes#C0|C0 control characters]], space, and DEL are represented as in ASCII.
* A graphical character from [[ASCII]] (code set 0) is represented as its usual one-byte representation, in the range 0x21 – 0x7E. While some variants of EUC-JP encode the [[Code page 895|lower half]] of {{nowrap|JIS X 0201}} here, most encode ASCII,<ref>{{cite web | url=https://www.w3.org/TR/japanese-xml/#AEN29832832 | title=Ambiguities in conversion from Japanese EUC to Unicode (Non-Normative) | publisher=W3C | work=XML Japanese Profile}}</ref> including the W3C/WHATWG Encoding standard used by [[HTML5]],<ref>{{cite web | url=https://encoding.spec.whatwg.org/#euc-jp-decoder | title=EUC-JP decoder | publisher=WHATWG | work=Encoding Standard}} "If the byte is an ASCII byte, return a code point whose value is a byte."</ref> and so does EUC-JIS-2004.<ref name="x0213org" /> While this means that 0x5C is typically mapped to Unicode as U+005C REVERSE SOLIDUS (the ASCII [[backslash]]), U+005C may be displayed as a [[Yen sign]] by certain Japanese-locale fonts, e.g. on Microsoft Windows, for compatibility with the lower half of {{nowrap|JIS X 0201}}.<ref>{{cite web | url=https://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html#ch3_1_1 | title=3.1.1 Details of Problems | publisher=The Open Group Japan | work=Problems and Solutions for Unicode and User/Vendor Defined Characters | archive-url=https://web.archive.org/web/19990203115405/http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html#ch3_1_1 | archive-date=1999-02-03 | url-status=dead | access-date=2019-08-14 }}</ref><ref>{{cite web | title=When is a backslash not a backslash? | date=2005-09-17 | author=Kaplan, Michael S. | url=https://archives.miloush.net/michkap/archive/2005/09/17/469941.html}}</ref>
* A character from JIS X 0208 (code set 1) is represented by two bytes, both in the range 0xA1 – 0xFE. This differs from the ISO-2022-JP representation by having the high bit set. This code set may also contain vendor extensions in some EUC-JP variants. In EUC-JIS-2004, the first plane of {{nowrap|JIS X 0213}} is encoded here, which is effectively a superset of standard {{nowrap|JIS X 0208}}.<ref name="x0213org" />
* A character from the ''upper half'' of {{nowrap|JIS X 0201}} ([[half-width kana]], code set 2) is represented by two bytes, the first being 0x8E, the second being the usual {{nowrap|JIS X 0201}} representation in the range 0xA1 – 0xDF. This set may contain [[JIS X 0201#IBM's implementations|IBM vendor extensions]] in some variants.
* A character from JIS X 0212 (code set 3) is represented in EUC-JP by three bytes, the first being 0x8F, the following two being in the range 0xA1–0xFE, i.e. with the high bit set. In addition to standard {{nowrap|JIS X 0212}}, code set 3 of some EUC-JP variants may also contain extensions in rows 83 and 84 to represent characters from IBM's Shift JIS extensions which lack standard JIS X 0212 mappings, which may be coded in either of two layouts, one defined by IBM themselves and one defined by the [[Open Software Foundation|OSF]].<ref name="osfibmextensions">{{cite web | url=https://www.opengroup.or.jp:80/jvc/cde/ucs-conv-e.html#ch4_2 | title=4.2 Review Process of Rules for Code Set Conversion Between eucJP-open and UCS | publisher=The Open Group Japan | work=Problems and Solutions for Unicode and User/Vendor Defined Characters | archive-url=https://web.archive.org/web/19990203115405/http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html#ch4_2 | archive-date=1999-02-03 | url-status=dead | access-date=2019-08-14 }}</ref><ref name="lundeJ">{{citation|mode=cs1 |title=Appendix J: Japanese Character Sets |work=CJKV Information Processing |edition=2nd |last=Lunde |first=Ken |date=13 January 2009 |isbn=978-0-596-51447-1 |url=https://resources.oreilly.com/examples/9780596514471/blob/master/cjkvip2e-appJ.pdf}}</ref> In EUC-JIS-2004, the second plane of {{nowrap|JIS X 0213}} is encoded here,<ref name="x0213org" /> which does not collide with the allocated rows in standard {{nowrap|JIS X 0212}}.<ref name="hyeshik">{{cite web | url=https://github.com/python/cpython/blob/4b96c1384e008218bdfeb9e271a094b1ab8484d3/Modules/cjkcodecs/README | title=Readme for CJKCodecs | publisher=Python Software Foundation | work=cPython | last=Chang | first=Hyeshik| date=8 December 2021 }}</ref> Some implementations of EUC-JIS-2004, such as the one used by [[Python (programming language)|Python]], allow both {{nowrap|JIS X 0212}} and {{nowrap|JIS X 0213}} plane 2 characters in this set.<ref name="hyeshik" />

Vendor extensions to EUC-JP (from, for example, the [[Open Software Foundation]], [[IBM]] or [[NEC]]) were often allocated within the individual code sets,<ref
name="osfibmextensions" /><ref name="lundeJ" /> as opposed to using invalid EUC sequences (as in popular extensions of EUC-CN and EUC-KR). However, some vendor-specific encodings are partially compatible with EUC-JP, due to encoding {{nobr|JIS X 0208}} over GR, but do not follow the packed EUC structure. Often, these do not include use of the single shifts from EUC-JP, and are thus not straight extensions of EUC-JP, with the exception of Super DEC Kanji.

===DEC Kanji===
[[Digital Equipment Corporation]] defines two variants of EUC-JP only partly conforming to the EUC packed format, but also bearing some resemblance to the complete two-byte format. The overall format of the "DEC Kanji" encoding mostly corresponds to fixed-length (complete two-byte) EUC; however, code set 0 is not required to be left-padded with null bytes (similarly to the packed format).<ref name="lundeF" /> JIS X 0208 is, as usual, used for code set 1; code set 2 (half-width katakana) is absent; code set 3 is encoded like the two-byte fixed width format (i.e.
without a shift byte and with only the first high bit set), but used for two-byte user defined characters rather than being specified for JIS X 0212.<ref name="lundeF" /> In the basic "DEC Kanji" encoding, only the first 31 rows of code set 3 are used for user-defined characters: rows 32 through 94 are reserved, similarly to the unused rows in code set 1.<ref name="lunde2009appE" />

The "Super DEC Kanji" encoding accepts codes both from the "DEC Kanji" encoding and from packed-format EUC, for a total of five code-sets.<ref name="lundeF">{{citation|mode=cs1 |title=Appendix F: Vendor Encoding Methods |work=CJKV Information Processing |edition=2nd |last=Lunde |first=Ken |date=13 January 2009 |isbn=978-0-596-51447-1 |url=https://resources.oreilly.com/examples/9780596514471/blob/master/cjkvip2e-appF.pdf}}</ref> It also allows the entire user defined code set, and the unused rows at the ends of the JIS X 0208 and JIS X 0212 code sets (rows 85–94 and 78–94 respectively), to be used for user-defined characters.<ref name="lunde2009appE" />

===HP-16===
[[Hewlett-Packard]] defines an encoding referred to as "HP-16". This accompanies their "HP-15" encoding, which is a variant of [[Shift JIS]]. HP-16 encodes {{nobr|JIS X 0208}} using the same bytes as in EUC-JP, but does not use the single shift codes (thus omitting code sets 2 and 3), and adds three user-defined regions which do not follow the packed-format EUC structure:<ref name="lundeF" />
* Lead bytes 0xA1–C2, trail bytes 0x21–7E
* Lead bytes 0xC3–E3, trail bytes 0x21–3F
* Lead bytes 0xC3–E1, trail bytes 0x40–64

===IKIS===
The IKIS (Interactive Kanji Information System) encoding used by [[Data General]] resembles EUC-JP without single shifts, i.e. with only code sets 0 and 1. Half-width katakana are instead included in row 8 of JIS X 0208 (colliding with the box-drawing characters added to the standard in 1983).
JIS X 0208 rows 9 through 12 are used for user-defined characters.<ref name="lundeF" /><ref name="lunde2009appE" />

===Adaptations of EUC-JP for EBCDIC===
{{Main article|Japanese language in EBCDIC}}
KEIS (Kanji-processing Extended Information System) is an [[EBCDIC]] encoding used by [[Hitachi]],<ref name="lunde2009appE" /> with double-byte characters (a DBCS-Host encoding) included using shifting sequences, making it a [[state (computer science)|stateful]] encoding. Specifically, the sequence {{code|0x0A 0x41}} switches to single-byte mode and the sequence {{code|0x0A 0x42}} switches to double-byte mode.{{efn|These sequences match the hexadecimal forms shown by DEC{{refn|name=decunix}} and the decimal forms ({{code|10 65}} and {{code|10 66}}) listed by Lunde.{{refn|name=lundeF}} Lunde lists the hexadecimal forms for both as {{code|0xA0 0x42}}, seemingly in error.}} However, JIS X 0208 characters are encoded using the same byte sequences used to encode them in EUC-JP. This results in duplicate encodings for the {{ctrl|IDSP|ideographic space}}: 0x4040 per the DBCS-Host code structure, and 0xA1A1 as in EUC-JP. This differs from IBM's DBCS-Host encoding for Japanese, the layout of which builds on versions which predate JIS X 0208 altogether. The lead byte range is extended back to 0x59, out of which the lead bytes 0x81–A0 are designated for user-defined characters,<ref name="lundeF" /> and the remainder are used for corporate-defined characters, including both kanji and non-kanji.<ref name="lunde2009appE" />

JEF (Japanese-processing Extended Feature)<ref name="lunde2009appE" /> is an EBCDIC encoding used on [[Fujitsu]] FACOM mainframes, contrasting with FMR (a variant of Shift JIS) used on Fujitsu PCs.
Like KEIS, JEF is a stateful encoding, switching to a double-byte DBCS-Host mode using shifting sequences (where {{code|0x29}} switches to single-byte mode and {{code|0x28}} switches to double-byte mode).<ref name="decunix">{{cite web |url=https://www.itec.suny.edu/scsys/unix/doc/V4.0F/docs/html/SUPPDOCS/JAPANDOC/JAPANCH2.HTM |title=2: Codesets and Codeset Conversion |work=DIGITAL UNIX Technical Reference for Using Japanese Features |publisher=[[Digital Equipment Corporation]], [[Compaq]] }}{{dead link|date=November 2023}}</ref> Also similarly to KEIS, {{nowrap|JIS X 0208}} codes are represented the same as in EUC-JP.<ref name="lundeF" /> The lead byte range is extended back to 0x41, with 0x80–0xA0 designated for user definition; lead bytes 0x41–0x7F are assigned row numbers 101 through 163 for [[kuten]] purposes, although row 162 (lead byte 0x7E) is unused.<ref name="lundeF" /><ref name="lunde2009appE" /> Rows 101 through 148 are used for extended kanji, while rows 149 through 163 are used for extended non-kanji.<ref name="lunde2009appE" />

==EUC-KR==
{{Redirect|EUC-KR|the variant so named in HTML standards|Unified Hangul Code}}
{{Infobox character encoding
|name=EUC-KR
|alias=Wansung, IBM-970
|mime=EUC-KR
|image=EUC-KR_without_extensions.svg
|caption=EUC-KR code structure
|standard=KS X 2901 (KS C 5861)
|lang=[[Korean language|Korean]], [[English language|English]], [[Russian language|Russian]]
|encodes=[[KS X 1001]]
|extends=[[ASCII]] or [[ISO 646|ISO 646:KR]]
|extensions=[[MacKorean|Mac OS Korean]], [[Code page 949 (IBM)|IBM-949]], [[Unified Hangul Code|Unified Hangul Code (Windows-949)]]
|next=[[Unified Hangul Code]] (web standards)
|classification = [[Extended ASCII|Extended]] [[ISO 646]], [[variable-width encoding|variable-length encoding]], [[CJK characters|CJK encoding]], EUC
}}
'''EUC-KR''' is a [[variable-width encoding|variable-length encoding]] to represent Korean text using two coded character sets, {{nowrap|[[KS X 1001]]}} (formerly KS C
5601)<ref>{{cite web |url=https://examples.oreilly.com/cjkvinfo/AppL/ksx1001.pdf |title=KS X 1001:1992}}</ref><ref>{{cite iso-ir |number=149 |title=KS C 5601:1987 |sponsor=Korea Bureau of Standards |date=1988-10-01}}</ref> and either {{nowrap|[[ISO 646]]:KR}} ({{nowrap|KS X 1003}}, formerly {{nowrap|KS C 5636}}) or [[ASCII]], depending on variant. {{nowrap|[[KS X 2901]]}} (formerly {{nowrap|KS C 5861}}) stipulates the encoding and {{IETF RFC|1557}} named it EUC-KR. A character drawn from KS X 1001 (G1, code set 1) is encoded as two bytes in GR (0xA1–0xFE) and a character from {{nowrap|KS X 1003}} or ASCII (G0, code set 0) takes one byte in GL (0x21–0x7E). It is usually referred to as Wansung ({{korean|완성|rr=Wanseong|lit=precomposed<ref>{{cite book|chapter-url=https://books.google.com/books?id=SA92uQqTB-AC&pg=PA146|title=CJKV Information Processing|last=Lunde|first=Ken|author-link=Ken Lunde|page=146|chapter=Chapter 3: Character Set Standards|isbn=978-0596514471|date=2009|publisher="O'Reilly Media, Inc." }}</ref>}}) in the [[Republic of Korea]].
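The code structure just described (one byte for code set 0, two GR bytes for code set 1) can be illustrated with a minimal Python sketch. The function name <code>split_euc_kr</code> is hypothetical, the function handles only plain EUC-KR without any vendor extensions, and the sample string relies on Python's built-in <code>euc_kr</code> codec to produce the input bytes.

```python
def split_euc_kr(data: bytes) -> list[bytes]:
    """Split plain EUC-KR bytes into per-character byte sequences.

    Hypothetical helper for illustration only: it accepts the strict EUC
    structure and rejects anything else (including extension codes).
    """
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Code set 0 (ASCII or KS X 1003): a single byte, which also
            # covers the C0 controls, space and DEL.
            chars.append(data[i:i + 1])
            i += 1
        elif 0xA1 <= b <= 0xFE:
            # Code set 1 (KS X 1001): lead and trail byte are both in GR.
            if i + 1 >= len(data) or not 0xA1 <= data[i + 1] <= 0xFE:
                raise ValueError(f"invalid trail byte after lead byte 0x{b:02X}")
            chars.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"byte 0x{b:02X} is outside the EUC-KR structure")
    return chars


encoded = "EUC 한글".encode("euc_kr")
pieces = split_euc_kr(encoded)
print([p.hex() for p in pieces])
```

Because every byte of a two-byte character has its high bit set, a decoder of this kind never mistakes part of a KS X 1001 character for an ASCII character.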
IBM refers to the double-byte component as '''Code page 971''',<ref>{{Cite web|title=IBM Globalization – Coded character set identifiers – CCSID 971|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid971.html|archive-url=https://web.archive.org/web/20141130005339/http://www-01.ibm.com/software/globalization/ccsid/ccsid971.html|access-date=2021-09-03|archive-date=2014-11-30}}</ref> and to EUC-KR with ASCII as '''Code page 970'''.<ref>{{cite web|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid970.html|title=CCSID 970|publisher=IBM|work=IBM Globalization|archive-url=https://web.archive.org/web/20141201233141/http://www-01.ibm.com/software/globalization/ccsid/ccsid970.html|archive-date=2014-12-01}}</ref><ref>{{cite web|url=https://icu4c-demos.unicode.org/icu-bin/convexp?conv=euc-kr|title=ibm-970_P110_P110-2006_U2 (alias euc-kr)|work=Converter Explorer – ICU Demonstration|publisher=International Components for Unicode}}</ref><ref>{{Citation|title=International Components for Unicode (ICU), ibm-970_P110_P110-2006_U2.ucm|url=https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/ibm-970_P110_P110-2006_U2.ucm|date=2002-12-03}}</ref> It is implemented as '''Code page 20949''' ("Korean Wansung")<ref name="winids" /><ref>{{cite web |url=https://source.winehq.org/git/wine.git/blob/6f68543692a7588daa581d00c475715395036b15:/tools/make_unicode#l946 |title=dump_krwansung_codepage: build Korean Wansung table from the KSX1001 file |work=make_unicode: Generate code page .c files from ftp.unicode.org descriptions |first=Alexandre |last=Julliard |date=11 March 2021 |publisher=[[Wine (software)|Wine Project]]}}</ref> and '''Code page 51949''' ("EUC Korean") by Microsoft.<ref name="winids">{{cite web |url=https://docs.microsoft.com/en-us/windows/desktop/intl/code-page-identifiers |title=Code Page Identifiers |publisher=Microsoft |department=Windows Dev Center|date=7 January 2021 }}</ref>
{{As of|2025|05}}, less than 0.065% of all web pages 
globally declare EUC-KR as their encoding,<ref>{{Cite web |title=Usage Statistics and Market Share of EUC-KR for Websites, March 2025 |url=https://w3techs.com/technologies/details/en-euckr |access-date=2025-05-02 |website=w3techs.com}}</ref> but <!-- At 7.4% but rather showing result of calculation, that number plus other encoding adds up to way more than 100.0%. [[KS C 5601]] seems to be related, and is factored into the calculation: 100-95.5% = --> 4.5% of South Korean web pages use EUC-KR.<ref>{{Cite web|title=Distribution of Character Encodings among websites that use .kr|url=https://w3techs.com/technologies/segmentation/tld-kr-/character_encoding|website=w3techs.com|access-date=2025-05-02}}</ref> <!-- Cyrillic Windows-1251 in Russia at 100-96.1% = 3.9% (4.5%) is often higher, currently lower, and some small languages have higher non-UTF-8 use: https://w3techs.com/technologies/segmentation/cl-br-/character_encoding meaning the following is an untrue statement: making it the most popular non-[[UTF-8]]/Unicode encoding for a language/web domain.<ref>{{Cite web|url=https://w3techs.com/technologies/segmentation/cl-ko-/character_encoding|title=Distribution of Character Encodings among websites that use Korean|website=w3techs.com|access-date=2022-06-18}}</ref> --> Including extensions, it is the most widely used legacy character encoding in Korea on all three major platforms ([[macOS]], other Unix-like OSes, and Windows), but usage has been shifting very slowly to [[UTF-8]] as the latter gains popularity, especially on Linux and macOS. As with most other legacy encodings, [[UTF-8]] is now preferred for new use because it avoids inconsistencies between platforms and vendors.

===Unified Hangul Code===
{{Main|Unified Hangul Code}}
A common extension of EUC-KR is the [[Unified Hangul Code]] ({{korean|통합형 한글 코드|rr=Tonghabhyeong Hangeul Kodeu|labels=no}},<ref>{{cite web|url=https://www.w3c.or.kr/i18n/hangul-i18n/ko-code.html|title=한글 코드에 대하여|publisher=W3C|language=ko|access-date=2019-01-07|archive-url=https://web.archive.org/web/20130524175322/http://www.w3c.or.kr/i18n/hangul-i18n/ko-code.html|archive-date=2013-05-24|url-status=dead}}</ref> or {{korean|통합 완성형|rr=Tonghab Wansunghyung|labels=no}}), which is the default Korean codepage on Microsoft Windows. It is given the code page number 949 by Microsoft, and 1261<ref>In [https://opensource.apple.com/source/ICU/ICU-59180.0.1/icuSources/common/ucnv_lmb.cpp.auto.html ucnv_lmb.cpp], a file originating from [[IBM]] and included in the [[International Components for Unicode]] source tree, the lead byte 0x11 is commented as referring to "Korean: ibm-1261" after the definition of <code>ULMBCS_GRP_KO</code>, and is mapped to the <code>"windows-949"</code> ICU codec in the <code>OptGroupByteToCPName</code> array later in the file.</ref> or 1363<ref>{{citation|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid1363.html|publisher=IBM|title=Coded character set identifiers – CCSID 1363|work=IBM Globalization|archive-url=https://web.archive.org/web/20141129210404/http://www-01.ibm.com/software/globalization/ccsid/ccsid1363.html|archive-date=2014-11-29|url-status=dead}}</ref> by IBM. [[Code page 949 (IBM)|IBM's code page 949]] is a different, unrelated, EUC-KR extension. Unified Hangul Code extends EUC-KR by using codes that do not conform to the EUC structure to incorporate additional syllable blocks, completing the coverage of the composed syllable blocks available in [[Johab]] and Unicode.
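As a rough illustration of what "codes that do not conform to the EUC structure" means in practice, the sketch below encodes two syllables with Python's built-in <code>cp949</code> codec (its implementation of Unified Hangul Code) and tests whether both bytes of each code fall in GR. The helper name <code>conforms_to_euc</code> is hypothetical, and the sample syllables are illustrative choices: 똠 is assumed here as an example of a syllable block that is absent from KS X 1001.

```python
def conforms_to_euc(code: bytes) -> bool:
    """Return True if a two-byte code lies entirely in GR (0xA1-0xFE)."""
    return len(code) == 2 and all(0xA1 <= b <= 0xFE for b in code)


for syllable in ("한", "똠"):
    code = syllable.encode("cp949")
    # Print the syllable, its two-byte code in hex, and whether it is EUC-shaped.
    print(syllable, code.hex(), conforms_to_euc(code))
```

A syllable that is part of KS X 1001 keeps its EUC-KR code, while an extension syllable receives a two-byte code with at least one byte outside GR.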
The [[W3C]]/[[WHATWG]] Encoding Standard used by [[HTML5]] incorporates the Unified Hangul Code extensions into its definition of EUC-KR.<ref>{{citation|url=https://encoding.spec.whatwg.org/#index-euc-kr|title=5. Indexes (§ index EUC-KR)|work=Encoding Standard|publisher=WHATWG}}</ref>

===Mac OS Korean (HangulTalk)===
Other encodings incorporating EUC-KR as a subset include the Mac OS Korean script (known as Code page 10003 or <code>x-mac-korean</code>),<ref name="msdnlabels"/> which was used by HangulTalk (MacOS-KH), the Korean localization of the [[classic Mac OS]]. It was developed by Elex Computer ({{lang|ko|일렉스}}), who were at the time the authorised distributor of Apple Macintosh computers in South Korea.<ref>{{cite web |url=http://hojin.freeservers.com/beige/hom/11HangulTalk.html |title=HangulTalk: De facto standard Hangul environment for Mac |work=Guide to using Hangul on Macintosh |last=Gil |first=Hojin}}</ref><ref name="lunde2009appE"/> HangulTalk adds extension characters with lead bytes between 0xA1 and 0xAD, both in unused space within the EUC-KR GR plane (trail bytes 0xA1–0xFE), and using non-EUC codes outside of it (trail bytes 0x41–0xA0).
Some of these characters are font-style-independent stylized [[dingbat]]s.<ref name="lunde2009appE">{{citation|mode=cs1 |title=Appendix E: Vendor Character Set Standards |work=CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing |last=Lunde |first=Ken |author-link=Ken Lunde |year=2009 |edition=2nd |publisher=[[O'Reilly Media|O'Reilly]] |location=[[Sebastopol, CA]] |isbn=978-0-596-51447-1 |url=https://resources.oreilly.com/examples/9780596514471/blob/master/cjkvip2e-appE.pdf}}</ref> Many of these characters do not have exact Unicode mappings, and Apple software maps these cases variously to [[combining character|combining sequences]], to approximate mappings with an appended [[Private Use Area|private-use]] character as a modifier for round-trip purposes, or to private-use characters.<ref name="mackoreantxt">{{cite web |url=https://unicode.org/Public/MAPPINGS/VENDORS/APPLE/KOREAN.TXT |author=Apple |author-link=Apple, Inc |title=Map (external version) from Mac OS Korean encoding to Unicode 3.2 and later |date=2005-04-05 |publisher=[[Unicode Consortium]]}}</ref>

Apple also uses certain single-byte codes outside of the EUC-KR plane for additional characters: 0x80 for a [[required space]], 0x81 for a [[won sign]] (₩), 0x82 for an [[en dash]] (–), 0x83 for a [[copyright sign]] ({{not a typo|©}}), 0x84 for a wide [[underscore]] ({{not a typo|＿}}) and 0xFF for an [[ellipsis]] (...).<ref name="mackoreantxt" /> Although none of these additional single-byte codes are within the lead byte range of plain EUC-KR (unlike Apple's extensions to EUC-CN, [[#x-mac-chinesesimp|see above]]), some are within the lead byte range of Unified Hangul Code (specifically, 0x81, 0x82, 0x83 and 0x84).
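The overlap noted above can be checked mechanically. The following Python sketch lists the Apple single-byte codes named in this section and tests each against two lead byte ranges. The plain EUC-KR range 0xA1–0xFE follows from the EUC-KR code structure; the Unified Hangul Code lead byte range 0x81–0xFE is an assumption taken from that encoding's usual description rather than from this section. The dictionary and constant names are hypothetical.

```python
# Single-byte extension codes listed for Mac OS Korean in this section.
APPLE_SINGLE_BYTE = {
    0x80: "required space",
    0x81: "won sign",
    0x82: "en dash",
    0x83: "copyright sign",
    0x84: "wide underscore",
    0xFF: "ellipsis",
}

EUC_KR_LEAD = range(0xA1, 0xFF)  # 0xA1-0xFE, plain EUC-KR lead bytes
UHC_LEAD = range(0x81, 0xFF)     # 0x81-0xFE, assumed UHC lead byte range

# Codes that would be ambiguous with a two-byte lead byte in each encoding.
clash_euc_kr = [b for b in APPLE_SINGLE_BYTE if b in EUC_KR_LEAD]
clash_uhc = [b for b in APPLE_SINGLE_BYTE if b in UHC_LEAD]

print([hex(b) for b in clash_euc_kr])
print([hex(b) for b in clash_uhc])
```

Under these assumptions a decoder cannot treat the same byte stream as both Mac OS Korean and Unified Hangul Code, since 0x81 through 0x84 are complete characters in one and lead bytes in the other.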

==EUC-KP==
{{Main|KPS 9566}}
Similarly to KS X 1001, the North Korean [[KPS 9566]] standard is typically used in EUC form; in these contexts, it is sometimes referred to as EUC-KP.<ref>{{cite web |last=Kim |first=Kyongsok |date=2002-11-30 |title=3-way cross-reference tables – KS X 1001, KPS 9566, and UCS |url=https://unicode.org/wg2/docs/n2564.pdf |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2564}} [Note: updated links for tables accompanying document: [https://web.archive.org/web/20210727225816/http://asadal.pusan.ac.kr/~gimgs0/hangeul/code/3xreftbl/ks2kp_ucs-v09.txt] [https://web.archive.org/web/20210727214628/http://asadal.pusan.ac.kr/~gimgs0/hangeul/code/3xreftbl/kp2ks_ucs-v09.txt]]</ref> More recent editions of the standard extend the EUC representation with characters using non-EUC two-byte codes, in a similar manner to Unified Hangul Code.<ref>{{cite web |last=Chung |first=Jaemin |url=https://www.unicode.org/L2/L2018/18011-info-kps9566-2011.pdf |id=[[Unicode Technical Committee|UTC]] L2/18-011 |title=Information on the most recent version of KPS 9566 (KPS 9566-2011?) |date=2018-01-05}}</ref>

==EUC-TH==
Although certain single-byte encodings such as the [[ISO/IEC 8859]] series technically conform to the EUC structure, they are rarely labeled as EUC. However, {{code|eucTH}} is used on [[Oracle Solaris|Solaris]] as a label for [[TIS-620]].<ref>{{cite web |url=https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/solaris-eucTH-2.7.ucm |title=solaris-eucTH-2.7 |work=icu-data |author=IBM |author-link=IBM |publisher=[[Unicode Consortium]]/[[International Components for Unicode]] |date=2001-05-07}}</ref>

==EUC-TW==
'''EUC-TW''' is a [[variable-width encoding|variable-length encoding]] that supports ASCII and 16 planes of {{nowrap|[[CNS 11643]]}}, each of which is 94×94. It is a rarely used encoding for [[traditional Chinese characters]] as used in [[Taiwan]].
Variants of [[Big5]] are much more common than EUC-TW, although Big5 only encodes the first two planes of CNS 11643 [[hanzi]]; [[UTF-8]] is also becoming more common.
* As an EUC/[[ISO 2022]] encoding, the [[C0 and C1 control codes#C0|C0 control characters]], ASCII space, and DEL are encoded as in ASCII.
* A graphical character from ASCII (G0, code set 0) is encoded in GL as its usual single-byte representation (0x21–0x7E).
* A character from CNS 11643 plane 1 (code set 1) is encoded as two bytes in GR (0xA1–0xFE).
* A character in planes 1 through 16 of CNS 11643 (code set 2) is encoded as four bytes:
** The first byte is always 0x8E (Single Shift 2).
** The second byte (0xA1–0xB0) indicates the plane, the number of which is obtained by subtracting 0xA0 from that byte.
** The third and fourth bytes are in GR (0xA1–0xFE).
Note that plane 1 of CNS 11643 is encoded twice: once as code set 1 and once as part of code set 2.

==See also==
* [[CJK characters]]
* [[Japanese language and computers]]
* [[Korean language and computers]]
* [[Chinese character encoding]]

==Notes==
{{Notelist}}

==References==
{{reflist}}

==External links==
* [http://www.rikai.com/library/kanjitables/kanji_codes.euc.shtml EUC-JP codeset table] (minus the ASCII and [[half-width kana|half-width]] parts)
* [https://docs.microsoft.com/en-us/windows/desktop/intl/code-page-identifiers Code Page Identifiers]
* [https://web.archive.org/web/20120825155118/http://developers.sun.com/dev/gadc/technicalpublications/articles/gb18030.html GB18030-2000{{snd}} The New Chinese National Standard] (since updated to [[GB18030]]-2022, which is (slightly) incompatible)
* [https://web.archive.org/web/20060329202847/http://www.jagat.or.jp/asia/report/China3.htm The New Generation of Pre-Press Software in China]{{snd}} mentions the 748 code
* [https://web.archive.org/web/20050611013847/http://www.cns11643.gov.tw/web/word.jsp#euc Description of the EUC-TW code] (in Chinese)
* [https://search.cpan.org/~dankogai/Encode-JIS2K-0.02/JIS2K.pm Manual page of EUC-JISX0213] in the Perl Encode module
* [https://itscj.ipsj.or.jp/english/vbcqpr00000004qn-att/ISO-IR.pdf International Register of Coded Character Sets to be Used With Escape Sequences]{{snd}} section 2.4 (p. 14f.) with the coded character sets of China, Japan, South Korea, North Korea and Taiwan (ISO/IEC)
* [https://users.monash.edu/~jwb/cjk.inf Chinese, Japanese, and Korean character set standards and encoding systems]

{{Character encoding}}

[[Category:Character sets]]
[[Category:Chinese-language computing]]
[[Category:Encodings of Asian languages]]
[[Category:Encodings of Japanese]]
[[Category:Korean-language computing]]