==EUC-JP==
{{Infobox character encoding
|name=EUC-JP
|alias=Unixized JIS (UJIS), csEUCPkdFmtJapanese
|mime=EUC-JP
|image=EUC-JP.svg
|caption=
|standard=
|extends=[[ASCII]] or [[JISCII|ISO 646:JP]]
|encodes=[[JIS X 0208]], [[JIS X 0212]], [[JIS X 0201]]
|lang=[[Japanese language|Japanese]], [[English language|English]], [[Russian language|Russian]]
|next=EUC-JISx0213
|classification = [[Extended ASCII|Extended]] [[ISO 646]], [[variable-width encoding|variable-length encoding]], [[CJK characters|CJK encoding]], EUC
}}
{{Infobox character encoding
|name=EUC-JIS-2004
|alias=EUC-JISx0213
|_nomimecode=1
<!-- Was it ever registered for MIME (as opposed to ICONV) though?
|mime=<code>EUC-JIS-2004</code> (2004)<br/><code>EUC-JISx0213</code> (2000)
-->
|image=EUC-JISx0213.svg
|caption=
|standard=JIS X 0213
|extends=[[ASCII]]
|encodes=[[JIS X 0213]], [[JIS X 0201]] (Kana)
|lang=[[Japanese language|Japanese]], [[Ainu language|Ainu]], [[English language|English]], [[Russian language|Russian]]
|prev=EUC-JP
|classification = [[Extended ASCII]], [[variable-width encoding|variable-length encoding]], [[CJK characters|CJK encoding]], EUC
}}

'''EUC-JP''' is a [[variable-width encoding|variable-length encoding]] used to represent the elements of three [[JIS encoding|Japanese character set standards]], namely {{nowrap|[[JIS X 0208]]}}, {{nowrap|[[JIS X 0212]]}}, and {{nowrap|[[JIS X 0201]]}}. Other names for this encoding include '''Unixized JIS''' (or '''UJIS''') and '''AT&T JIS'''.<ref name="lunde">{{cite book | url=https://books.google.com/books?id=EH1MDAAAQBAJ&q=%22euc+packed+format%22&pg=PA244 | title=CJKV Information Processing: Chinese, Japanese, Korean, and Vietnamese Computing | publisher=O'Reilly | author=Lunde, Ken | year=2008 | isbn=9780596800925 | pages=242–244}}</ref> Since September 2022, 0.1% of all web pages have used EUC-JP,<ref>{{cite web | url=https://w3techs.com/technologies/history_overview/character_encoding | title=Historical trends in the usage of character encodings for websites | publisher=W3Techs}}</ref> while 2.6% of websites written in Japanese use it, making it the second-most popular encoding for Japanese<ref>{{Cite web |title=Distribution of Character Encodings among websites that use Japanese |url=https://w3techs.com/technologies/segmentation/cl-ja-/character_encoding |access-date=2023-11-01 |website=w3techs.com}}</ref> (<!-- i.e. of those "that use Japanese as content language." -->which is more than for [[Shift JIS]]<!-- might be a fluke, but now by either metric: , though of those sites, i.e. on national Japanese websites,<!- i.e. with ".jp as top level domain."; it used to be "less used", note https://w3techs.com/technologies/details/en-shiftjis  ShiftJIS recently took a nose-dive (a statistical fluke?), -> less used than {{nowrap|[[Shift JIS]]}}, -->; both are much less used than [[UTF-8]]).
It is called '''Code page 954''' by IBM.<ref>{{cite web|title=CCSID 954 information document|archive-url=https://web.archive.org/web/20160327022203/http://www-01.ibm.com/software/globalization/ccsid/ccsid954.html|archive-date=2016-03-27|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid954.html}}</ref><ref>{{Citation|title=International Components for Unicode (ICU), ibm-954_P101-2007.ucm|url=https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/ibm-954_P101-2007.ucm|date=2002-12-03}}</ref> Microsoft has two code page numbers for this encoding (51932 and 20932).

This encoding scheme allows the easy mixing of 7-bit ASCII and 8-bit Japanese without the need for the escape characters employed by [[ISO-2022-JP]], which is based on the same character set standards, and without ASCII bytes appearing as trail bytes (unlike [[Shift JIS]]).
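
As a rough illustration of this difference, the following Python sketch serializes one katakana character in all three encodings. It assumes the <code>euc_jp</code>, <code>shift_jis</code> and <code>iso2022_jp</code> codecs bundled with CPython; the helper name <code>has_ascii_byte</code> is hypothetical and exists only for this example.

```python
# Illustrative sketch only: compare how one character is serialized.
# Assumes CPython's bundled "euc_jp", "shift_jis" and "iso2022_jp" codecs.


def has_ascii_byte(data: bytes) -> bool:
    """Return True if any byte falls in the 7-bit range 0x00-0x7F."""
    return any(b < 0x80 for b in data)


text = "\u30bd"  # U+30BD KATAKANA LETTER SO

euc = text.encode("euc_jp")
sjis = text.encode("shift_jis")
iso = text.encode("iso2022_jp")

print(euc.hex(" "))   # two bytes, both with the high bit set
print(sjis.hex(" "))  # trail byte is 0x5C, the ASCII backslash value
print(iso.hex(" "))   # starts with ESC (0x1B) to switch character sets
```

In this sketch only the EUC-JP form is free of both escape sequences and bytes in the ASCII range, which is why naive ASCII-oriented tools (for example, ones searching for a backslash) tend to handle it more safely.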

A related and partially compatible encoding, called '''EUC-JISx0213''' or '''EUC-JIS-2004''', encodes {{nowrap|[[JIS X 0201]]}} and {{nowrap|[[JIS X 0213]]}}<ref name="x0213org">{{cite web | url=https://x0213.org/codetable/index.en.html | title=JIS X 0213 Code Mapping Tables | publisher=x0213.org}}</ref> (similarly to {{nowrap|[[Shift JIS-2004|Shift_JISx0213]]}}, its Shift_JIS-based counterpart).

Compared to EUC-CN or EUC-KR, EUC-JP did not become as widely adopted on PC and Macintosh systems in Japan, which used {{nowrap|Shift JIS}} or its extensions ([[Code page 932 (Microsoft Windows)|Windows code page 932]] on [[Microsoft Windows]], and [[MacJapanese]] on [[classic Mac OS]]), although it became heavily used by [[Unix]] or Unix-like [[operating system]]s (except for [[HP-UX]]). Therefore, whether Japanese websites use EUC-JP or Shift_JIS often depends on what OS the author uses.

Characters are encoded as follows:

* As an EUC/[[ISO 2022]] compliant encoding, the [[C0 and C1 control codes#C0|C0 control characters]], space, and DEL are represented as in ASCII.
* A graphical character from [[ASCII]] (code set 0) is represented as its usual one-byte representation, in the range 0x21 &ndash; 0x7E. While some variants of EUC-JP encode the [[Code page 895|lower half]] of {{nowrap|JIS X 0201}} here, most encode ASCII,<ref>{{cite web | url=https://www.w3.org/TR/japanese-xml/#AEN29832832 | title=Ambiguities in conversion from Japanese EUC to Unicode (Non-Normative) | publisher=W3C | work=XML Japanese Profile}}</ref> including the W3C/WHATWG Encoding standard used by [[HTML5]],<ref>{{cite web | url=https://encoding.spec.whatwg.org/#euc-jp-decoder | title=EUC-JP decoder | publisher=WHATWG | work=Encoding Standard}} "If the byte is an ASCII byte, return a code point whose value is a byte."</ref> and so does EUC-JIS-2004.<ref name="x0213org" /> While this means that 0x5C is typically mapped to Unicode as U+005C REVERSE SOLIDUS (the ASCII [[backslash]]), U+005C may be displayed as a [[Yen sign]] by certain Japanese-locale fonts, e.g. on Microsoft Windows, for compatibility with the lower half of {{nowrap|JIS X 0201}}.<ref>{{cite web | url=https://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html#ch3_1_1 | title=3.1.1 Details of Problems | publisher=The Open Group Japan | work=Problems and Solutions for Unicode and User/Vendor Defined Characters | archive-url=https://web.archive.org/web/19990203115405/http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html#ch3_1_1 | archive-date=1999-02-03 | url-status=dead | access-date=2019-08-14 }}</ref><ref>{{cite web | title=When is a backslash not a backslash? | date=2005-09-17 | author=Kaplan, Michael S. | url=https://archives.miloush.net/michkap/archive/2005/09/17/469941.html}}</ref>
* A character from JIS X 0208 (code set 1) is represented by two bytes, both in the range 0xA1 &ndash; 0xFE. This differs from the ISO-2022-JP representation by having the high bit set. This code set may also contain vendor extensions in some EUC-JP variants. In EUC-JIS-2004, the first plane of {{nowrap|JIS X 0213}} is encoded here, which is effectively a superset of standard {{nowrap|JIS X 0208}}.<ref name="x0213org" />
* A character from the ''upper half'' of {{nowrap|JIS X 0201}} ([[half-width kana]], code set 2) is represented by two bytes, the first being 0x8E, the second being the usual {{nowrap|JIS X 0201}} representation in the range 0xA1 &ndash; 0xDF. This set may contain [[JIS X 0201#IBM's implementations|IBM vendor extensions]] in some variants.
* A character from JIS X 0212 (code set 3) is represented in EUC-JP by three bytes, the first being 0x8F, the following two being in the range 0xA1&ndash;0xFE, i.e. with the high bit set. In addition to standard {{nowrap|JIS X 0212}}, code set 3 of some EUC-JP variants may also contain extensions in rows 83 and 84 to represent characters from IBM's Shift JIS extensions which lack standard JIS X 0212 mappings; these may be coded in either of two layouts, one defined by IBM and one defined by the [[Open Software Foundation|OSF]].<ref name="osfibmextensions">{{cite web | url=https://www.opengroup.or.jp:80/jvc/cde/ucs-conv-e.html#ch4_2 | title=4.2 Review Process of Rules for Code Set Conversion Between eucJP-open and UCS | publisher=The Open Group Japan | work=Problems and Solutions for Unicode and User/Vendor Defined Characters | archive-url=https://web.archive.org/web/19990203115405/http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html#ch4_2 | archive-date=1999-02-03 | url-status=dead | access-date=2019-08-14 }}</ref><ref name="lundeJ">{{citation|mode=cs1 |title=Appendix J: Japanese Character Sets |work=CJKV Information Processing |edition=2nd |last=Lunde |first=Ken |date=13 January 2009 |isbn=978-0-596-51447-1 |url=https://resources.oreilly.com/examples/9780596514471/blob/master/cjkvip2e-appJ.pdf}}</ref> In EUC-JIS-2004, the second plane of {{nowrap|JIS X 0213}} is encoded here,<ref name="x0213org" /> which does not collide with the allocated rows in standard {{nowrap|JIS X 0212}}.<ref name="hyeshik">{{cite web | url=https://github.com/python/cpython/blob/4b96c1384e008218bdfeb9e271a094b1ab8484d3/Modules/cjkcodecs/README | title=Readme for CJKCodecs | publisher=Python Software Foundation | work=cPython | last=Chang | first=Hyeshik| date=8 December 2021 }}</ref> Some implementations of EUC-JIS-2004, such as the one used by [[Python (programming language)|Python]], allow both {{nowrap|JIS X 0212}} and {{nowrap|JIS X 0213}} plane 2 characters in this set.<ref name="hyeshik" />
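
The byte ranges listed above can be sketched as a small Python routine that splits an EUC-JP byte string into its code sets. This is a minimal sketch, not a conforming decoder: it ignores vendor extensions and C1 controls other than the two single shifts, and the names <code>split_code_sets</code>, <code>SS2</code> and <code>SS3</code> are hypothetical.

```python
# Illustrative sketch: split EUC-JP bytes into (code_set, raw_bytes) pairs.
# Byte ranges follow the list above; vendor extensions are not handled.

SS2 = 0x8E  # single shift two: introduces code set 2 (half-width kana)
SS3 = 0x8F  # single shift three: introduces code set 3 (JIS X 0212)


def _is_gr(byte: int) -> bool:
    """True for bytes in the range 0xA1-0xFE."""
    return 0xA1 <= byte <= 0xFE


def split_code_sets(data: bytes) -> list[tuple[int, bytes]]:
    """Return (code_set, raw_bytes) pairs; raise ValueError on bad input."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:        # code set 0: a single ASCII-range byte
            size, code_set = 1, 0
        elif b == SS2:       # code set 2: 0x8E, then 0xA1-0xDF
            size, code_set = 2, 2
        elif b == SS3:       # code set 3: 0x8F, then two bytes 0xA1-0xFE
            size, code_set = 3, 3
        elif _is_gr(b):      # code set 1: two bytes 0xA1-0xFE
            size, code_set = 2, 1
        else:
            raise ValueError(f"unexpected byte 0x{b:02X} at offset {i}")

        chunk = data[i:i + size]
        if len(chunk) < size:
            raise ValueError(f"truncated sequence at offset {i}")
        if code_set == 2:
            valid = 0xA1 <= chunk[1] <= 0xDF
        else:
            # Every byte after a single shift (or every byte, for
            # code set 1) must be in 0xA1-0xFE; trivially true for set 0.
            valid = all(_is_gr(c) for c in chunk[1 if code_set == 3 else 0:]) \
                if code_set else True
        if not valid:
            raise ValueError(f"invalid trail byte at offset {i}")

        out.append((code_set, chunk))
        i += size
    return out


# "A", hiragana A, half-width katakana A, then one code set 3 sequence
print(split_code_sets(b"A\xa4\xa2\x8e\xb1\x8f\xb0\xa1"))
```

Because lead bytes, single shifts and ASCII bytes occupy disjoint ranges, the code set of each character can be determined from its first byte alone, without any state carried over from earlier characters.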

Vendor extensions to EUC-JP (from, for example, the [[Open Software Foundation]], [[IBM]] or [[NEC]]) were often allocated within the individual code sets,<ref name="osfibmextensions" /><ref name="lundeJ" /> as opposed to using invalid EUC sequences (as in popular extensions of EUC-CN and EUC-KR).

However, some vendor-specific encodings are partially compatible with EUC-JP, due to encoding {{nobr|JIS X 0208}} over GR, but do not follow the packed EUC structure. Often, these do not include use of the single shifts from EUC-JP, and are thus not straight extensions of EUC-JP, with the exception of Super DEC Kanji.

===DEC Kanji===
[[Digital Equipment Corporation]] defines two variants of EUC-JP only partly conforming to the EUC packed format, but also bearing some resemblance to the complete two-byte format. The overall format of the "DEC Kanji" encoding mostly corresponds to fixed-length (complete two-byte) EUC; however, code set 0 is not required to be left-padded with null bytes (similarly to the packed format).<ref name="lundeF" /> JIS X 0208 is, as usual, used for code set 1; code set 2 (half-width katakana) is absent; code set 3 is encoded like the two-byte fixed-width format (i.e. without a shift byte and with only the first high bit set), but used for two-byte user-defined characters rather than being specified for JIS X 0212.<ref name="lundeF" /> In the basic "DEC Kanji" encoding, only the first 31 rows of code set 3 are used for user-defined characters: rows 32 through 94 are reserved, similarly to the unused rows in code set 1.<ref name="lunde2009appE" />
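
The two-byte layout described above might be sketched in Python as follows. This is only an illustration of the basic "DEC Kanji" variant: the function name <code>classify_dec_kanji</code> is hypothetical, and numbering rows as the lead byte minus 0xA0 is an assumption carried over from the usual EUC row numbering.

```python
# Illustrative sketch of the basic "DEC Kanji" two-byte layout.
# Assumption: row number = lead byte - 0xA0, as in ordinary EUC.


def classify_dec_kanji(lead: int, trail: int) -> str:
    """Classify a two-byte code by the high bit of its trail byte."""
    if not 0xA1 <= lead <= 0xFE:
        raise ValueError("lead byte must be in 0xA1-0xFE")
    if 0xA1 <= trail <= 0xFE:
        # Both high bits set: code set 1, as in EUC-JP.
        return "JIS X 0208"
    if 0x21 <= trail <= 0x7E:
        # Only the first high bit set: code set 3 without a shift byte.
        row = lead - 0xA0
        return "user-defined" if row <= 31 else "reserved"
    raise ValueError("trail byte outside 0x21-0x7E and 0xA1-0xFE")


print(classify_dec_kanji(0xA4, 0xA2))  # JIS X 0208
print(classify_dec_kanji(0xA1, 0x21))  # user-defined
print(classify_dec_kanji(0xC0, 0x21))  # reserved
```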

The "Super DEC Kanji" encoding accepts codes both from the "DEC Kanji" encoding and from packed-format EUC, for a total of five code sets.<ref name="lundeF">{{citation|mode=cs1 |title=Appendix F: Vendor Encoding Methods |work=CJKV Information Processing |edition=2nd |last=Lunde |first=Ken |date=13 January 2009 |isbn=978-0-596-51447-1 |url=https://resources.oreilly.com/examples/9780596514471/blob/master/cjkvip2e-appF.pdf}}</ref> It also allows the entire user-defined code set, and the unused rows at the ends of the JIS X 0208 and JIS X 0212 code sets (rows 85–94 and 78–94 respectively), to be used for user-defined characters.<ref name="lunde2009appE" />

===HP-16===
[[Hewlett-Packard]] defines an encoding referred to as "HP-16". This accompanies their "HP-15" encoding, which is a variant of [[Shift JIS]]. HP-16 encodes {{nobr|JIS X 0208}} using the same bytes as in EUC-JP, but does not use the single shift codes (thus omitting code sets 2 and 3), and adds three user-defined regions which do not follow the packed-format EUC structure:<ref name="lundeF" />

* Lead bytes 0xA1–C2, trail bytes 0x21–7E
* Lead bytes 0xC3–E3, trail bytes 0x21–3F
* Lead bytes 0xC3–E1, trail bytes 0x40–64
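
The three ranges above can be expressed as a small lookup, sketched here in Python. The names <code>HP16_USER_REGIONS</code> and <code>hp16_user_region</code> are hypothetical, and numbering the regions 1 to 3 in list order is purely a convention of this example.

```python
# Illustrative sketch: test whether a byte pair falls in one of the
# HP-16 user-defined regions listed above (numbered 1-3 in list order).
from typing import Optional

HP16_USER_REGIONS = [
    (range(0xA1, 0xC3), range(0x21, 0x7F)),  # lead 0xA1-C2, trail 0x21-7E
    (range(0xC3, 0xE4), range(0x21, 0x40)),  # lead 0xC3-E3, trail 0x21-3F
    (range(0xC3, 0xE2), range(0x40, 0x65)),  # lead 0xC3-E1, trail 0x40-64
]


def hp16_user_region(lead: int, trail: int) -> Optional[int]:
    """Return the 1-based region number, or None if not user-defined."""
    for number, (leads, trails) in enumerate(HP16_USER_REGIONS, start=1):
        if lead in leads and trail in trails:
            return number
    return None


print(hp16_user_region(0xA1, 0x21))  # 1
print(hp16_user_region(0xC3, 0x3F))  # 2
print(hp16_user_region(0xE1, 0x64))  # 3
print(hp16_user_region(0xA4, 0xA2))  # None (an ordinary JIS X 0208 code)
```

All three regions use trail bytes below 0x80, so they cannot collide with the {{nobr|JIS X 0208}} codes, whose trail bytes have the high bit set.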

===IKIS===
The IKIS (Interactive Kanji Information System) encoding used by [[Data General]] resembles EUC-JP without single shifts, i.e. with only code sets 0 and 1. Half-width katakana are instead included in row 8 of JIS X 0208 (colliding with the box-drawing characters added to the standard in 1983). JIS X 0208 rows 9 through 12 are used for user-defined characters.<ref name="lundeF" /><ref name="lunde2009appE" />

===Adaptations of EUC-JP for EBCDIC===
{{Main article|Japanese language in EBCDIC}}

KEIS (Kanji-processing Extended Information System) is an [[EBCDIC]] encoding used by [[Hitachi]],<ref name="lunde2009appE" /> with double-byte characters (a DBCS-Host encoding) included using shifting sequences, making it a [[state (computer science)|stateful]] encoding. Specifically, the sequence {{code|0x0A 0x41}} switches to single-byte mode and the sequence {{code|0x0A 0x42}} switches to double-byte mode.{{efn|These sequences match the hexadecimal forms shown by DEC{{refn|name=decunix}} and the decimal forms ({{code|10 65}} and {{code|10 66}}) listed by Lunde.{{refn|name=lundeF}} Lunde lists the hexadecimal forms for both as {{code|0xA0 0x42}}, seemingly in error.}} However, JIS X 0208 characters are encoded using the same byte sequences used to encode them in EUC-JP. This results in duplicate encodings for the {{ctrl|IDSP|ideographic space}}—0x4040 per the DBCS-Host code structure, and 0xA1A1 as in EUC-JP. This differs from IBM's DBCS-Host encoding for Japanese, the layout of which builds on versions which predate JIS X 0208 altogether. The lead byte range is extended back to 0x59, out of which the lead bytes 0x81–A0 are designated for user-defined characters,<ref name="lundeF" /> and the remainder are used for corporate-defined characters, including both kanji and non-kanji.<ref name="lunde2009appE" />
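
The shifting behaviour can be sketched as a Python routine that splits a KEIS byte stream into single-byte and double-byte runs. This is a simplified sketch: the name <code>split_keis_modes</code> is hypothetical, and starting in single-byte mode is an assumption of this example rather than something stated by the sources cited here.

```python
# Illustrative sketch: split a KEIS byte stream into mode runs using the
# shift sequences 0x0A 0x41 (to single-byte) and 0x0A 0x42 (to double-byte).
# Assumption: the stream starts in single-byte mode unless told otherwise.

TO_SINGLE = b"\x0a\x41"
TO_DOUBLE = b"\x0a\x42"


def split_keis_modes(data: bytes,
                     start_double: bool = False) -> list[tuple[str, bytes]]:
    """Return ("single" | "double", raw_bytes) runs, shifts removed."""
    runs = []
    run = bytearray()
    double = start_double
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair in (TO_SINGLE, TO_DOUBLE):
            if run:  # close the run collected in the previous mode
                runs.append(("double" if double else "single", bytes(run)))
                run = bytearray()
            double = pair == TO_DOUBLE
            i += 2
            continue
        step = 2 if double else 1  # consume one character's worth of bytes
        run += data[i:i + step]
        i += step
    if run:
        runs.append(("double" if double else "single", bytes(run)))
    return runs


# EBCDIC "A", shift to double-byte, two JIS X 0208 codes, shift back, "B"
stream = b"\xc1" + TO_DOUBLE + b"\xa4\xa2\xa1\xa1" + TO_SINGLE + b"\xc2"
print(split_keis_modes(stream))
```

Unlike EUC-JP, the meaning of a byte here depends on the preceding shift sequence, so a reader cannot start decoding from an arbitrary offset in the stream.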

JEF (Japanese-processing Extended Feature)<ref name="lunde2009appE" /> is an EBCDIC encoding used on [[Fujitsu]] FACOM mainframes, contrasting with FMR (a variant of Shift&nbsp;JIS) used on Fujitsu PCs. Like KEIS, JEF is a stateful encoding, switching to a double-byte DBCS-Host mode using shifting sequences (where {{code|0x29}} switches to single-byte mode and {{code|0x28}} switches to double-byte mode).<ref name="decunix">{{cite web |url=https://www.itec.suny.edu/scsys/unix/doc/V4.0F/docs/html/SUPPDOCS/JAPANDOC/JAPANCH2.HTM |title=2: Codesets and Codeset Conversion |work=DIGITAL UNIX Technical Reference for Using Japanese Features |publisher=[[Digital Equipment Corporation]], [[Compaq]] }}{{dead link|date=November 2023}}</ref> Also similarly to KEIS, {{nowrap|JIS X 0208}} codes are represented the same as in EUC-JP.<ref name="lundeF" /> The lead byte range is extended back to 0x41, with 0x80–0xA0 designated for user definition; lead bytes 0x41–0x7F are assigned row numbers 101 through 163 for [[kuten]] purposes, although row 162 (lead byte 0x7E) is unused.<ref name="lundeF" /><ref name="lunde2009appE" /> Rows 101 through 148 are used for extended kanji, while rows 149 through 163 are used for extended non-kanji.<ref name="lunde2009appE" />
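
The row numbering described above can be illustrated with a short Python sketch. It assumes the mapping from lead byte to row is linear (0x41 to row 101 through 0x7F to row 163), which is consistent with the figures given; the function names are hypothetical.

```python
# Illustrative sketch of JEF extended row numbering.
# Assumption: lead bytes 0x41-0x7F map linearly onto rows 101-163.


def jef_extended_row(lead: int) -> int:
    """Map a JEF extended lead byte (0x41-0x7F) to its row number."""
    if not 0x41 <= lead <= 0x7F:
        raise ValueError("lead byte must be in 0x41-0x7F")
    return lead - 0x41 + 101


def jef_row_kind(row: int) -> str:
    """Describe what an extended row is used for, per the text above."""
    if 101 <= row <= 148:
        return "extended kanji"
    if row == 162:
        return "unused"
    if 149 <= row <= 163:
        return "extended non-kanji"
    raise ValueError("row must be in 101-163")


for lead in (0x41, 0x70, 0x71, 0x7E, 0x7F):
    row = jef_extended_row(lead)
    print(f"0x{lead:02X} -> row {row}: {jef_row_kind(row)}")
```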