==EUC-JP==
{{Infobox character encoding
|name=EUC-JP
|alias=Unixized JIS (UJIS), csEUCPkdFmtJapanese
|mime=EUC-JP
|image=EUC-JP.svg
|caption=
|standard=
|extends=[[ASCII]] or [[JISCII|ISO 646:JP]]
|encodes=[[JIS X 0208]], [[JIS X 0212]], [[JIS X 0201]]
|lang=[[Japanese language|Japanese]], [[English language|English]], [[Russian language|Russian]]
|next=EUC-JISx0213
|classification = [[Extended ASCII|Extended]] [[ISO 646]], [[variable-width encoding|variable-length encoding]], [[CJK characters|CJK encoding]], EUC
}}
{{Infobox character encoding
|name=EUC-JIS-2004
|alias=EUC-JISx0213
|_nomimecode=1
<!-- Was it ever registered for MIME (as opposed to ICONV) though?
|mime=<code>EUC-JIS-2004</code> (2004)<br/><code>EUC-JISx0213</code> (2000) -->
|image=EUC-JISx0213.svg
|caption=
|standard=JIS X 0213
|extends=[[ASCII]]
|encodes=[[JIS X 0213]], [[JIS X 0201]] (Kana)
|lang=[[Japanese language|Japanese]], [[Ainu language|Ainu]], [[English language|English]], [[Russian language|Russian]]
|prev=EUC-JP
|classification = [[Extended ASCII]], [[variable-width encoding|variable-length encoding]], [[CJK characters|CJK encoding]], EUC
}}
'''EUC-JP''' is a [[variable-width encoding|variable-length encoding]] used to represent the elements of three [[JIS encoding|Japanese character set standards]], namely {{nowrap|[[JIS X 0208]]}}, {{nowrap|[[JIS X 0212]]}}, and {{nowrap|[[JIS X 0201]]}}.
Other names for this encoding include '''Unixized JIS''' (or '''UJIS''') and '''AT&T JIS'''.<ref name="lunde">{{cite book | url=https://books.google.com/books?id=EH1MDAAAQBAJ&q=%22euc+packed+format%22&pg=PA244 | title=CJKV Information Processing: Chinese, Japanese, Korean, and Vietnamese Computing | publisher=O'Reilly | author=Lunde, Ken | year=2008 | isbn=9780596800925 | pages=242–244}}</ref> Since September 2022, 0.1% of all web pages have used EUC-JP,<ref>{{cite web | url=https://w3techs.com/technologies/history_overview/character_encoding | title=Historical trends in the usage of character encodings for websites | publisher=W3Techs}}</ref> while 2.6% of websites written in Japanese use it, making it the second-most popular encoding for Japanese<ref>{{Cite web |title=Distribution of Character Encodings among websites that use Japanese |url=https://w3techs.com/technologies/segmentation/cl-ja-/character_encoding |access-date=2023-11-01 |website=w3techs.com}}</ref> (<!-- i.e. of those "that use Japanese as content language." -->which is more than for [[Shift JIS]];<!-- might be a fluke, but now by either metric: , though of those sites, i.e. on national Japanese websites,<!- i.e. with ".jp as top level domain."; it used to be "less used", note https://w3techs.com/technologies/details/en-shiftjis ShiftJIS recently took a nose-dive (a statistical fluke?), -> less used than {{nowrap|[[Shift JIS]]}}, --> both are much less used than [[UTF-8]]).
It is called '''Code page 954''' by IBM.<ref>{{cite web|title=CCSID 954 information document|archive-url=https://web.archive.org/web/20160327022203/http://www-01.ibm.com/software/globalization/ccsid/ccsid954.html|archive-date=2016-03-27|url=https://www-01.ibm.com/software/globalization/ccsid/ccsid954.html}}</ref><ref>{{Citation|title=International Components for Unicode (ICU), ibm-954_P101-2007.ucm|url=https://github.com/unicode-org/icu/blob/master/icu4c/source/data/mappings/ibm-954_P101-2007.ucm|date=2002-12-03}}</ref> Microsoft has two code page numbers for this encoding (51932 and 20932).

This encoding scheme allows the easy mixing of 7-bit ASCII and 8-bit Japanese without the need for the escape characters employed by [[ISO-2022-JP]], which is based on the same character set standards, and without ASCII bytes appearing as trail bytes (unlike [[Shift JIS]]).

A related and partially compatible encoding, called '''EUC-JISx0213''' or '''EUC-JIS-2004''', encodes {{nowrap|[[JIS X 0201]]}} and {{nowrap|[[JIS X 0213]]}}<ref name="x0213org">{{cite web | url=https://x0213.org/codetable/index.en.html | title=JIS X 0213 Code Mapping Tables | publisher=x0213.org}}</ref> (similarly to {{nowrap|[[Shift JIS-2004|Shift_JISx0213]]}}, its Shift_JIS-based counterpart).

Compared to EUC-CN or EUC-KR, EUC-JP did not become as widely adopted on PC and Macintosh systems in Japan, which used {{nowrap|Shift JIS}} or its extensions ([[Code page 932 (Microsoft Windows)|Windows code page 932]] on [[Microsoft Windows]], and [[MacJapanese]] on [[classic Mac OS]]), although it became heavily used by [[Unix]] or Unix-like [[operating system]]s (except for [[HP-UX]]). Therefore, whether Japanese websites use EUC-JP or Shift_JIS often depends on what OS the author uses.

Characters are encoded as follows:
* As an EUC/[[ISO 2022]] compliant encoding, the [[C0 and C1 control codes#C0|C0 control characters]], space, and DEL are represented as in ASCII.
* A graphical character from [[ASCII]] (code set 0) is represented as its usual one-byte representation, in the range 0x21 – 0x7E. While some variants of EUC-JP encode the [[Code page 895|lower half]] of {{nowrap|JIS X 0201}} here, most encode ASCII,<ref>{{cite web | url=https://www.w3.org/TR/japanese-xml/#AEN29832832 | title=Ambiguities in conversion from Japanese EUC to Unicode (Non-Normative) | publisher=W3C | work=XML Japanese Profile}}</ref> including the W3C/WHATWG Encoding standard used by [[HTML5]],<ref>{{cite web | url=https://encoding.spec.whatwg.org/#euc-jp-decoder | title=EUC-JP decoder | publisher=WHATWG | work=Encoding Standard}} "If the byte is an ASCII byte, return a code point whose value is a byte."</ref> and so does EUC-JIS-2004.<ref name="x0213org" /> While this means that 0x5C is typically mapped to Unicode as U+005C REVERSE SOLIDUS (the ASCII [[backslash]]), U+005C may be displayed as a [[Yen sign]] by certain Japanese-locale fonts, e.g. on Microsoft Windows, for compatibility with the lower half of {{nowrap|JIS X 0201}}.<ref>{{cite web | url=https://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html#ch3_1_1 | title=3.1.1 Details of Problems | publisher=The Open Group Japan | work=Problems and Solutions for Unicode and User/Vendor Defined Characters | archive-url=https://web.archive.org/web/19990203115405/http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html#ch3_1_1 | archive-date=1999-02-03 | url-status=dead | access-date=2019-08-14 }}</ref><ref>{{cite web | title=When is a backslash not a backslash? | date=2005-09-17 | author=Kaplan, Michael S. | url=https://archives.miloush.net/michkap/archive/2005/09/17/469941.html}}</ref>
* A character from JIS X 0208 (code set 1) is represented by two bytes, both in the range 0xA1 – 0xFE. This differs from the ISO-2022-JP representation by having the high bit set. This code set may also contain vendor extensions in some EUC-JP variants. In EUC-JIS-2004, the first plane of {{nowrap|JIS X 0213}} is encoded here, which is effectively a superset of standard {{nowrap|JIS X 0208}}.<ref name="x0213org" />
* A character from the ''upper half'' of {{nowrap|JIS X 0201}} ([[half-width kana]], code set 2) is represented by two bytes, the first being 0x8E, the second being the usual {{nowrap|JIS X 0201}} representation in the range 0xA1 – 0xDF. This set may contain [[JIS X 0201#IBM's implementations|IBM vendor extensions]] in some variants.
* A character from JIS X 0212 (code set 3) is represented in EUC-JP by three bytes, the first being 0x8F, the following two being in the range 0xA1–0xFE, i.e. with the high bit set. In addition to standard {{nowrap|JIS X 0212}}, code set 3 of some EUC-JP variants may also contain extensions in rows 83 and 84 to represent characters from IBM's Shift JIS extensions which lack standard JIS X 0212 mappings, which may be coded in either of two layouts, one defined by IBM themselves and one defined by the [[Open Software Foundation|OSF]].<ref name="osfibmextensions">{{cite web | url=https://www.opengroup.or.jp:80/jvc/cde/ucs-conv-e.html#ch4_2 | title=4.2 Review Process of Rules for Code Set Conversion Between eucJP-open and UCS | publisher=The Open Group Japan | work=Problems and Solutions for Unicode and User/Vendor Defined Characters | archive-url=https://web.archive.org/web/19990203115405/http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html#ch4_2 | archive-date=1999-02-03 | url-status=dead | access-date=2019-08-14 }}</ref><ref name="lundeJ">{{citation|mode=cs1 |title=Appendix J: Japanese Character Sets |work=CJKV Information Processing |edition=2nd |last=Lunde |first=Ken |date=13 January 2009 |isbn=978-0-596-51447-1 |url=https://resources.oreilly.com/examples/9780596514471/blob/master/cjkvip2e-appJ.pdf}}</ref> In EUC-JIS-2004, the second plane of {{nowrap|JIS X 0213}} is encoded here,<ref name="x0213org" /> which does not collide with the allocated rows in standard {{nowrap|JIS X 0212}}.<ref name="hyeshik">{{cite web | url=https://github.com/python/cpython/blob/4b96c1384e008218bdfeb9e271a094b1ab8484d3/Modules/cjkcodecs/README | title=Readme for CJKCodecs | publisher=Python Software Foundation | work=cPython | last=Chang | first=Hyeshik| date=8 December 2021 }}</ref> Some implementations of EUC-JIS-2004, such as the one used by [[Python (programming language)|Python]], allow both {{nowrap|JIS X 0212}} and {{nowrap|JIS X 0213}} plane 2 characters in this set.<ref name="hyeshik" />

Vendor extensions to EUC-JP (from, for example, the [[Open Software Foundation]], [[IBM]] or [[NEC]]) were often allocated within the individual code sets,<ref name="osfibmextensions" /><ref name="lundeJ" /> as opposed to using invalid EUC sequences (as in popular extensions of EUC-CN and EUC-KR). However, some vendor-specific encodings are partially compatible with EUC-JP, due to encoding {{nobr|JIS X 0208}} over GR, but do not follow the packed EUC structure. Often, these do not include use of the single shifts from EUC-JP, and are thus not straight extensions of EUC-JP, with the exception of Super DEC Kanji.

===DEC Kanji===
[[Digital Equipment Corporation]] defines two variants of EUC-JP only partly conforming to the EUC packed format, but also bearing some resemblance to the complete two-byte format. The overall format of the "DEC Kanji" encoding mostly corresponds to fixed-length (complete two-byte) EUC; however, code set 0 is not required to be left-padded with null bytes (similarly to the packed format).<ref name="lundeF" /> JIS X 0208 is, as usual, used for code set 1; code set 2 (half-width katakana) is absent; code set 3 is encoded like the two-byte fixed width format (i.e.
without a shift byte and with only the first high bit set), but used for two-byte user defined characters rather than being specified for JIS X 0212.<ref name="lundeF" /> In the basic "DEC Kanji" encoding, only the first 31 rows of code set 3 are used for user-defined characters: rows 32 through 94 are reserved, similarly to the unused rows in code set 1.<ref name="lunde2009appE" />

The "Super DEC Kanji" encoding accepts codes both from the "DEC Kanji" encoding and from packed-format EUC, for a total of five code-sets.<ref name="lundeF">{{citation|mode=cs1 |title=Appendix F: Vendor Encoding Methods |work=CJKV Information Processing |edition=2nd |last=Lunde |first=Ken |date=13 January 2009 |isbn=978-0-596-51447-1 |url=https://resources.oreilly.com/examples/9780596514471/blob/master/cjkvip2e-appF.pdf}}</ref> It also allows the entire user defined code set, and the unused rows at the ends of the JIS X 0208 and JIS X 0212 code sets (rows 85–94 and 78–94 respectively), to be used for user-defined characters.<ref name="lunde2009appE" />

===HP-16===
[[Hewlett-Packard]] defines an encoding referred to as "HP-16". This accompanies their "HP-15" encoding, which is a variant of [[Shift JIS]]. HP-16 encodes {{nobr|JIS X 0208}} using the same bytes as in EUC-JP, but does not use the single shift codes (thus omitting code sets 2 and 3), and adds three user-defined regions which do not follow the packed-format EUC structure:<ref name="lundeF" />
* Lead bytes 0xA1–C2, trail bytes 0x21–7E
* Lead bytes 0xC3–E3, trail bytes 0x21–3F
* Lead bytes 0xC3–E1, trail bytes 0x40–64

===IKIS===
The IKIS (Interactive Kanji Information System) encoding used by [[Data General]] resembles EUC-JP without single shifts, i.e. with only code sets 0 and 1. Half-width katakana are instead included in row 8 of JIS X 0208 (colliding with the box-drawing characters added to the standard in 1983).
JIS X 0208 rows 9 through 12 are used for user-defined characters.<ref name="lundeF" /><ref name="lunde2009appE" />

===Adaptations of EUC-JP for EBCDIC===
{{Main article|Japanese language in EBCDIC}}
KEIS (Kanji-processing Extended Information System) is an [[EBCDIC]] encoding used by [[Hitachi]],<ref name="lunde2009appE" /> with double-byte characters (a DBCS-Host encoding) included using shifting sequences, making it a [[state (computer science)|stateful]] encoding. Specifically, the sequence {{code|0x0A 0x41}} switches to single-byte mode and the sequence {{code|0x0A 0x42}} switches to double-byte mode.{{efn|These sequences match the hexadecimal forms shown by DEC{{refn|name=decunix}} and the decimal forms ({{code|10 65}} and {{code|10 66}}) listed by Lunde.{{refn|name=lundeF}} Lunde lists the hexadecimal forms for both as {{code|0xA0 0x42}}, seemingly in error.}} However, JIS X 0208 characters are encoded using the same byte sequences used to encode them in EUC-JP. This results in duplicate encodings for the {{ctrl|IDSP|ideographic space}}: 0x4040 per the DBCS-Host code structure, and 0xA1A1 as in EUC-JP. This differs from IBM's DBCS-Host encoding for Japanese, the layout of which builds on versions which predate JIS X 0208 altogether. The lead byte range is extended back to 0x59, out of which the lead bytes 0x81–A0 are designated for user-defined characters,<ref name="lundeF" /> and the remainder are used for corporate-defined characters, including both kanji and non-kanji.<ref name="lunde2009appE" />

JEF (Japanese-processing Extended Feature)<ref name="lunde2009appE" /> is an EBCDIC encoding used on [[Fujitsu]] FACOM mainframes, contrasting with FMR (a variant of Shift JIS) used on Fujitsu PCs.
Like KEIS, JEF is a stateful encoding, switching to a double-byte DBCS-Host mode using shifting sequences (where {{code|0x29}} switches to single-byte mode and {{code|0x28}} switches to double-byte mode).<ref name="decunix">{{cite web |url=https://www.itec.suny.edu/scsys/unix/doc/V4.0F/docs/html/SUPPDOCS/JAPANDOC/JAPANCH2.HTM |title=2: Codesets and Codeset Conversion |work=DIGITAL UNIX Technical Reference for Using Japanese Features |publisher=[[Digital Equipment Corporation]], [[Compaq]] }}{{dead link|date=November 2023}}</ref> Also similarly to KEIS, {{nowrap|JIS X 0208}} codes are represented the same as in EUC-JP.<ref name="lundeF" /> The lead byte range is extended back to 0x41, with 0x80–0xA0 designated for user definition; lead bytes 0x41–0x7F are assigned row numbers 101 through 163 for [[kuten]] purposes, although row 162 (lead byte 0x7E) is unused.<ref name="lundeF" /><ref name="lunde2009appE" /> Rows 101 through 148 are used for extended kanji, while rows 149 through 163 are used for extended non-kanji.<ref name="lunde2009appE" />
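
As an illustration of the packed-format rules listed under "Characters are encoded as follows" earlier in this section, the following Python sketch splits a byte string into its characters and reports which code set each one belongs to. It is a minimal sketch and not taken from any cited implementation: the function name <code>split_euc_jp</code> is hypothetical, only the standard byte ranges given above are modelled, and vendor extensions, the DEC Kanji, HP-16 and IKIS layouts, and the EBCDIC adaptations are not handled.

```python
from typing import Iterator, Tuple


def split_euc_jp(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (code_set, raw_bytes) for each character of packed-format EUC-JP.

    Hypothetical helper: it only checks the byte ranges described in this
    section and does not map anything to Unicode.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:
            # Code set 0: C0 controls, space, DEL and ASCII graphics (one byte).
            code_set, length, trail = 0, 1, range(0)
        elif lead == 0x8E:
            # Code set 2: single shift 0x8E, then a half-width kana byte.
            code_set, length, trail = 2, 2, range(0xA1, 0xE0)
        elif lead == 0x8F:
            # Code set 3: single shift 0x8F, then two JIS X 0212 bytes.
            code_set, length, trail = 3, 3, range(0xA1, 0xFF)
        elif 0xA1 <= lead <= 0xFE:
            # Code set 1: two JIS X 0208 bytes, both with the high bit set.
            code_set, length, trail = 1, 2, range(0xA1, 0xFF)
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")

        seq = data[i:i + length]
        if len(seq) < length or any(b not in trail for b in seq[1:]):
            raise ValueError(f"invalid or truncated sequence at offset {i}")
        yield code_set, seq
        i += length


if __name__ == "__main__":
    # One character from each code set: ASCII "A", a JIS X 0208 pair,
    # a half-width kana pair, and a three-byte JIS X 0212 sequence.
    sample = b"A\xa4\xa2\x8e\xb1\x8f\xb0\xa1"
    for code_set, raw in split_euc_jp(sample):
        print(code_set, raw.hex())
```

Because every byte of a multi-byte sequence is 0x8E or above, a byte below 0x80 is always a complete code set 0 character, which is the property contrasted with Shift JIS above.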