== Multiple versions ==
[[File:Euler diag for jp charsets.svg|thumb|[[Euler diagram]] comparing repertoires of [[JIS X 0208]], [[JIS X 0212]], [[JIS X 0213]], [[Windows-31J]], the Microsoft standard repertoire and [[Unicode]] ]]
[[File:JIS and Shift-JIS variants.svg|thumb|Relationship between Shift_JIS variants on the PC and related encodings, including intersections and other subsets. Names given are descriptive.]]
Many different versions of Shift JIS exist. There are two areas for expansion. First, JIS X 0208 does not fill the whole 94×94 space encoded for it in Shift JIS, so there is room for more characters there; these are really extensions to JIS X 0208 rather than to Shift JIS itself. Second, Shift JIS has more encoding space than is needed for {{nowrap|JIS X 0201}} and {{nowrap|JIS X 0208}} (see [[#Shift JIS byte map|§ Shift JIS byte map]] below), and this space can be, and is, used for yet more characters (as either single-byte or double-byte characters).

=== Windows-932 / Windows-31J ===
{{Main|Code page 932 (Microsoft Windows)}}
The most popular extension is [[Windows-31J|Windows code page 932]] (a [[CCSID]] also used for [[Code page 932 (IBM)|IBM's extension to Shift JIS]]), which is registered with the [[Internet Assigned Numbers Authority|IANA]] as "Windows-31J",<ref name="iana31j">{{cite web | url=https://www.iana.org/assignments/character-sets/character-sets.xhtml | publisher=IANA | title=Character Sets}}</ref> separately from Shift JIS.
This variant was popularized by Microsoft, although Microsoft itself does not recognize the Windows-31J name and instead calls it "shift_jis".<ref name="msdnlabels">{{cite web|url=https://msdn.microsoft.com/en-us/library/system.text.encoding.windowscodepage(v=vs.110).aspx |title=Encoding.WindowsCodePage Property – .NET Framework (current version) |work=MSDN |publisher=Microsoft}}</ref><ref>{{cite web |url=https://docs.microsoft.com/en-us/windows/desktop/intl/code-page-identifiers |title=Code Page Identifiers |publisher=Microsoft |work=Windows Dev Center|date=7 January 2021 }}</ref> IBM's code page 943 includes the same double-byte codes as Microsoft's code page 932, while IBM's code page 932 includes fewer extensions (excluding those that Microsoft incorporates from NEC) and retains the character order from the 1978 edition of JIS X 0208, rather than implementing the [[JIS X 0208#Second standard|character variant swaps]] from the 1983 standard.<ref name="ibm932v943">{{cite web | url=https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.nlsgdrf/ibm-943_ibm-932.htm | title=IBM-943 and IBM-932 | publisher=IBM | work=IBM Knowledge Center}}</ref> Windows-31J assigns 0x5C to U+005C REVERSE SOLIDUS (the [[backslash]]) and 0x7E to U+007E [[tilde|TILDE]], following [[US-ASCII]].<ref>{{cite web | url=https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT | title=CP932.TXT | publisher=Unicode Consortium}}</ref> However, most localised fonts on Windows display U+005C as a [[Yen sign]] for {{nowrap|JIS X 0201}} compatibility.<ref>{{cite web | url=http://www.opengroup.or.jp:80/jvc/cde/ucs-conv-e.html#ch3_1_1 | title=3.1.1 Details of Problems | publisher=The Open Group Japan | work=Problems and Solutions for Unicode and User/Vendor Defined Characters | archive-url=https://web.archive.org/web/19990203115405/http://www.opengroup.or.jp/jvc/cde/ucs-conv-e.html#ch3_1_1 | archive-date=1999-02-03 | url-status=dead }}</ref><ref name="kaplan">{{cite web | title=When is a backslash not a backslash? | date=2005-09-17 | author=Kaplan, Michael S. | url=http://archives.miloush.net/michkap/archive/2005/09/17/469941.html}}</ref> Windows-31J includes several extensions, namely "[[JIS X 0208#0x2D|NEC special characters]] (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119)",<ref name="iana31j" /> and also sets aside some encoding space for [[Private Use Areas#Private-use characters in other character sets|end-user definition]].<ref>{{cite web | url=http://archives.miloush.net/michkap/archive/2007/05/26/2901371.html | title=The PUA outside of Unicode | author=Kaplan, Michael S | work=Sorting it all out | date=2007-05-26}}</ref>

Windows code page 932 is the version used in the [[W3C]]/[[WHATWG]] encoding standard used by [[HTML5]]. That standard includes the "formerly proprietary extensions from IBM and NEC" from Windows-31J in its table for JIS X 0208,<ref>{{cite web | url=https://encoding.spec.whatwg.org/#index-jis0208 | title=5. Indexes (§ Index jis0208) | publisher=WHATWG | work=Encoding Standard}}</ref> and treats the label "shift_jis" interchangeably with "windows-31j" with the intent of being "compatible with deployed content".<ref>{{cite web | url=https://encoding.spec.whatwg.org/#names-and-labels | title=4.2. Names and labels | publisher=WHATWG | work=Encoding Standard}}</ref>

=== MacJapanese ===
The version of Shift JIS originating from the [[classic Mac OS]] (known as <code>x-mac-japanese</code>, Code page 10001<ref name="msdnlabels"/> or MacJapanese) assigned the [[tilde]] to 0x7E (following [[US-ASCII]], not {{nowrap|JIS X 0201}}, which assigns the [[macron (diacritic)|overline]] there), but the [[Yen sign]] to 0x5C (as in {{nowrap|JIS X 0201}} and standard {{nowrap|Shift JIS}}).
It also extended {{nowrap|JIS X 0201}} by assigning the [[backslash]] to 0x80 (the character that US-ASCII places at 0x5C), the [[non-breaking space]] to 0xA0, the [[copyright sign]] to 0xFD, the [[trademark symbol]] to 0xFE and the half-width [[horizontal ellipsis]] to 0xFF. It also added extended double-byte characters, including 53 vertical presentation forms in the {{nowrap|Shift_JIS}} range 0xEB41–0xED96, placed 84 JIS rows below their canonical forms, and 260 special characters in the Shift_JIS range 0x8540–0x886D.<ref name="macjapanese">{{cite web | url=https://unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT | title=JAPANESE.TXT: Map (external version) from Mac OS Japanese encoding to Unicode 2.1 and later. | publisher=Apple Computer, Inc.; Unicode Consortium}}</ref> This variant was introduced in [[KanjiTalk]] version 7.<ref name="lundenec13">{{cite web |url=https://blogs.adobe.com/CCJKType/2019/03/era-name-ligature-history.html |title=A Brief History of Japan's Era Name Ligatures |last=Lunde |first=Ken |author-link=Ken Lunde |work=CJK Type Blog |date=2019-03-21 |publisher=[[Adobe Inc]]}}</ref> However, certain Mac OS typefaces used other variants.
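
The single-byte assignments described above can be collected into a small lookup table. The following Python sketch is illustrative only: the names <code>MAC_JAPANESE_SINGLE</code> and <code>decode_mac_japanese_single</code> are hypothetical, double-byte codes are not handled, and the Unicode targets (in particular U+2026 for the half-width ellipsis at 0xFF) are assumptions that should be checked against Apple's JAPANESE.TXT mapping. Python's standard <code>cp932</code> codec is used only to contrast the result with Windows-31J.

```python
# Hypothetical lookup table: single bytes where MacJapanese, as described
# above, departs from US-ASCII/Windows-31J or extends JIS X 0201.
# The Unicode targets are assumptions; Apple's JAPANESE.TXT is authoritative.
MAC_JAPANESE_SINGLE = {
    0x5C: "\u00a5",  # YEN SIGN, as in JIS X 0201 (Windows-31J: REVERSE SOLIDUS)
    0x7E: "~",       # TILDE, following US-ASCII (JIS X 0201: overline)
    0x80: "\\",      # REVERSE SOLIDUS, relocated from 0x5C
    0xA0: "\u00a0",  # NO-BREAK SPACE
    0xFD: "\u00a9",  # COPYRIGHT SIGN
    0xFE: "\u2122",  # TRADE MARK SIGN
    0xFF: "\u2026",  # HORIZONTAL ELLIPSIS (assumed target for the half-width form)
}


def decode_mac_japanese_single(byte: int) -> str:
    """Decode one single-byte MacJapanese code (sketch; no double-byte support)."""
    if byte in MAC_JAPANESE_SINGLE:
        return MAC_JAPANESE_SINGLE[byte]
    if byte < 0x80:
        return chr(byte)  # remaining positions follow US-ASCII
    if 0xA1 <= byte <= 0xDF:
        # Half-width katakana from JIS X 0201 map to U+FF61..U+FF9F.
        return chr(0xFF61 + (byte - 0xA1))
    raise ValueError(f"0x{byte:02X} is a lead byte or not handled by this sketch")


if __name__ == "__main__":
    # Contrast with Windows-31J, which Python exposes as the "cp932" codec.
    for byte in (0x5C, 0x7E):
        mac = decode_mac_japanese_single(byte)
        win = bytes([byte]).decode("cp932")
        print(f"0x{byte:02X}: MacJapanese U+{ord(mac):04X}, Windows-31J U+{ord(win):04X}")
```

At 0x5C the two encodings disagree (U+00A5 versus U+005C), while at 0x7E both yield U+007E TILDE.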
Sai Mincho and Chu Gothic used a "[[PostScript]]" variant of MacJapanese, which included additional vertical presentation forms and a different set of extended special characters based on the [[JIS X 0208#0x2D|NEC special characters]], some of which were available only in the printer versions of the fonts.<ref name="macjapanese" /> Older versions of Maru Gothic and Hon Mincho from [[System 7.1]] encoded vertical presentation forms 10 (not 84) JIS rows below their canonical forms and did not include the special character extensions; this was subsequently changed.<ref name="macjapanese" /><ref>{{cite web | url=https://developer.apple.com/documentation/coreservices/1399915-encoding_variants_for_macjapanes?language=objc | title=Encoding Variants for MacJapanese | publisher=Apple | work=Apple Developer Documentation}}</ref> The typical variant used with KanjiTalk version 6 placed the vertical presentation forms 10 rows down, and also used the NEC extension layout for row 13.<ref>{{citation|mode=cs1 |url=https://resources.oreilly.com/examples/9780596514471/raw/3298dd4775b8a775d22a4f8b97705d93ba817691/cjkvip2e-appE.pdf |title=Appendix E: Vendor Character Set Standards |date=2008 |last=Lunde |first=Ken |author-link=Ken Lunde |work=CJKV Information Processing |isbn=9780596514471 |publisher=[[O'Reilly Media]]}}</ref>

=== Shift_JISx0213 and Shift_JIS-2004 ===
<!-- [[Shift_JIS-2004]] redirects here -->
{{infobox character encoding
| name = Shift_JIS-2004
| mime = 
| alias = Shift_JISx0213
| standard = JIS X 0213
| lang = [[Japanese language|Japanese]], [[Ainu language|Ainu]], [[English language|English]], [[Russian language|Russian]]
| status = 
| extends = Shift_JIS (1997),<br/>[[JIS X 0201]] (8-bit)
| encodes = [[JIS X 0213]]
| prev = Shift_JIS (1997)
| next = 
}}
The newer [[JIS X 0213]] standard defines an extended variant of Shift_JIS referred to as '''Shift_JISx0213''' (in a previous version of the standard) or '''Shift_JIS-2004'''.
It is a superset of standard Shift JIS.<ref name="x0213org">{{cite web | url=http://x0213.org/codetable/index.en.html | title=JIS X 0213 Code Mapping Tables | publisher=x0213.org}}</ref> To represent the allocated rows on both planes of JIS X 0213, Shift_JIS-2004 maps code points as follows:<ref>{{cite web | url=http://www.asahi-net.or.jp/~wq6k-yn/code/enc-x0213.html#sjis-2004 | title=JIS X 0213の代表的な符号化方式 § Shift_JIS-2004 | trans-title=Typical encoding schemes of JIS X 0213 § Shift_JIS-2004 | language=ja}} Hexadecimal numbers in the source have been converted to decimal for display.</ref>
:<math>s_1 = \begin{cases} \left \lfloor \frac{k + 257}{2} \right \rfloor & \mbox{if } m = 1 \mbox{ and } 1 \le k \le 62 \\ \left \lfloor \frac{k + 385}{2} \right \rfloor & \mbox{if } m = 1 \mbox{ and } 63 \le k \le 94 \\ \left \lfloor \frac{k + 479}{2} \right \rfloor - \left \lfloor \frac{k}{8} \right \rfloor \times 3 & \mbox{if } m = 2 \mbox{ and } k = 1, 3, 4, 5, 8, 12, 13, 14, 15 \\ \left \lfloor \frac{k + 411}{2} \right \rfloor & \mbox{if } m = 2 \mbox{ and } 78 \le k \le 94 \end{cases}</math>
:<math>s_2 = \begin{cases} t + 63 & \mbox{if } k \mbox{ is odd and } 1 \le t \le 63 \\ t + 64 & \mbox{if } k \mbox{ is odd and } 64 \le t \le 94 \\ t + 158 & \mbox{if } k \mbox{ is even } \end{cases}</math>
In the above, <math>s_1 s_2</math> is a two-byte Shift_JIS-2004 sequence, <math>m</math> is the {{Nihongo|plane|面|men|surface}} number (1 or 2), <math>k</math> is the {{Nihongo|row|区|ku|ward}} number (1–94) and <math>t</math> is the {{Nihongo|cell|点|ten|point}} number (1–94). The ''ku'' and ''ten'' numbers are equal to <math>j_1 - 32</math> and <math>j_2 - 32</math> respectively, where <math>j_1 j_2</math> is a two-byte JIS sequence referencing a given plane. The same set of characters can be represented by [[EUC-JIS-2004]], the EUC-JP-based counterpart.
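
The piecewise formulas for <math>s_1</math> and <math>s_2</math> translate directly into code. In this Python sketch the function name <code>sjis2004_pair</code> is hypothetical, and the input is assumed to be a (plane, row, cell) triple that JIS X 0213 actually allocates; results can be cross-checked against the <code>shift_jis_2004</code> codec in Python's standard library.

```python
# Plane 2 rows that are packed into lead bytes 0xF0-0xF4 (see the s1 formula).
PLANE2_ROWS_LOW = {1, 3, 4, 5, 8, 12, 13, 14, 15}


def sjis2004_pair(m: int, k: int, t: int) -> bytes:
    """Return the Shift_JIS-2004 byte pair s1 s2 for plane m, row (ku) k and
    cell (ten) t, following the piecewise formulas for s1 and s2."""
    if not 1 <= t <= 94:
        raise ValueError("cell (ten) must be in 1..94")
    # Lead byte s1; integer floor division implements the floor brackets.
    if m == 1 and 1 <= k <= 62:
        s1 = (k + 257) // 2
    elif m == 1 and 63 <= k <= 94:
        s1 = (k + 385) // 2
    elif m == 2 and k in PLANE2_ROWS_LOW:
        s1 = (k + 479) // 2 - (k // 8) * 3
    elif m == 2 and 78 <= k <= 94:
        s1 = (k + 411) // 2
    else:
        raise ValueError("no Shift_JIS-2004 lead byte for this plane/row")
    # Trail byte s2: odd rows use 0x40-0x7E then 0x80-0x9E (skipping 0x7F);
    # even rows use 0x9F-0xFC.
    if k % 2 == 1:
        s2 = t + 63 if t <= 63 else t + 64
    else:
        s2 = t + 158
    return bytes([s1, s2])
```

For example, <code>sjis2004_pair(1, 16, 1)</code> returns the bytes 0x88 0x9F, which is how the standard codec encodes 亜 (row 16, cell 1). Plane 1 row 89 cell 1 gives 0xED 0x40, the same bytes that Windows-31J (Python's <code>cp932</code>) decodes as 纊.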
Some of the additions collide with popular Shift JIS extensions, including Windows code page 932, which is used in web standards (see [[#Windows-932 / Windows-31J|above]]). For example, compare plane 1 row 89 in {{nowrap|JIS X 0213}} (beginning 硃, 硎, 硏...)<ref>{{cite iso-ir |number=233 |title=Japanese Graphic Character Set for Information Interchange, Plane 1 |sponsor=Japanese Industrial Standards Committee |sponsor-link=Japanese Industrial Standards Committee |date=2004-04-13}}</ref> with row 89 in the JIS X 0208 variant defined in web standards (beginning 纊, 褜, 鍈...).<ref>{{cite web | url=https://encoding.spec.whatwg.org/jis0208.html | title=Index jis0208 visualization | publisher=WHATWG | work=Encoding Standard}}</ref> In addition, some of the characters map to Unicode characters beyond the [[Plane (Unicode)#Basic Multilingual Plane|Basic Multilingual Plane]] (BMP).

=== Other variants ===
{{further|Implementation of emojis#JIS, Shift_JIS and Private Use Area encodings}}
The space with lead bytes 0xF5 to 0xF9 (beyond the region used for JIS X 0208) is used by Japanese [[mobile phone]] operators for [[Emoji|pictographs]] in [[E-mail|e-mail]].<ref>{{cite web | url=https://www.fileformat.info/info/emoji/docomo.htm | title=Original Emoji from DoCoMo | publisher=FileFormat.info}}</ref> [[KDDI]] goes further and defines hundreds more in the space with lead bytes 0xF3 and 0xF4.<ref>{{cite web | url=https://www.fileformat.info/info/emoji/kddi.htm | title=Original Emoji from KDDI | publisher=FileFormat.info}}</ref>

Beyond even this, there have been numerous minor variations on Shift JIS, with individual characters altered here and there. Most of these extensions and variants have no [[Internet Assigned Numbers Authority|IANA]] registration, so using them invites confusion.

One further variant is required when Shift JIS text is embedded in source-code [[String (computer science)|strings]] of [[C (programming language)|C]] and similar programming languages.
This variant doubles the byte 0x5C when it appears as the second byte of a two-byte character, because 0x5C begins an [[escape sequence]] and a compiler unaware of Shift JIS would otherwise misread it; a standalone 0x5C (displayed as "¥"; ASCII "\") is left unchanged, since there it already serves as the escape character. This is best handled by a special editor that writes {{nowrap|Shift JIS}} in this form.
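
The transformation amounts to a single pass over the bytes that tracks whether each byte starts a two-byte character. This Python sketch assumes the plain Shift JIS lead-byte ranges 0x81–0x9F and 0xE0–0xFC; the function names are hypothetical, and real editors or build tools may treat edge cases (such as a truncated final character) differently.

```python
def is_sjis_lead_byte(byte: int) -> bool:
    """True if byte starts a two-byte Shift JIS character."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def escape_trail_backslashes(data: bytes) -> bytes:
    """Double every 0x5C that is the second byte of a two-byte character,
    leaving standalone 0x5C bytes (escape characters) untouched."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if is_sjis_lead_byte(byte) and i + 1 < len(data):
            trail = data[i + 1]
            out += bytes([byte, trail])
            if trail == 0x5C:
                # A compiler unaware of Shift JIS reads "\\" back as one 0x5C.
                out.append(0x5C)
            i += 2
        else:
            # Single-byte character, including a standalone 0x5C.
            out.append(byte)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    source = "表\\n".encode("shift_jis")  # 表 is 0x95 0x5C; then "\" and "n"
    print(escape_trail_backslashes(source))
```

Here 表 (0x95 0x5C) becomes 0x95 0x5C 0x5C, while the escape sequence "\n" that follows it is copied unchanged.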