== Structure ==
Shift JIS is an extension of the single-byte encoding {{nowrap|[[JIS X 0201]]:1997}}, which uses unassigned code points in {{nowrap|JIS X 0201}} to encode the double-byte {{nowrap|[[JIS X 0208]]:1997}} character set. The lead bytes for the double-byte characters are "shifted" around the 63 halfwidth [[katakana]] characters in the single-byte range [[JIS X 0201#Encoded Katakana|0xA1 to 0xDF]]. The single-byte characters [[Hexadecimal|0x]]00 to 0x7F match the [[ASCII]] encoding, except for a [[Japanese yen|yen]] sign (U+00A5) at 0x5C and an [[overline]] (U+203E) at 0x7E, which replace the ASCII backslash and tilde respectively (these deviations from ASCII follow {{nowrap|JIS X 0201}}). The single-byte characters from 0xA1 to 0xDF map to the halfwidth katakana characters of {{nowrap|JIS X 0201}}.

For double-byte characters, the first byte is always in the range 0x81 to 0x9F or the range 0xE0 to 0xEF (these ranges are unassigned in {{nowrap|JIS X 0201}}). Each first byte covers two consecutive rows of {{nowrap|JIS X 0208}}: for a character from the odd-numbered row, the second byte is in the range 0x40 to 0x9E (excluding 0x7F); for a character from the even-numbered row, it is in the range 0x9F to 0xFC.

Shift JIS only guarantees that the first byte of a two-byte character will be high-bit-set (0x80–0xFF); the second byte can be either high or low. Because byte values 0x40–0x7E can appear as second bytes of [[Code word (communication)|code word]]s, and the same values are used for ASCII characters, reliable Shift JIS detection is difficult. String searching is difficult for a related reason: since the same byte value can be either a first or a second byte, a simple byte-wise search can match the second byte of one character together with the first byte of the next, reporting a match that does not correspond to any character actually present in the text. [[String-searching algorithm]]s must therefore be tailor-made for {{nowrap|Shift JIS}}.
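The byte ranges above can be illustrated with a short Python sketch. The helper names (`is_lead_byte`, `split_chars`, `find_aligned`) are hypothetical, and the sketch accepts only the lead-byte ranges given in this section: vendor variants that extend the lead-byte range are deliberately not handled, and no check is made that a structurally valid byte pair is actually assigned to a character. `find_aligned` shows one way to make a search boundary-aware, by walking the text character by character and accepting only matches that begin on a character boundary.

```python
"""Sketch of Shift JIS byte classification, character splitting and a
boundary-aware search, using only the JIS X 0208 byte ranges (no vendor
extensions, no check that a pair is actually assigned)."""


def is_lead_byte(b: int) -> bool:
    # First byte of a double-byte character: 0x81-0x9F or 0xE0-0xEF.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def is_trail_byte(b: int) -> bool:
    # Second byte: 0x40-0x9E minus 0x7F (odd row) or 0x9F-0xFC (even row).
    return 0x40 <= b <= 0xFC and b != 0x7F


def is_single_byte(b: int) -> bool:
    # JIS X 0201 Roman (0x00-0x7F) or halfwidth katakana (0xA1-0xDF).
    return b <= 0x7F or 0xA1 <= b <= 0xDF


def split_chars(data: bytes) -> list[bytes]:
    """Split Shift JIS bytes into one bytes object per character.

    Raises ValueError on a byte that cannot start a character, or on a
    lead byte that is not followed by a valid trail byte.
    """
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if is_single_byte(b):
            chars.append(data[i:i + 1])
            i += 1
        elif is_lead_byte(b):
            if i + 1 >= len(data) or not is_trail_byte(data[i + 1]):
                raise ValueError(f"invalid or truncated pair at offset {i}")
            chars.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"byte 0x{b:02X} at offset {i} cannot start a character")
    return chars


def find_aligned(haystack: bytes, needle: bytes) -> int:
    """Return the offset of needle in haystack, counting only matches that
    start on a character boundary, or -1 if there is none.

    Both arguments must be well-formed Shift JIS; because decoding from a
    boundary is deterministic, a match that starts on a boundary and
    consists of whole characters also ends on a boundary.
    """
    split_chars(needle)  # raises ValueError unless needle is whole characters
    offset = 0
    for ch in split_chars(haystack):
        if haystack.startswith(needle, offset):
            return offset
        offset += len(ch)
    return -1
```

For example, `"ア".encode("shift_jis")` is `b"\x83\x41"`, so a plain `bytes.find` for `b"A"` reports a match at offset 1, inside the katakana character, whereas `find_aligned` returns -1. Walking the whole haystack from the start is the simplest correct approach; faster schemes would need some other way of resynchronizing on character boundaries.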
=== Compatibility ===
Shift JIS is fully [[backward compatibility|backward compatible]] with the {{nowrap|[[JIS X 0201]]}} [[single-byte encoding]], meaning that any valid {{nowrap|JIS X 0201}} string is also a valid Shift JIS string. Double-byte characters in {{nowrap|JIS X 0208}} must be transformed in order to be encoded in Shift JIS. For a double-byte JIS X 0208 sequence <math>j_1 j_2</math>,{{efn|In JIS X 0208, ''j''<sub>1</sub> and ''j''<sub>2</sub> are each in the range 33 (0x21) to 126 (0x7E) inclusive, i.e., 7-bit character values excluding the control characters 0–31 (0x00–0x1F) and 127 (0x7F), and excluding space.}} the transformation to the corresponding Shift JIS bytes <math>s_1 s_2</math> is:
:<math>s_1 = \begin{cases} \left \lfloor \frac{j_1 + 1}{2} \right \rfloor + 112 & \mbox{if } 33 \le j_1 \le 94 \\ \left \lfloor \frac{j_1 + 1}{2} \right \rfloor + 176 & \mbox{if } 95 \le j_1 \le 126 \end{cases}</math>
:<math>s_2 = \begin{cases} j_2 + 31 + \left \lfloor \frac{j_2}{96} \right \rfloor & \mbox{if } j_1 \mbox{ is odd} \\ j_2 + 126 & \mbox{if } j_1 \mbox{ is even} \end{cases}</math>

The competing 8-bit format [[Extended Unix Code#EUC-JP|EUC-JP]], which has no single-byte halfwidth katakana, allows a cleaner and more direct conversion to and from JIS X 0208 [[code point]]s: every high-bit-set byte is part of a multi-byte character, and every byte in the ASCII range represents a single-byte character.
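The two formulas above translate directly into integer arithmetic. The following Python sketch (the function names are illustrative, not part of any standard library) implements them, derives the inverse mapping by undoing each step, and, for contrast, adds the EUC-JP mapping for JIS X 0208 characters, which only sets the high bit of each byte. Python's built-in `shift_jis` and `euc_jp` codecs can be used to spot-check the results.

```python
"""Sketch of the JIS X 0208 <-> Shift JIS arithmetic from the formulas above,
plus the EUC-JP mapping for comparison. Not a full codec: no table lookup,
so unassigned code points are converted just like assigned ones."""


def jis_to_sjis(j1: int, j2: int) -> tuple[int, int]:
    """Map a JIS X 0208 byte pair (each 0x21-0x7E) to Shift JIS bytes."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("JIS X 0208 bytes must be in 0x21-0x7E")
    # Two JIS rows share one lead byte; the larger offset for j1 >= 95
    # makes the lead bytes skip 0xA0-0xDF (the halfwidth katakana area).
    s1 = (j1 + 1) // 2 + (112 if j1 <= 94 else 176)
    if j1 % 2 == 1:
        # Odd row: trail 0x40-0x9E; j2 // 96 adds 1 from j2 = 96 on, skipping 0x7F.
        s2 = j2 + 31 + j2 // 96
    else:
        # Even row: trail 0x9F-0xFC.
        s2 = j2 + 126
    return s1, s2


def sjis_to_jis(s1: int, s2: int) -> tuple[int, int]:
    """Invert jis_to_sjis for lead bytes 0x81-0x9F and 0xE0-0xEF."""
    if 0x81 <= s1 <= 0x9F:
        base = s1 - 112
    elif 0xE0 <= s1 <= 0xEF:
        base = s1 - 176
    else:
        raise ValueError(f"0x{s1:02X} is not a lead byte")
    # base == floor((j1 + 1) / 2); the trail byte range tells us j1's parity.
    if 0x9F <= s2 <= 0xFC:
        return 2 * base, s2 - 126  # even row
    if 0x40 <= s2 <= 0x9E and s2 != 0x7F:
        # Odd row: undo the +1 applied to trail bytes above the 0x7F gap.
        return 2 * base - 1, s2 - 31 - (1 if s2 > 0x7F else 0)
    raise ValueError(f"0x{s2:02X} is not a valid trail byte")


def jis_to_eucjp(j1: int, j2: int) -> tuple[int, int]:
    """EUC-JP encodes a JIS X 0208 pair by setting the high bit of each byte."""
    return j1 | 0x80, j2 | 0x80
```

For example, the kanji 亜 is JIS X 0208 0x3021: `jis_to_sjis(0x30, 0x21)` gives `(0x88, 0x9F)`, matching `"亜".encode("shift_jis")`, while `jis_to_eucjp(0x30, 0x21)` gives `(0xB0, 0xA1)`. The EUC-JP function needs no case analysis at all, which is the "cleaner and more direct conversion" the paragraph refers to.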