{{Short description|Character encoding for Unicode compatible with EBCDIC}}
{{Infobox character encoding
| name = UTF-EBCDIC
| encodes = [[Unicode]]
| basedon = [[UTF-8]]
| by = [[IBM]]
| definitions = [https://www.unicode.org/reports/tr16/tr16-8.html Unicode Technical Report #16]
}}
'''UTF-EBCDIC''' is a [[character encoding]] capable of encoding all 1,112,064 valid character [[code point]]s in [[Unicode]] using 1 to 5 [[byte]]s (in contrast to a maximum of 4 for [[UTF-8]]).<ref>{{Cite web|title=UTR #16: UTF-EBCDIC|url=https://www.unicode.org/reports/tr16/tr16-8.html|quote=You need to search at most five bytes (seven bytes, if the full range of 31 bits of ISO/IEC 10646 is considered) backwards|access-date=2021-02-23|website=www.unicode.org}}</ref> It is meant to be [[EBCDIC]]-friendly, so that legacy EBCDIC applications on [[Mainframe computer|mainframes]] may process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to UTF-8's advantages for existing [[ASCII]]-based systems. UTF-EBCDIC is defined in Unicode Technical Report #16.

To produce the UTF-EBCDIC encoding of a series of Unicode code points, an encoding based on UTF-8 (known in the specification as UTF-8-Mod) is applied first, creating what the specification calls an I8 sequence. The main difference between this encoding and UTF-8 is that it represents the code points {{tt|U+0080}} through {{tt|U+009F}} (the [[C1 control code]]s) as a single byte each, so that they can later be mapped to the corresponding EBCDIC control codes. To achieve this, UTF-8-Mod uses {{tt|101xxxxx}} instead of {{tt|10xxxxxx}} as the format for trailing bytes in a multi-byte sequence. As each trailing byte can then hold only 5 bits rather than 6, many code points above {{tt|U+03FF}} need one more byte in UTF-8-Mod than in UTF-8 (for example, {{tt|U+0400}} takes three bytes instead of two).
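The UTF-8-Mod step can be illustrated with a short Python sketch. It is not taken from the specification's text: the function name <code>encode_i8</code> is hypothetical, and the lead-byte patterns and range limits are assumptions derived from the 5-bit trailing-byte format described above, checked against the 1-to-5-byte range stated in the lead.

```python
def encode_i8(cp: int) -> bytes:
    """Encode one Unicode scalar value as a UTF-8-Mod (I8) byte sequence.

    Illustrative sketch only; the name and layout table are assumptions.
    """
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")

    # ASCII and the C1 controls (U+0080..U+009F) are stored as one byte.
    if cp <= 0x9F:
        return bytes([cp])

    # (highest code point, lead-byte marker, number of trailing bytes)
    layouts = (
        (0x3FF, 0xC0, 1),      # 110yyyyy + 1 trailing byte
        (0x3FFF, 0xE0, 2),     # 1110zzzz + 2 trailing bytes
        (0x3FFFF, 0xF0, 3),    # 11110www + 3 trailing bytes
        (0x10FFFF, 0xF8, 4),   # 1111100v + 4 trailing bytes
    )
    for limit, lead, trail in layouts:
        if cp <= limit:
            # The lead byte carries the bits left over above the trailing bytes.
            out = [lead | (cp >> (5 * trail))]
            # Each trailing byte is 101xxxxx: marker 0xA0 plus 5 payload bits.
            for i in range(trail - 1, -1, -1):
                out.append(0xA0 | ((cp >> (5 * i)) & 0x1F))
            return bytes(out)
    raise AssertionError("unreachable")


print(encode_i8(0x41).hex())      # 41
print(encode_i8(0x85).hex())      # 85 (a C1 control stays one byte)
print(encode_i8(0xA0).hex())      # c5a0
print(encode_i8(0x400).hex())     # e1a0a0 (UTF-8 needs only two bytes here)
print(encode_i8(0x10FFFF).hex())  # f9a1bfbfbf
```

The output of this function is still ASCII-based; it is the input to the byte-for-byte table lookup that produces the final UTF-EBCDIC bytes.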
The UTF-8-Mod transformation leaves the data in an ASCII-based format (for example, {{tt|U+0041}} "A" is still encoded as {{tt|0x41}}), so each byte is then passed through a reversible (one-to-one) lookup table to produce the final UTF-EBCDIC encoding. For example, {{tt|0x41}} in this table maps to {{tt|0xC1}}; thus the UTF-EBCDIC encoding of {{tt|U+0041}} (Unicode's "A") is {{tt|0xC1}} (EBCDIC's "A").

UTF-EBCDIC is rarely used, even on the EBCDIC-based mainframes for which it was designed. [[IBM]] EBCDIC-based mainframe operating systems, such as [[z/OS]], usually use [[UTF-16]] for complete Unicode support. For example, [[IBM Db2]], [[COBOL]], [[PL/I]], [[Java (programming language)|Java]] and the IBM [[XML]] toolkit support UTF-16 on IBM mainframes.
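The byte-for-byte lookup that turns an I8 sequence into UTF-EBCDIC, and its reversibility, can be sketched in Python as follows. The table here is a placeholder, not the table from Unicode Technical Report #16: only the entry {{tt|0x41}} → {{tt|0xC1}} comes from the text above, the reverse entry {{tt|0xC1}} → {{tt|0x41}} is an assumption added so that the placeholder stays one-to-one, and every other byte is left unchanged. All names are hypothetical.

```python
def make_placeholder_table() -> bytes:
    """Build a 256-entry one-to-one byte table (placeholder, NOT the TR16 table)."""
    table = bytearray(range(256))  # identity mapping as a stand-in
    # 0x41 -> 0xC1 is from the article; 0xC1 -> 0x41 is assumed so that the
    # table remains a permutation of the 256 byte values.
    table[0x41], table[0xC1] = 0xC1, 0x41
    return bytes(table)


def invert(table: bytes) -> bytes:
    """Return the inverse of a one-to-one 256-entry byte table."""
    inverse = bytearray(256)
    for src, dst in enumerate(table):
        inverse[dst] = src
    return bytes(inverse)


I8_TO_EBCDIC = make_placeholder_table()
EBCDIC_TO_I8 = invert(I8_TO_EBCDIC)

i8_bytes = b"\x41"                             # I8 encoding of U+0041 "A"
ebcdic = i8_bytes.translate(I8_TO_EBCDIC)      # per-byte table lookup
print(ebcdic.hex())                            # c1
print(ebcdic.translate(EBCDIC_TO_I8).hex())    # 41 (the lookup is reversible)
```

Because the table is one-to-one, decoding is the same operation with the inverse table, followed by decoding the resulting I8 sequence.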