Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Code page
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== The code page numbering system == IBM introduced the concept of systematically assigning a small, but globally unique, 16 bit number to each character encoding that a computer system or collection of computer systems might encounter. The IBM origin of the numbering scheme is reflected in the fact that the smallest (first) numbers are assigned to variations of IBM's EBCDIC encoding and slightly larger numbers refer to variations of IBM's [[extended ASCII]] encoding as used in its PC hardware. With the release of [[PC DOS]] version 3.3 (and the near identical [[MS-DOS]] 3.3) IBM introduced the code page numbering system to regular PC users, as the code page numbers (and the phrase "code page") were used in new commands to allow the character encoding used by all parts of the OS to be set in a systematic way.<ref name="Duncan_1988_MS-DOS_Encyclopedia"/> [[File:IBM CJK Code Page Numbers.svg|right|thumb|IBM code page numbers (CPGIDs and CCSIDs) used for CJK encodings. Microsoft's use of code page numbers for CJK encodings differs, and is noted in brackets where applicable.]] After IBM and Microsoft ceased to cooperate in the 1990s, the two companies have maintained the list of assigned code page numbers independently from each other, resulting in some conflicting assignments. At least one third-party vendor ([[Oracle Corporation|Oracle]]) also has its own different list of numeric assignments.<ref name="oracle.com"/> IBM's current assignments are listed in their [[CCSID]] repository, while Microsoft's assignments are documented within the [[MSDN]].<ref name="Microsoft_Codepage-ID"/> Additionally, a list of the names and approximate IANA ([[Internet Assigned Numbers Authority]]) abbreviations for the installed code pages on any given Windows machine can be found in the Registry on that machine (this information is used by Microsoft programs such as [[Internet Explorer]]). Most well-known code pages, excluding those for the [[CJK characters|CJK]] languages and [[Vietnamese language|Vietnamese]],<!-- not using CJKV here because this applies to the modern romanised Vietnamese --> fit all their code-points into eight bits and do not involve anything more than mapping each code-point to a single character; furthermore, techniques such as combining characters, complex scripts, etc., are not involved. The text mode of standard ([[VGA-compatible text mode|VGA-compatible]]) PC graphics hardware is built around using an 8-bit code page, though it is possible to use two at once with some color depth sacrifice, and up to eight may be stored in the [[display adapter]] for easy switching.<ref name="VGA-Programming"/> There was a selection of third-party code page fonts that could be loaded into such hardware. However, it is now commonplace for operating system vendors to provide their own character encoding and rendering systems that run in a graphics mode and bypass this hardware limitation entirely. However the system of referring to character encodings by a code page number remains applicable, as an efficient alternative to string identifiers such as those specified by the IETF and IANA for use in various protocols such as e-mail and web pages. === Relationship to ASCII === The majority of code pages in current use are supersets of [[ASCII]], a 7-bit code representing 128 control codes and printable characters. In the distant past, 8-bit implementations of the ASCII code set the top bit to zero or used it as a [[parity bit]] in network data transmissions. When the top bit was made available for representing character data, a total of 256 characters and control codes could be represented. Most vendors (including IBM) used this extended range to encode characters used by various languages and graphical elements that allowed the imitation of primitive graphics on text-only output devices. No formal standard existed for these "extended ASCII character sets" and vendors referred to the variants as code pages, as IBM had always done for variants of EBCDIC encodings. === Relationship to Unicode === Unicode is an effort to include all characters from all currently and historically used human languages into single character enumeration (effectively one large single code page), removing the need to distinguish between different code pages when handling digitally stored text. Unicode tries to retain backwards compatibility with many legacy code pages, copying some code pages 1:1 in the design process. An explicit design goal of Unicode was to allow round-trip conversion between all common legacy code pages, although this goal has not always been achieved. Some vendors, namely IBM and Microsoft, have anachronistically assigned code page numbers to Unicode encodings. This convention allows code page numbers to be used as metadata to identify the correct decoding algorithm when encountering binary stored data.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)