ISO/IEC 2022
{{Short description|Higher-level 7-bit and 8-bit character encoding system}} {{Distinguish|ISO 20022}} {{Use Oxford spelling|date=December 2011}} {{Infobox character encoding | name = ISO 2022 | mime = | alias = | standard = {{hlist|[[ISO/IEC JTC 1|ISO/IEC]] 2022|[[Ecma International|ECMA]]-35|[[ANSI]] X3.41|[[Japanese Industrial Standards|JIS]] X 0202|[[Guobiao|GB/T]] 2311}} | lang = Various. | extends = | encodes = [[US-ASCII]] and, depending on implementation: {{hlist|[[GB 2312]]|[[JIS X 0201]]|[[JIS X 0208]]|[[JIS X 0212]]|[[JIS X 0213]]|[[KS X 1001]]|[[CNS 11643]]|[[ISO/IEC 646]]|[[ISO/IEC 8859]] / [[ISO/IEC 10367|10367]]|''various others''}} | status = | prev = | next = [[ISO/IEC 10646]] ([[Unicode]]) | classification = [[State (computer science)|Stateful]] system of [[Character encoding|encodings]] (with stateless pre-configured subsets) | otherrelated = '''Stateful subsets''': {{hlist|[[ISO-2022-JP]]|[[ISO-2022-CN]]|[[ISO-2022-KR]]|[[Compound Text]]}} {{hr}} '''Pre-configured versions''': {{hlist|[[ISO/IEC 4873]]|[[Extended Unix Code|EUC]]}} }} '''ISO/IEC 2022''' ''Information technology—Character code structure and extension techniques'', is an [[International Organization for Standardization|ISO]]/[[International Electrotechnical Commission|IEC]] standard in the field of [[character encoding]]. It is equivalent to the [[Ecma International|ECMA]] standard '''ECMA-35''',<ref>{{harvp|ECMA-35|1994|loc=Brief History}}</ref><ref>{{harvp|ECMA-35|1994|p=51|loc=annex D}}</ref> the [[ANSI]] standard '''ANSI X3.41'''<ref name="marc-escs" /> and the [[Japanese Industrial Standard]] '''JIS X 0202'''. 
Originating in 1971, it was most recently revised in 1994.<ref>{{cite web |url=https://www.ecma-international.org/publications-and-standards/standards/ecma-35/ |title=ECMA-35: Character code structure and extension techniques (web page) |publisher=[[Ecma International]] |access-date=2022-04-27 |archive-date=2022-04-25 |archive-url=https://web.archive.org/web/20220425142341/https://www.ecma-international.org/publications-and-standards/standards/ecma-35/ |url-status=live }}</ref> ISO 2022 specifies a general structure which character encodings can conform to, dedicating particular ranges of bytes ([[Hexadecimal|0x]]00–1F and 0x7F–9F) to be used for non-printing [[C0 and C1 control codes|control codes]]<ref name="8.1"/> for formatting and [[in-band]] instructions (such as [[newline|line breaks]] or formatting instructions for [[text terminal]]s), rather than [[graphic character|graphical characters]]. It also specifies a syntax for escape sequences, multiple-byte sequences beginning with the {{ctrl|ESC}} control code, which can likewise be used for in-band instructions.<ref name="ch13"/> Specific sets of control codes and escape sequences designed to be used with ISO 2022 include [[ISO/IEC 6429]], portions of which are implemented by [[ANSI.SYS]] and [[terminal emulator]]s. ISO 2022 itself also defines particular control codes and escape sequences which can be used for switching between different [[coded character set]]s (for example, between [[ASCII]] and the Japanese [[JIS X 0208]]) so as to use multiple in a single document,<ref name="ch12_14"/> effectively combining them into a single [[state (computer science)|stateful]] encoding (a feature less important since the advent of [[Unicode]]). 
It is designed to be usable in both 8-bit environments and 7-bit environments (those where only seven bits are usable in a byte, such as [[e-mail]] without [[8BITMIME]]).<ref name="ch11"/> ==Encodings and conformance== The ASCII character set supports the [[ISO Basic Latin alphabet]] (equivalent to the [[English alphabet]]), and does not provide good support for languages which use additional letters, or which use a different [[writing system]] altogether. Other writing systems with relatively few characters, such as [[Greek script|Greek]], [[Cyrillic]], [[Arabic script|Arabic]] or [[Hebrew alphabet|Hebrew]], as well as forms of the [[Latin script]] using [[diacritic]]s or letters absent from the ISO Basic Latin alphabet, have historically been represented on [[personal computer]]s with different 8-[[bit]], [[SBCS|single byte]], [[extended ASCII]] encodings, which follow ASCII when the [[most significant bit]] is 0 (i.e. bytes 0x00–7F, when represented in [[hexadecimal]]), and include additional characters for a most significant bit of 1 (i.e. bytes 0x80–FF). Some of these, such as the [[ISO 8859]] series, conform to ISO 2022,<ref name="8859-10-s1"/><ref name="ecma-144-s1"/> while others such as [[Code page 437|DOS code page 437]] do not, usually due to not reserving the bytes 0x80–9F for control codes. Certain [[East Asian]] languages, specifically [[Chinese language|Chinese]], [[Japanese language|Japanese]], and [[Korean language|Korean]] (collectively "[[CJK characters|CJK]]"), are written using far more characters than the maximum of 256 which can be represented in a single byte, and were first represented on computers with language-specific [[double-byte character set|double-byte encodings]] or [[variable-width encoding]]s; some of these (such as the [[Simplified Chinese]] encoding {{nowrap|[[GB 2312]]}}) conform to {{nowrap|ISO 2022}}, while others (such as the [[Traditional Chinese]] encoding [[Big5]]) do not. 
Control codes in ISO 2022 are always represented with a single byte, regardless of the number of bytes used for graphical characters. CJK encodings used in 7-bit environments which use {{nowrap|ISO 2022}} mechanisms to switch between character sets are often given names starting with "ISO-2022-", most notably [[#ISO-2022-JP|ISO-2022-JP]], although some other CJK encodings such as [[EUC-JP]] also make use of ISO 2022 mechanisms.<ref name="lundeeuc"/><ref name="lundeeucvs"/>

Since the first 256 [[code point]]s of [[Unicode]] were taken from [[ISO 8859-1]], Unicode inherits the concept of [[C0 and C1 control codes]] from ISO 2022, although it adds [[Unicode control characters|other non-printing characters]] besides the ISO 2022 control codes. However, [[Unicode transformation format]]s such as [[UTF-8]] generally deviate from the ISO 2022 structure in various ways, including:
* Using 8-bit bytes, but not representing the C1 codes in their single-byte forms specified in ISO 2022 (most UTFs, one exception being the obsolete [[UTF-1]])
* Representing all characters, including control codes, with multiple bytes (e.g. [[UTF-16]], [[UTF-32]])
* Mixing bytes with the [[most significant bit]] set and unset within the coded representation for a single code point (e.g. UTF-1, {{nowrap|[[GB 18030]]}})

ISO 2022 escape sequences do, however, exist for switching to and from UTF-8 as a "[[#Interaction with other coding systems|coding system different from that of ISO 2022]]",<ref name="iso-ir-196"/> which are supported by certain [[terminal emulator]]s such as [[xterm]].<ref name="xtctrlesc"/>

==Overview==
===Elements===
ISO/IEC 2022 specifies the following:
* An infrastructure of multiple character sets with particular structures which may be included in a single [[character encoding]] system, including multiple graphical character sets and multiple sets of both [[C0 and C1 control codes|primary (C0) and secondary (C1) control codes]],<ref>{{harvp|ECMA-35|1994|loc=chapters 6, 7}}</ref>
* A format for encoding these sets, assuming that 8 bits are available per byte,<ref>{{harvp|ECMA-35|1994|loc=chapter 8}}</ref>
* A format for encoding these sets in the same encoding system when only 7 bits are available per byte,<ref>{{harvp|ECMA-35|1994|loc=chapter 9}}</ref> and a method for transforming any conformant character data to pass through such a 7-bit environment,<ref name="ch11">{{harvp|ECMA-35|1994|loc=chapter 11}}</ref>
* The general structure of [[ANSI escape codes]],<ref name="ch13">{{harvp|ECMA-35|1994|loc=chapter 13}}</ref> and
* Specific escape code formats for identifying individual character sets,<ref name="ch12_14">{{harvp|ECMA-35|1994|loc=chapters 12, 14}}</ref> for announcing the use of particular encoding features or subsets,<ref name="ch15">{{harvp|ECMA-35|1994|loc=chapter 15}}</ref> and for interacting with or switching to other encoding systems.<ref name="ch15" />

===Code versions===
{{further|#ISO/IEC 2022 code versions}}
A specific implementation does not have to implement all of the standard; the conformance level and the supported character sets are defined by the implementation. 
Although many of the mechanisms defined by the ISO/IEC 2022 standard are infrequently used, several established encodings are based on a subset of the ISO/IEC 2022 system.<ref name="lunde2022">{{harvp|Lunde|2008|pp=228-234|loc=Chapter 4 ("Encoding Methods"), section "ISO-2022 encoding"}}</ref> In particular, 7-bit encoding systems using ISO/IEC 2022 mechanisms include [[ISO-2022-JP]] (or [[JIS encoding]]), which has primarily been used in Japanese-language [[e-mail]]. 8-bit encoding systems conforming to ISO/IEC 2022 include [[ISO/IEC 4873]] (ECMA-43), which is in turn conformed to by [[ISO/IEC 8859]],<ref name="8859-10-s1"/><ref name="ecma-144-s1"/> and [[Extended Unix Code]], which is used for [[East Asia]]n languages.<ref name="lundeeuc">{{harvp|Lunde|2008|pp=242-245|loc=Chapter 4 ("Encoding Methods"), section "EUC encoding"}}</ref> More specialised applications of ISO 2022 include the [[MARC-8]] encoding system used in [[MARC 21]] library records.<ref name="marc-escs" /> ===Designation escape sequences=== {{further|#Registration of graphical and control code sets|#Character set designations}} The escape sequences for switching to particular character sets or encodings are registered with the [[#ISO-IR|ISO-IR]] registry (except for those set apart for private use, the meanings of which are defined by vendors, or by protocol specifications such as [[ARIB STD B24 character set|ARIB STD-B24]]) and follow the patterns defined within the standard. Character encodings making use of these escape sequences require data to be processed sequentially in a forward direction, since the correct interpretation of the data depends on previously encountered escape sequences. Specific profiles such as ISO-2022-JP may impose extra conditions, such as that the current character set is reset to US-ASCII before the end of a line. 
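The dependence on earlier escape sequences can be seen in a short sketch. It assumes Python's built-in <code>iso2022_jp</code> codec as a stand-in for an ISO-2022-JP implementation; the sample text and variable names are illustrative and are not taken from the standard.

```python
# Illustrative only: Python's standard "iso2022_jp" codec stands in for an
# ISO-2022-JP implementation, and the sample text is arbitrary.
text = "abc 日本語 xyz"
encoded = text.encode("iso2022_jp")

TO_JIS = b"\x1b$B"    # ESC $ B: designates JIS X 0208 to G0
TO_ASCII = b"\x1b(B"  # ESC ( B: designates ASCII to G0 again

start = encoded.index(TO_JIS)
end = encoded.index(TO_ASCII)
kanji_bytes = encoded[start + len(TO_JIS):end]

# Every byte of the encoded form fits in seven bits.
seven_bit = all(b < 0x80 for b in encoded)

# Read without the preceding escape sequence, the same bytes are simply
# ASCII characters, which is why the stream has to be processed forwards
# from a known state.
misread = kanji_bytes.decode("ascii")
roundtrip = encoded.decode("iso2022_jp")
```

In this sketch the three kanji occupy two bytes each, and <code>misread</code> holds the ASCII characters that a decoder would see if it missed the designation.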
Furthermore, the escape sequences declaring the national character sets may be absent if a specific ISO-2022-based encoding permits or requires this, and dictates that particular national character sets are to be used. For example, ISO-8859-1 states that no defining escape sequence is needed. ===Multi-byte characters=== {{further|JIS X 0208#Code points and code numbers}} To represent large character sets, ISO/IEC 2022 builds on [[ISO/IEC 646]]'s property that a seven-bit character representation will normally be able to represent 94 graphic (printable) characters (in addition to space and 33 control characters); if only the C0 control codes (narrowly defined) are excluded, this can be expanded to 96 characters. Using two bytes, it is thus possible to represent up to 8,836 (94×94) characters; and, using three bytes, up to 830,584 (94×94×94) characters. Though the standard defines it, no registered character set uses three bytes (although [[Extended Unix Code#EUC-TW|EUC-TW]]'s unregistered G2 does, as does the similarly unregistered [[CCCII]]). For the two-byte character sets, the [[code point]] of each character is normally specified in so-called ''row-cell'' or ''[[kuten]]''{{efn|{{langx|ja|区点|kuten}}; {{lang-zh|s=区位|t=區位|p=qūwèi}}; {{korean|hangul=행렬|hanja=行列|rr=haeng-nyeol}}}} form, which comprises two numbers between 1 and 94 inclusive, specifying a row{{efn|{{CJKV|s=区|t=區|r=ku|p=qū|l=zone}}; {{korean|hangul=행|hanja=行|rr=haeng}}}} and cell{{efn|{{langx|ja|点|ten|lit=point}}; {{lang-zh|c=位|p=wèi|l=position}}; {{korean|열|hanja=列|rr=yeol}}}} of that character within the zone. 
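The arithmetic above, and the usual mapping from a row-cell pair to bytes, can be sketched as follows. The function names are hypothetical, and the sketch assumes a 94×94 set whose bytes occupy 0x21–0x7E when invoked over GL, or 0xA1–0xFE when invoked over GR.

```python
# Capacity of 94-character multi-byte sets.
TWO_BYTE_CAPACITY = 94 ** 2    # 8,836
THREE_BYTE_CAPACITY = 94 ** 3  # 830,584


def kuten_to_bytes(row, cell, area="GL"):
    """Encode a row-cell (kuten) pair as two bytes (hypothetical helper)."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be between 1 and 94")
    # Row or cell 1 maps to 0x21 over GL, or to 0xA1 over GR.
    base = {"GL": 0x20, "GR": 0xA0}[area]
    return bytes([base + row, base + cell])


def bytes_to_kuten(pair):
    """Recover the row-cell pair from a two-byte code (hypothetical helper)."""
    base = 0xA0 if pair[0] >= 0x80 else 0x20
    return pair[0] - base, pair[1] - base
```

For example, row 16, cell 1 becomes the bytes 0x30 0x21 over GL and 0xB0 0xA1 over GR.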
For a three-byte set, an additional ''plane''{{efn|{{langx|ja|面|men|lit=face}}}} number is included at the beginning.<ref name="lundekuten">{{harvp|Lunde|2008|pp=19-20|loc=Chapter 1 ("CJKV Information Processing Overview"), section "What are Row-Cell and Plane-Row-Cell?"}}</ref> The escape sequences declare not only which character set is being used, but also whether the set is single-byte or multi-byte (although not how many bytes it uses if it is multi-byte), and whether each byte has 94 or 96 permitted values.

==Code structure==
===Notation and nomenclature===
ISO/IEC 2022 coding specifies a two-layer mapping between character codes and displayed characters. [[Escape sequence]]s allow any of a large registry of graphic character sets to be "designated"<ref>{{harvp|ECMA-35|1994|p=4|loc=definition 4.11}}</ref> into one of four working sets, named G0 through G3, and shorter control sequences specify the working set that is "invoked"<ref>{{harvp|ECMA-35|1994|p=5|loc=definition 4.18}}</ref> to interpret bytes in the stream.

Encoding byte values ("bit combinations") are often given in [[JIS X 0208#Single byte codes|column-line notation]], where two decimal numbers in the range 00–15 (each corresponding to a single hexadecimal digit) are separated by a slash.<ref>See, for instance, {{harvp|ISO-IR-14|1975}}, defining the G0 designation of the [[JISCII|JIS X 0201 Roman set]] as <code>ESC 2/8 4/10</code>.</ref> Hence, for instance, codes 2/0 (0x20) through 2/15 (0x2F) inclusive may be referred to as "column 02". 
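Conversion between column-line notation and byte values is simple arithmetic, as the following sketch shows. The function names are hypothetical, and the output is left unpadded to match the <code>2/8 4/10</code> style quoted above.

```python
def from_column_line(text):
    """Convert column-line notation such as "4/10" to a byte value."""
    column, line = (int(part) for part in text.split("/"))
    if not (0 <= column <= 15 and 0 <= line <= 15):
        raise ValueError("column and line must each be between 0 and 15")
    # The column is the high hexadecimal digit and the line is the low one.
    return column * 16 + line


def to_column_line(byte):
    """Convert a byte value to unpadded column-line notation."""
    return f"{byte >> 4}/{byte & 0x0F}"


# The designation quoted above, ESC 2/8 4/10, as raw bytes.
designation = bytes([0x1B, from_column_line("2/8"), from_column_line("4/10")])
```

Here <code>designation</code> is the byte sequence 0x1B 0x28 0x4A, which corresponds to the ASCII characters <code>ESC ( J</code>.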
This is the notation used in the ISO/IEC 2022 / ECMA-35 standard itself.<ref>{{harvp|ECMA-35|1994|p=5|loc=chapter 5.1}}</ref> They may be described elsewhere using [[hexadecimal]], as is often used in this article, or using the corresponding ASCII characters,<ref>See, for instance, {{harvp|RFC 1468|1993}}, defining the G0 designation of the [[JISCII|JIS X 0201 Roman set]] as <code>ESC ( J</code>.</ref> although the escape sequences are actually defined in terms of byte values, and the graphic assigned to that byte value may be altered without affecting the control sequence. Byte values from the 7-bit ASCII graphic range (hexadecimal 0x20–0x7F), being on the left side of a character code table, are referred to as "GL" codes ''(with "GL" standing for "graphics left")'' while bytes from the "high ASCII" range (0xA0–0xFF), if available (i.e. in an 8-bit environment), are referred to as the "GR" codes ''("graphics right")''.<ref name="8.1">{{harvp|ECMA-35|1994|pp=15–16|loc=chapter 8.1}}</ref> The terms "CL" (0x00–0x1F) and "CR" (0x80–0x9F) are defined for the control ranges, but the CL range always invokes the primary (C0) controls, whereas the CR range always either invokes the secondary (C1) controls or is unused.<ref name="8.1"/> ===Fixed coded characters=== The [[delete character]] DEL (0x7F), the [[escape character]] ESC (0x1B) and the [[space character]] SP (0x20) are designated "fixed" coded characters<ref>{{harvp|ECMA-35|1994|p=7|loc=chapter 6.2}}</ref> and are always available when G0 is invoked over GL, irrespective of what character sets are designated. They may not be included in graphical character sets, although other sizes or types of [[whitespace character]] may be.<ref>{{harvp|ECMA-35|1994|p=10|loc=chapter 6.3.2}}</ref> ===General syntax of escape sequences=== Sequences using the ESC (escape) character take the form <code>ESC [{{var|I}}...] 
{{var|F}}</code>, where the ESC character is followed by zero or more intermediate bytes<ref>{{harvp|ECMA-35|1994|p=4|loc=definition 4.17}}</ref> ({{var|{{serif|I}}}}) from the range 0x20–0x2F, and one final byte<ref>{{harvp|ECMA-35|1994|p=4|loc=definition 4.14}}</ref> ({{var|F}}) from the range 0x30–0x7E.<ref name="13.1">{{harvp|ECMA-35|1994|p=28|loc=chapter 13.1}}</ref> The first {{var|{{serif|I}}}} byte, or absence thereof, determines the type of escape sequence; it might, for instance, designate a working set, or denote a single control function. In all types of escape sequences, {{var|F}} bytes in the range 0x30–0x3F are reserved for unregistered private uses defined by prior agreement between parties.<ref name="13.3.3">{{harvp|ECMA-35|1994|p=33|loc=chapter 13.3.3}}</ref> Control functions from some sets may make use of further bytes following the escape sequence proper. For example, the [[ISO 6429]] control function "{{ctrl|CSI|Control Sequence Introducer}}", which can be represented using an escape sequence, is followed by zero or more bytes in the range 0x30–0x3F, then zero or more bytes in the range 0x20–0x2F, then by a single byte in the range 0x40–0x7E, the entire sequence being called a "control sequence".<ref>{{harvp|ECMA-48|1991|pp=24-26|loc=chapter 5.4}}</ref> ===Graphical character sets=== Each of the four working sets G0 through G3 may be a 94-character set or a 94<sup>n</sup>-character [[Multi Byte Character Set|multi-byte set]]. Additionally, G1 through G3 may be a 96- or 96<sup>n</sup>-character set. In a 96- or 96<sup>n</sup>-character set, the bytes 0x20 through 0x7F when GL-invoked, or 0xA0 through 0xFF when GR-invoked, are allocated to and may be used by the set. 
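The general escape sequence syntax described under "General syntax of escape sequences" can be sketched as a small parser. This is a minimal illustration under stated assumptions, not a complete implementation: the function name and return shape are hypothetical, and any bytes that a control function consumes after the escape sequence proper (as with CSI) are not handled.

```python
ESC = 0x1B


def parse_escape(data, pos=0):
    """Parse one escape sequence starting at data[pos].

    Returns (intermediates, final, is_private, next_pos).
    """
    if pos >= len(data) or data[pos] != ESC:
        raise ValueError("no ESC at this position")
    i = pos + 1
    # Zero or more intermediate bytes from 0x20-0x2F.
    while i < len(data) and 0x20 <= data[i] <= 0x2F:
        i += 1
    # Exactly one final byte from 0x30-0x7E.
    if i >= len(data) or not 0x30 <= data[i] <= 0x7E:
        raise ValueError("missing or invalid final byte")
    final = data[i]
    # Final bytes 0x30-0x3F are reserved for private use.
    is_private = 0x30 <= final <= 0x3F
    return bytes(data[pos + 1:i]), final, is_private, i + 1
```

For instance, <code>parse_escape(b"\x1b(J")</code> yields one intermediate byte (0x28) and the final byte 0x4A, whereas <code>parse_escape(b"\x1bc")</code> has no intermediate bytes.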
In a 94- or 94<sup>n</sup>-character set, the bytes 0x20 and 0x7F are not used.<ref name="6.4.3"/> When a 96- or 96<sup>n</sup>-character set is invoked in the GL region, the space and delete characters (codes 0x20 and 0x7F) are not available until a 94- or 94<sup>n</sup>-character set (such as the G0 set) is invoked in GL.<ref name="8.1"/> 96-character sets cannot be designated to G0. Registration of a set as a 96-character set does not necessarily mean that the 0x20/A0 and 0x7F/FF bytes are actually assigned by the set; some examples of graphical character sets which are registered as 96-sets but do not use those bytes include the G1 set of [[I.S. 434]],<ref>{{harvp|ISO-IR-208|1999}}</ref> the box drawing set from [[ISO/IEC 10367]],<ref>{{harvp|ISO-IR-155|1990}}</ref> and ISO-IR-164 (a subset of the G1 set of [[ISO-8859-8]] with only the letters, used by [[CCITT]]).<ref>{{harvp|ISO-IR-164|1992}}</ref> ===Combining characters=== Characters are expected to be spacing characters, not combining characters, unless specified otherwise by the graphical set in question.<ref name="6.3.3">{{harvp|ECMA-35|1994|p=10|loc=chapter 6.3.3}}</ref> ISO 2022 / ECMA-35 also recognizes the use of the [[C0 and C1 control codes#BS|backspace]] and carriage return control characters as means of combining otherwise spacing characters, as well as the [[ANSI escape code#CSIsection|CSI sequence]] "Graphic Character Combination" (GCC)<ref name="6.3.3" /> (<code>CSI 0x20 (SP) 0x5F (_)</code>).<ref>{{cite web |work=ANSI escape sequence library for Go |url=https://github.com/pborman/ansi/blob/8707152fc11286b76d2377dbd6256b8237443948/ansi.go#L134 |title=ansi.go, line 134 |author=((Google Inc.)) <!-- Yes, this is really a citation to open source code released by Google, the company (not merely found in a Google search), check e.g. the source code copyright notice. Albeit, it was exported from the defunct Google Code to Github by a third party. --> |author-link=Google, Inc. 
|date=2014 |access-date=2019-09-14 |archive-date=2022-04-30 |archive-url=https://web.archive.org/web/20220430005007/https://github.com/pborman/ansi/blob/8707152fc11286b76d2377dbd6256b8237443948/ansi.go#L134 |url-status=live }}</ref> Use of the backspace and carriage return in this manner is permitted by [[ISO/IEC 646]] but prohibited by [[ISO/IEC 4873]] / ECMA-43<ref>{{harvp|ECMA-43|1991|loc=chapter 7 ("Specification of the characters of the 8-bit code")|p=5}}</ref> and by [[ISO/IEC 8859]],<ref name="8859-10-s6">{{harvp|ISO/IEC FDIS 8859-10|1998|loc=chapter 6 ("Specification of the coded character set")|p=3}}</ref><ref name="ecma-144-s6">{{harvp|ECMA-144|2000|loc=chapter 6 ("Specification of the coded character set")|p=3}}</ref> on the basis that it leaves the graphical character repertoire undefined. ISO/IEC 4873 / ECMA-43 does, however, permit the use of the GCC function provided that the sequence of characters is kept the same and merely displayed in one space, rather than being over-stamped to form a character with a different meaning.<ref>{{harvp|ECMA-43|1991|loc=annex C ("Composite graphic characters")|p=19}}</ref> ===Control character sets=== Control character sets are classified as "primary" or "secondary" control code sets,<ref name="6.4.1">{{harvp|ECMA-35|1994|p=10|loc=chapter 6.4.1}}</ref> respectively also called "C0" and "C1" control code sets.<ref name="6.4.4">{{harvp|ECMA-35|1994|p=11|loc=chapter 6.4.4}}</ref> A C0 control set must contain the ESC (escape) control character at 0x1B<ref name="6.4.2">{{harvp|ECMA-35|1994|p=11|loc=chapter 6.4.2}}</ref> (a C0 set containing only ESC is registered as ISO-IR-104),<ref>{{harvp|ISO-IR-104|1985}}</ref> whereas a C1 control set may not contain the escape control whatsoever.<ref name="6.4.3">{{harvp|ECMA-35|1994|p=11|loc=chapter 6.4.3}}</ref> Hence, they are entirely separate registrations, with a C0 set being only a C0 set and a C1 set being only a C1 set.<ref name="6.4.4" /> If codes from the C0 set of ISO 
6429 / ECMA-48, i.e. the [[C0 and C1 control codes#C0 controls|ASCII control codes]], appear in the C0 set, they are required to appear at their ISO 6429 / ECMA-48 locations.<ref name="6.4.2" /> Inclusion of transmission control characters in the C0 set, besides the ten included by ISO 6429 / ECMA-48 (namely SOH, STX, ETX, EOT, ENQ, ACK, DLE, NAK, SYN and ETB),<ref>{{harvp|ISO-IR-1|1975}}</ref> or inclusion of any of those ten in the C1 set, is also prohibited by the ISO/IEC 2022 / ECMA-35 standard.<ref name="6.4.2" /><ref name="6.4.3" /> A C0 control set is invoked over the CL range 0x00 through 0x1F,<ref name="8.5.1">{{harvp|ECMA-35|1994|p=19|loc=chapter 8.5.1}}</ref> whereas a C1 control function may be invoked over the CR range 0x80 through 0x9F (in an 8-bit environment) or by using escape sequences (in a 7-bit or 8-bit environment),<ref name="6.4.1" /> but not both. Which style of C1 invocation is used must be specified in the definition of the code version.<ref name="8.5.2">{{harvp|ECMA-35|1994|p=19|loc=chapter 8.5.2}}</ref> For example, ISO/IEC 4873 specifies CR bytes for the C1 controls which it uses (SS2 and SS3).<ref name="ecma-43-7.6">{{harvp|ECMA-43|1991|loc=chapter 7.6 ("C1 set")|p=8}}</ref> If necessary, which invocation is used may be communicated using [[#Code structure announcements|announcer sequences]]. 
In the latter case, single control functions from the C1 control code set are invoked using "type Fe" escape sequences,<ref name="6.4.3"/> meaning those where the ESC control character is followed by a byte from columns 04 or 05 (that is to say, <code>ESC 0x40 (@)</code> through <code>ESC 0x5F (_)</code>).<ref name="13.12.1">{{harvp|ECMA-35|1994|p=29|loc=chapter 13.2.1}}</ref>

===Other control functions===
Additional control functions are assigned to "type Fs" escape sequences (in the range <code>ESC 0x60 (`)</code> through <code>ESC 0x7E (~)</code>); these have permanently assigned meanings rather than depending on the C0 or C1 designations.<ref name="13.12.1" /><ref name="6.5.1">{{harvp|ECMA-35|1994|p=12|loc=chapter 6.5.1}}</ref> Registration of control functions to type "Fs" sequences must be approved by [[ISO/IEC JTC 1/SC 2]].<ref name="6.5.1" /> Other single control functions may be registered to type "3Ft" escape sequences (in the range <code>ESC 0x23 (#) [{{var|I}}...] 0x40 (@)</code> through <code>ESC 0x23 (#) [{{var|I}}...] 0x7E (~)</code>),<ref name="6.5.2">{{harvp|ECMA-35|1994|p=12|loc=chapter 6.5.2}}</ref> although no "3Ft" sequences are currently assigned (as of 2019).<ref name="irfixctrl">{{harvp|ISO-IR|loc=chapter 2.7 ("Single control functions")|p=19}}</ref> Some of these are specified in ECMA-35 (ISO 2022 / ANSI X3.41), others in ECMA-48 (ISO 6429 / ANSI X3.64).<ref name="6.5.4">{{harvp|ECMA-35|1994|p=12|loc=chapter 6.5.4}}</ref> ECMA-48 refers to these as "independent control functions".<ref>{{harvp|ECMA-48|1991|loc=chapter 5.5}}</ref>

{| class="wikitable"
|-
! Code !! Hex !! Abbr. !! Name !! Effect<ref name="irfixctrl" />
|- id="DMI"
| <code>ESC `</code> || <code>1B 60</code> || DMI || Disable manual input || Disables some or all of the manual input facilities of the device.
|- id="INT"
| <code>ESC a</code> || <code>1B 61</code> || INT || Interrupt || Interrupts the current process.
|- id="EMI"
| <code>ESC b</code> || <code>1B 62</code> || EMI || Enable manual input || Enables the manual input facilities of the device.
|- id="RIS"
| <code>ESC c</code> || <code>1B 63</code> || RIS || Reset to initial state || The device's display and input subsystems revert to the same state as when the device was first powered on.<ref name="ris">{{cite iso-ir |number=35 |sponsor=ISO/TC 97/SC 2 |sponsor-link=ISO/IEC JTC 1/SC 2#History |date=1976-12-30 |title=Reset to Initial State (RIS)}}</ref> Connections to clients are unaffected.
|-
| <code>ESC d</code> || <code>1B 64</code> || CMD || Coding method delimiter || Used when interacting with an outer coding / representation system, [[#Interaction with other coding systems|see below.]]
|-
| <code>ESC n</code> || <code>1B 6E</code> || LS2 || Locking shift two || Shift function, [[#Shift functions|see below.]]
|-
| <code>ESC o</code> || <code>1B 6F</code> || LS3 || Locking shift three || Shift function, [[#Shift functions|see below.]]
|-
| <code>ESC |</code> || <code>1B 7C</code> || LS3R || Locking shift three right || Shift function, [[#Shift functions|see below.]]
|-
| <code>ESC }</code> || <code>1B 7D</code> || LS2R || Locking shift two right || Shift function, [[#Shift functions|see below.]]
|-
| <code>ESC ~</code> || <code>1B 7E</code> || LS1R || Locking shift one right || Shift function, [[#Shift functions|see below.]]
|}

Escape sequences of type "Fp" (<code>ESC 0x30 (0)</code> through <code>ESC 0x3F (?)</code>) or of type "3Fp" (<code>ESC 0x23 (#) [{{var|I}}...] 0x30 (0)</code> through <code>ESC 0x23 (#) [{{var|I}}...] 
0x3F (?)</code>) are reserved for single private use control codes, by prior agreement between parties.<ref name="6.5.3">{{harvp|ECMA-35|1994|p=12|loc=chapter 6.5.3}}</ref> Several such sequences of both types are used by [[Digital Equipment Corporation|DEC]] terminals such as the [[VT100]], and are thus supported by [[terminal emulator]]s.<ref name="xtctrlesc" /> ===Shift functions=== By default, GL codes specify G0 characters and GR codes (where available) specify G1 characters; this may be otherwise specified by prior agreement. The set invoked over each area may also be modified with control codes referred to as shifts, as shown in the table below.<ref name="table2">{{harvp|ECMA-35|1994|p=14|loc=chapter 7.3, table 2}}</ref> An 8-bit code may have GR codes specifying G1 characters, i.e. with its corresponding 7-bit code using [[Shift In]] and [[Shift Out]] to switch between the sets (e.g. [[JIS X 0201]]),<ref>{{harvp|ISO-IR-14|1975}}</ref> although some instead have GR codes specifying G2 characters, with the corresponding 7-bit code using a single-shift code to access the second set (e.g. [[ITU T.51|T.51]]).<ref name="T.51-amd1995">{{citation |mode=cs1 |url=https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-T.51-199508-I!Amd1!PDF-E&type=items |title=Recommendation T.51 (1992) Amendment 1 |date=1995-08-11 |author=ITU-T |author-link=ITU-T |access-date=2019-12-25 |archive-date=2020-08-02 |archive-url=https://web.archive.org/web/20200802044255/https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-T.51-199508-I!Amd1!PDF-E&type=items |url-status=live }}</ref> The codes shown in the table below are the most common encodings of these control codes, conforming to [[C0 and C1 control codes#C1 controls|ISO/IEC 6429]]. 
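How locking and single shifts change the interpretation of later bytes can be illustrated with a small state tracker. This is a simplified sketch under several assumptions: it uses the ISO/IEC 6429 codings of the shift functions, treats every graphical character as a single byte, accepts a single shift over either GL or GR, ignores other controls, and does not model designation. The names are hypothetical.

```python
ESC = 0x1B
LOCKING_C0 = {0x0F: ("GL", 0), 0x0E: ("GL", 1)}     # SI (LS0), SO (LS1)
LOCKING_ESC = {0x6E: ("GL", 2), 0x6F: ("GL", 3),    # LS2, LS3
               0x7E: ("GR", 1), 0x7D: ("GR", 2),    # LS1R, LS2R
               0x7C: ("GR", 3)}                     # LS3R
SINGLE_ESC = {0x4E: 2, 0x4F: 3}                     # ESC N (SS2), ESC O (SS3)
SINGLE_C1 = {0x8E: 2, 0x8F: 3}                      # SS2, SS3 in the CR area


def trace_shifts(data):
    """Label each graphic byte with the working set that interprets it."""
    invoked = {"GL": 0, "GR": 1}   # default invocations
    pending = None                 # set chosen by a single shift, if any
    labelled = []
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        nxt = data[i] if i < len(data) else None
        if byte == ESC and nxt in LOCKING_ESC:
            area, working_set = LOCKING_ESC[nxt]
            invoked[area] = working_set
            i += 1
        elif byte == ESC and nxt in SINGLE_ESC:
            pending = SINGLE_ESC[nxt]
            i += 1
        elif byte in LOCKING_C0:
            area, working_set = LOCKING_C0[byte]
            invoked[area] = working_set
        elif byte in SINGLE_C1:
            pending = SINGLE_C1[byte]
        elif 0x20 <= byte <= 0x7F or 0xA0 <= byte <= 0xFF:
            area = "GL" if byte < 0x80 else "GR"
            working_set = invoked[area] if pending is None else pending
            pending = None         # a single shift covers one character only
            labelled.append((byte, f"G{working_set}"))
    return labelled
```

With this sketch, <code>trace_shifts(b"A\x0eB\x0fC")</code> labels the three letters G0, G1 and G0 in turn, since SO and SI switch the set invoked over GL.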
The LS2, LS3, LS1R, LS2R and LS3R shifts are registered as single control functions and are always encoded as the escape sequences listed below,<ref name="irfixctrl" /> whereas the others are part of a C0 or C1 control code set (as shown below, SI (LS0) and SO (LS1) are C0 controls and SS2 and SS3 are C1 controls), meaning that their coding and availability may vary depending on which control sets are designated: they must be present in the designated control sets if their functionality is used.<ref name="8.5.1" /><ref name="8.5.2" /> The C1 controls themselves, as mentioned above, may be represented using escape sequences or 8-bit bytes, but not both.

Alternative encodings of the single-shifts as C0 control codes are available in certain control code sets. For example, SS2 and SS3 are usually available at 0x19 and 0x1D respectively in [[ITU T.51|T.51]]<ref name="T.51-amd1995" /> and [[ITU T.61|T.61]].<ref name="reg106">{{harvp|ISO-IR-106|1985}}</ref> This coding is currently recommended by ISO/IEC 2022 / ECMA-35 for applications requiring 7-bit single-byte representations of SS2 and SS3,<ref>{{harvp|ECMA-35|1994|p=15|loc=chapter 7.3, note 23}}</ref> and may also be used for SS2 only,<ref name="reg140">{{harvp|ISO-IR-140|1987}}</ref> although older code sets with SS2 at 0x1C also exist,<ref name="reg7">{{harvp|ISO-IR-7|1975}}</ref><ref name="reg26">{{harvp|ISO-IR-26|1976}}</ref><ref name="reg36">{{harvp|ISO-IR-36|1977}}</ref> and were mentioned as such in an earlier edition of the standard.<ref>{{harvp|ECMA-35|1980|p=8|loc=chapter 5.1.7}}</ref> The 0x8E and 0x8F coding of the single shifts as shown below is mandatory for [[ISO/IEC 4873]] levels 2 and 3.<ref name="harvp|ISO-IR-105|1985">{{harvp|ISO-IR-105|1985}}</ref>

{| class="wikitable"
|-
! Code !! Hex !! Abbr. !! Name !! Effect
|- id="SI"
| <code>SI</code> || <code>0F</code> || SI<br>LS0 || [[Shift In]]<br>Locking shift zero || GL encodes G0 from now on<ref name="8.3.1">{{harvp|ECMA-35|1994|p=17|loc=chapter 8.3.1}}</ref><ref name="9.3.1">{{harvp|ECMA-35|1994|p=23|loc=chapter 9.3.1}}</ref>
|- id="SO"
| <code>SO</code> || <code>0E</code> || SO<br>LS1 || [[Shift Out]]<br>Locking shift one || GL encodes G1 from now on<ref name="8.3.1" /><ref name="9.3.1" />
|- id="LS2"
| <code>ESC n</code> || <code>1B 6E</code> || LS2 || Locking shift two || GL encodes G2 from now on<ref name="8.3.1" /><ref name="9.3.1" />
|- id="LS3"
| <code>ESC o</code> || <code>1B 6F</code> || LS3 || Locking shift three || GL encodes G3 from now on<ref name="8.3.1" /><ref name="9.3.1" />
|- id="SS2"
| ''CR area:'' <code>SS2</code><br>''Escape code:'' <code>ESC N</code> || ''CR area:'' <code>8E</code><br>''Escape code:'' <code>1B 4E</code> || SS2 || Single shift two || GL or GR (see below) encodes G2 for the immediately following character only<ref name="8.4" />
|- id="SS3"
| ''CR area:'' <code>SS3</code><br>''Escape code:'' <code>ESC O</code> || ''CR area:'' <code>8F</code><br>''Escape code:'' <code>1B 4F</code> || SS3 || Single shift three || GL or GR (see below) encodes G3 for the immediately following character only<ref name="8.4" />
|- id="LS1R"
| <code>ESC ~</code> || <code>1B 7E</code> || LS1R || Locking shift one right || GR encodes G1 from now on<ref name="8.3.2">{{harvp|ECMA-35|1994|p=17|loc=chapter 8.3.2}}</ref>
|- id="LS2R"
| <code>ESC }</code> || <code>1B 7D</code> || LS2R || Locking shift two right || GR encodes G2 from now on<ref name="8.3.2" />
|- id="LS3R"
| <code>ESC |</code> || <code>1B 7C</code> || LS3R || Locking shift three right || GR encodes G3 from now on<ref name="8.3.2" />
|}

Although officially considered shift codes and named accordingly, single-shift codes are not always viewed as shifts,<ref name="lundeeucvs"/> and they may simply be viewed as prefix bytes (i.e. 
the first bytes in a multi-byte sequence),<ref name="lundeeuc"/> since they do not require the encoder to keep the currently active set as [[state (computer science)|state]], unlike locking shift codes. In 8-bit environments, either GL or GR, but not both, may be used as the single-shift area. This must be specified in the definition of the code version.<ref name="8.4">{{harvp|ECMA-35|1994|p=19|loc=chapter 8.4}}</ref> For instance, [[ISO/IEC 4873]] specifies GL, whereas [[Extended Unix Code|packed EUC]] specifies GR. In 7-bit environments, only GL is used as the single-shift area.<ref name="9.4">{{harvp|ECMA-35|1994|pp=23-24|loc=chapter 9.4}}</ref><ref name="11.1">{{harvp|ECMA-35|1994|p=27|loc=chapter 11.1}}</ref> If necessary, which single-shift area is used may be communicated using [[#Code structure announcements|announcer sequences]]. The names "locking shift zero" (LS0) and "locking shift one" (LS1) refer to the same pair of C0 control characters (0x0F and 0x0E) as the names "shift in" (SI) and "shift out" (SO). However, the standard refers to them as LS0 and LS1 when they are used in 8-bit environments and as SI and SO when they are used in 7-bit environments.<ref name="table2"/> The ISO/IEC 2022 / ECMA-35 standard permits, but discourages, invoking G1, G2 or G3 in both GL and GR simultaneously.<ref name="8.3.3">{{harvp|ECMA-35|1994|p=17|loc=chapter 8.3.3}}</ref> ==={{anchor|ISO-IR}}Registration of graphical and control code sets=== The ''ISO International register of coded character sets to be used with escape sequences'' (ISO-IR) lists graphical character sets, control code sets, single control codes and so forth which have been registered for use with ISO/IEC 2022. The procedure for registering codes and sets with the ISO-IR registry is specified by '''ISO/IEC 2375'''. 
Each registration receives a unique escape sequence, and a unique registry entry number to identify it.<ref>{{harvp|ECMA-35|1994|p=47|loc=annex B}}</ref><ref name="irintro">{{harvp|ISO-IR|loc=chapter 1 ("Introduction")|p=2}}</ref> For example, the [[ITU-T|CCITT]] character set for [[Simplified Chinese]] is known as [[ISO-IR-165]]. Registration of coded character sets with the ISO-IR registry identifies the documents specifying the character set or control function associated with an ISO/IEC 2022 non‑private-use escape sequence. This may be a standard document; however, registration does not create a new ISO standard, does not commit the ISO or IEC to adopt it as an international standard, and does not commit the ISO or IEC to add any of its characters to the [[Universal Coded Character Set]].<ref>{{harvp|ISO/IEC 2375|2003}}</ref> ISO-IR registered escape sequences are also used encapsulated in a [[Formal Public Identifier]] to identify character sets used for numeric character references in [[SGML]] (ISO 8879). 
For example, the string {{code|ISO 646-1983//CHARSET International Reference Version (IRV)//ESC 2/5 4/0}} can be used to identify the International Reference Version of [[ISO 646]]-1983,<ref name="sp">{{cite web |url=http://www.jclark.com/sp/sgmldecl.htm |title=Handling of the SGML declaration in SP |work=SP: an SGML System Conforming to International Standard ISO 8879}}</ref> and the [[HTML]] 4.01 specification uses {{code|ISO Registration Number 177//CHARSET ISO/IEC 10646-1:1993 UCS-4 with implementation level 3//ESC 2/5 2/15 4/6}} to identify Unicode.<ref>{{cite web |url=https://www.w3.org/TR/html401/sgml/sgmldecl.html |title=20: SGML Declaration of HTML 4 |work=HTML 4.01 Specification |publisher=[[W3C]]}}</ref> The textual representation of the escape sequence, included in the third element of the FPI, will be recognised by SGML implementations for supported character sets.<ref name="sp"/> ===Character set designations=== Escape sequences to designate character sets take the form <code>ESC {{var|I}} [{{var|I}}...] {{var|F}}</code>. As mentioned above, the intermediate ({{var|{{serif|I}}}}) bytes are from the range 0x20–0x2F, and the final ({{var|F}}) byte is from the range 0x30–0x7E. The first {{var|{{serif|I}}}} byte (or, for a multi-byte set, the first two) identifies the type of character set and the working set it is to be designated to, whereas the {{var|F}} byte (together with any additional {{var|{{serif|I}}}} bytes) identifies the character set itself, as assigned in the ISO-IR register (or, for the private-use escape sequences, by prior agreement). Additional {{var|{{serif|I}}}} bytes may be added before the {{var|F}} byte to extend the {{var|F}} byte range. This is currently used only with 94-character sets, where codes of the form <code>ESC ( ! 
{{var|F}}</code> have been assigned.<ref name="irsecond94">{{harvp|ISO-IR|loc=chapter 2.2 ("94-Character graphic character set with second Intermediate byte")|p=10}}</ref> At the other extreme, no multibyte 96-sets have been registered, so the sequences below are strictly theoretical. As with other escape sequence types, the range 0x30–0x3F is reserved for private-use {{var|F}} bytes,<ref name="13.3.3"/> in this case for private-use character set definitions (which might include unregistered sets defined by protocols such as [[ARIB STD B24 character set#Sets and codes|ARIB STD-B24]]<ref>{{harvp|ARIB STD-B24|2008|p=39|loc=part 2, Table 7-3}}</ref> or [[MARC-8]],<ref name="marc-escs"/> or vendor-specific sets such as [[DEC Special Graphics]]).<ref>{{cite web |url=https://www.in-ulm.de/~mascheck/various/alternate_charset/ |title=About the 'alternate linedrawing character set' |last1=Mascheck |first1=Sven |last2=Le Breton |first2=Stefan |last3=Hamilton |first3=Richard L. |website=~sven_mascheck/ |access-date=2020-01-08 |archive-date=2019-12-29 |archive-url=https://web.archive.org/web/20191229100224/https://www.in-ulm.de/~mascheck/various/alternate_charset/ |url-status=live }}</ref> However, in a graphical set designation sequence, if the second {{var|{{serif|I}}}} byte (for a single-byte set) or the third {{var|{{serif|I}}}} byte (for a double-byte set) is 0x20 (space), the set denoted is a "[[Dynamically Redefined Character Set|dynamically redefinable character set]]" (DRCS) defined by prior agreement,<ref name="14.4">{{harvp|ECMA-35|1994|p=36|loc=chapter 14.4}}</ref> which is also considered private use.<ref name="13.3.3" /> A graphical set being considered a DRCS implies that it represents a font of exact glyphs, rather than a set of abstract characters.<ref name="note48">{{harvp|ECMA-35|1994|p=36|loc=chapter 14.4.2, note 48}}</ref> The manner in which DRCS sets and associated fonts are transmitted, allocated and managed is not stipulated by ISO/IEC 2022 / ECMA-35 
itself, although it recommends allocating them sequentially starting with {{var|F}} byte 0x40 (<code>@</code>);<ref name="note47">{{harvp|ECMA-35|1994|p=36|loc=chapter 14.4.2, note 47}}</ref> however, a manner for transmitting DRCS fonts is defined within some telecommunication protocols such as [[World System Teletext]].<ref>{{harvp|ETS 300 706|1997|p=103|loc=chapter 14 ("Dynamically Re-definable Characters")}}</ref> There are also three special cases for multi-byte codes. The code sequences <code>ESC $ @</code>, <code>ESC $ A</code>, and <code>ESC $ B</code> were all registered when the contemporary version of the standard allowed multi-byte sets only in G0, so must be accepted in place of the sequences <code>ESC $ ( @</code> through <code>ESC $ ( B</code> to designate to the G0 character set.<ref name="14.3.2">{{harvp|ECMA-35|1994|pp=35-36|loc=chapter 14.3.2}}</ref> There are additional (rarely used) features for switching control character sets, but this is a single-level lookup, in that (as noted above) the C0 set is always invoked over CL, and the C1 set is always invoked over CR or by using escape codes. As noted above, it is required that any C0 character set include the ESC character at position 0x1B, so that further changes are possible. The control set designation sequences (as opposed to the graphical set ones) may also be used from within [[ISO/IEC 10646]] (UCS/Unicode), in contexts where processing [[ANSI escape code]]s is appropriate, provided that each byte in the sequence is padded to the code unit size of the encoding.<ref name="iso10646czdc1d">{{harvp|ISO/IEC 10646|2017|loc=chapter 12.4 ("Identification of control function set")|pp=19-20}}</ref> A table of escape sequence {{var|{{serif|I}}}} bytes and the designation or other function which they perform is below.<ref name="table5">{{harvp|ECMA-35|1994|p=32|loc=table 5}}</ref> {| class="wikitable" |- ! Code !! Hex !! Abbr. !! Name !! Effect !! 
Example |- | <code>ESC SP {{var|F}}</code> || <code>1B 20 {{var|F}}</code> || ACS || Announce code structure || Specifies code features used, e.g. working sets (see [[#Code structure announcements|below]]).<ref name="15.2">{{harvp|ECMA-35|1994|pp=37-41|loc=chapter 15.2}}</ref> || <code>ESC SP L</code> <br/>([[ISO 4873]] level 1) |- id="CZD" | <code>ESC ! {{var|F}}</code> || <code>1B 21 {{var|F}}</code> || CZD || C0-designate || {{var|F}} selects a C0 control character set to be used.<ref name="14.2.2">{{harvp|ECMA-35|1994|p=34|loc=chapter 14.2.2}}</ref> || <code>ESC ! @</code> <br/>([[C0 and C1 control codes#C0 controls|ASCII C0 codes]]) |- id="C1D" | <code>ESC " {{var|F}}</code> || <code>1B 22 {{var|F}}</code> || C1D || C1-designate || {{var|F}} selects a C1 control character set to be used.<ref name="14.2.3">{{harvp|ECMA-35|1994|p=34|loc=chapter 14.2.3}}</ref> || <code>ESC " C</code> <br/>([[C0 and C1 control codes#C1 controls|ISO 6429 C1 codes]]) |- | <code>ESC # {{var|F}}</code> || <code>1B 23 {{var|F}}</code> || - || ''(Single control function)'' || ''(Reserved for sequences for control functions, [[#Other control functions|see above]].)'' || <code>ESC # 6</code> <br/>(private use: DEC [[Fullwidth|Double Width]] Line)<ref>{{citation |mode=cs1 |url=https://vt100.net/docs/vt510-rm/DECDWL.html |title=DECDWL—Double-Width, Single-Height Line |work=VT510 Video Terminal Programmer Information |author=Digital |author-link=Digital Equipment Corporation |access-date=2020-01-17 |archive-date=2020-08-02 |archive-url=https://web.archive.org/web/20200802022139/https://vt100.net/docs/vt510-rm/DECDWL.html |url-status=live }}</ref> |- id="GZDM4" | {{plainlist|* <code>ESC $ {{var|F}}</code>{{efn|name=legacygzdm4|1=Specified for {{var|F}} bytes 0x40 (<code>@</code>), 0x41 (<code>A</code>) and 0x42 (<code>B</code>) only, for historical reasons.<ref name="14.3.2" /> Some implementations, such as the [[SoftBank]] 2G [[emoji]] encoding, use additional escapes of this form for 
non-ISO-2022-compliant purposes.<ref>{{cite web |url=https://github.com/kawanet/Encode-JP-Emoji/blob/master/lib/Encode/JP/Emoji/Encoding.pm#L268 |work=Encode-JP-Emoji |title=Encode::JP::Emoji::Encoding |at=Line 268 |first=Yusuke |last=Kawasaki |date=2010 |access-date=2020-05-28 |archive-date=2022-04-30 |archive-url=https://web.archive.org/web/20220430005007/https://github.com/kawanet/Encode-JP-Emoji/blob/master/lib/Encode/JP/Emoji/Encoding.pm#L268 |url-status=live }}</ref>}} * <code>ESC $ ( {{var|F}}</code>}} | {{plainlist|* <code>1B 24 {{var|F}}</code>{{efn|name=legacygzdm4}} * <code>1B 24 28 {{var|F}}</code>}} | GZDM4 || G0-designate multibyte 94-set || {{var|F}} selects a 94<sup>n</sup>-character set to be used for G0.<ref name="14.3.2" /> || <code>ESC $ ( C</code> <br/>([[KS X 1001]] in G0) |- id="G1DM4" | <code>ESC $ ) {{var|F}}</code> || <code>1B 24 29 {{var|F}}</code> || G1DM4 || G1-designate multibyte 94-set || {{var|F}} selects a 94<sup>n</sup>-character set to be used for G1.<ref name="14.3.2" /> || <code>ESC $ ) A</code> <br/>([[GB 2312]] in G1) |- id="G2DM4" | <code>ESC $ * {{var|F}}</code> || <code>1B 24 2A {{var|F}}</code> || G2DM4 || G2-designate multibyte 94-set || {{var|F}} selects a 94<sup>n</sup>-character set to be used for G2.<ref name="14.3.2" /> || <code>ESC $ * B</code> <br/>([[JIS X 0208]] in G2) |- id="G3DM4" | <code>ESC $ + {{var|F}}</code> || <code>1B 24 2B {{var|F}}</code> || G3DM4 || G3-designate multibyte 94-set || {{var|F}} selects a 94<sup>n</sup>-character set to be used for G3.<ref name="14.3.2" /> || <code>ESC $ + D</code> <br/>([[JIS X 0212]] in G3) |- | <code>ESC $ , {{var|F}}</code> || <code>1B 24 2C {{var|F}}</code> || - || ''(not used)'' || ''(not used)''{{efn|Listed by [[MARC-8]].<ref name="marc-escs"/> See footnote for <code>ESC , {{var|F}}</code> below for background.}} || - |- id="G1DM6" | <code>ESC $ - {{var|F}}</code> || <code>1B 24 2D {{var|F}}</code> || G1DM6 || G1-designate multibyte 96-set || {{var|F}} selects a 
96<sup>n</sup>-character set to be used for G1.<ref name="14.3.2" /> || <code>ESC $ - 1</code> <br/>(private use) |- id="G2DM6" | <code>ESC $ . {{var|F}}</code> || <code>1B 24 2E {{var|F}}</code> || G2DM6 || G2-designate multibyte 96-set || {{var|F}} selects a 96<sup>n</sup>-character set to be used for G2.<ref name="14.3.2" /> || <code>ESC $ . 2</code> <br/>(private use) |- id="G3DM6" | <code>ESC $ / {{var|F}}</code> || <code>1B 24 2F {{var|F}}</code> || G3DM6 || G3-designate multibyte 96-set || {{var|F}} selects a 96<sup>n</sup>-character set to be used for G3.<ref name="14.3.2" /> || <code>ESC $ / 3</code> <br/>(private use) |- | <code>ESC % {{var|F}}</code> || <code>1B 25 {{var|F}}</code> || DOCS || Designate other coding system || Switches coding system, [[#Interaction with other coding systems|see below]]. || <code>ESC % G</code> <br/>([[UTF-8]]) |- id="IRR" | <code>ESC & {{var|F}}</code> || <code>1B 26 {{var|F}}</code> || IRR || Identify revised registration || Prefixes designation escape to denote revision.{{efn|{{var|F}}, adjusted to the range 1-63, indicates which (upwardly compatible) revision of the immediately-following registration is needed, so that old systems know that they are old.<ref name="14.5">{{harvp|ECMA-35|1994|pp=36-37|loc=chapter 14.5}}</ref>}} || <code>ESC & @ ESC $ B</code> <br/>([[JIS X 0208#Escape sequences for JIS X 0202 / ISO 2022|JIS X 0208:1990]] in G0) |- | <code>ESC ' {{var|F}}</code> || <code>1B 27 {{var|F}}</code> || - || ''(not used)'' || ''(not used)'' || - |- id="GZD4" | <code>ESC ( {{var|F}}</code> || <code>1B 28 {{var|F}}</code> || GZD4 || G0-designate 94-set || {{var|F}} selects a 94-character set to be used for G0.<ref name="14.3.2" /> || <code>ESC ( B</code> <br/>([[ASCII]] in G0) |- id="G1D4" | <code>ESC ) {{var|F}}</code> || <code>1B 29 {{var|F}}</code> || G1D4 || G1-designate 94-set || {{var|F}} selects a 94-character set to be used for G1.<ref name="14.3.2" /> || <code>ESC ) I</code> <br/>([[JIS X 0201]] Kana in 
G1) |- id="G2D4" | <code>ESC * {{var|F}}</code> || <code>1B 2A {{var|F}}</code> || G2D4 || G2-designate 94-set || {{var|F}} selects a 94-character set to be used for G2.<ref name="14.3.2" /> || <code>ESC * v</code> <br/>([[ITU T.61]] RHS in G2) |- id="G3D4" | <code>ESC + {{var|F}}</code> || <code>1B 2B {{var|F}}</code> || G3D4 || G3-designate 94-set || {{var|F}} selects a 94-character set to be used for G3.<ref name="14.3.2" /> || <code>ESC + D</code> <br/>([[ISO/IEC 646#Associated supplementary character sets|NATS-SEFI-ADD]] in G3) |- | <code>ESC , {{var|F}}</code> || <code>1B 2C {{var|F}}</code> || - || ''(not used)'' || ''(not used)''{{efn|In earlier editions, 96-character sets did not exist, and the escape codes now used for 96-character sets were reserved as space for additional 94-character sets. Accordingly, the <code>ESC ,</code> (0x1B 0x2C) prefix was defined in early editions of the standard as designating further 94-character sets to G0.<ref>{{harvp|ECMA-35|1980|pp=14-15|loc=chapter 5.3.7}}</ref> Since 96-character sets cannot be designated to G0, this first {{var|{{serif|I}}}} byte is not used by the current edition of the standard. However, it is still listed by [[MARC-8]].<ref name="marc-escs">{{cite web |url=https://www.loc.gov/marc/specifications/speccharmarc8.html#technique2 |title=Technique 2: Using standard alternate graphic character sets |work=MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media |date=2007-12-05 |publisher=[[Library of Congress]] |access-date=2020-07-19 |archive-date=2020-07-22 |archive-url=https://web.archive.org/web/20200722125744/https://www.loc.gov/marc/specifications/speccharmarc8.html#technique2 |url-status=live }}</ref>}} || - |- id="G1D6" | <code>ESC - {{var|F}}</code> || <code>1B 2D {{var|F}}</code> || G1D6 || G1-designate 96-set || {{var|F}} selects a 96-character set to be used for G1.<ref name="14.3.2" /> || <code>ESC - A</code> <br/>([[ISO 8859-1]] RHS in G1) |- id="G2D6" | <code>ESC . 
{{var|F}}</code> || <code>1B 2E {{var|F}}</code> || G2D6 || G2-designate 96-set || {{var|F}} selects a 96-character set to be used for G2.<ref name="14.3.2" /> || <code>ESC . B</code> <br/>([[ISO 8859-2]] RHS in G2) |- id="G3D6" | <code>ESC / {{var|F}}</code> || <code>1B 2F {{var|F}}</code> || G3D6 || G3-designate 96-set || {{var|F}} selects a 96-character set to be used for G3.<ref name="14.3.2" /> || <code>ESC / b</code> <br/>([[ISO 8859-15]] RHS in G3) |} Note that the registry of {{var|F}} bytes is independent for the different types. The 94-character graphic set designated by <code>ESC ( A</code> through <code>ESC + A</code> is not related in any way to the 96-character set designated by <code>ESC - A</code> through <code>ESC / A</code>. And neither of those is related to the 94<sup>n</sup>-character set designated by <code>ESC $ ( A</code> through <code>ESC $ + A</code>, and so on; the final bytes must be interpreted in context. (Indeed, without any intermediate bytes, <code>ESC A</code> is a way of specifying the C1 control code 0x81.) Also note that C0 and C1 control character sets are independent; the C0 control character set designated by <code>ESC ! A</code> (which happens to be the NATS control set for newspaper text transmission) is not the same as the C1 control character set designated by <code>ESC " A</code> (the [[CCITT]] attribute control set for [[Videotex]]). === Interaction with other coding systems === The standard also defines a way to specify coding systems that do not follow its own structure. 
A sequence is also defined for returning to ISO/IEC 2022; the registrations which support this sequence as encoded in ISO/IEC 2022 comprise (as of 2019) various [[Videotex]] formats, [[UTF-8]], and [[UTF-1]].<ref name="irdocs" /> A second {{var|{{serif|I}}}} byte of 0x2F (<code>/</code>) is included in the designation sequences of codes which do not use that byte sequence to return to ISO 2022; they may have their own means to return to ISO 2022 (such as a different or padded sequence) or none at all.<ref name="15.4" /> All existing registrations of the latter type (as of 2019) are either transparent raw data, [[Unicode Transformation Format|Unicode/UCS formats]], or subsets thereof.<ref name="irdocsslash" /> {| class="wikitable" |- ! Code !! Hex !! Abbr. !! Name !! Effect |- id="DOCS" | <code>ESC % @</code> || <code>1B 25 40</code> ||rowspan=3| DOCS || Designate other coding system ("standard return") || Return to ISO/IEC 2022 from another encoding.<ref name="15.4">{{harvp|ECMA-35|1994|pp=41-42|loc=chapter 15.4}}</ref> |- | <code>ESC % {{var|F}}</code> || <code>1B 25 {{var|F}}</code> || Designate other coding system ("with standard return")<ref name="irdocs">{{harvp|ISO-IR|loc=chapter 2.8.1 ("Coding systems with Standard return")|p=20}}</ref> || {{var|F}} selects an 8-bit code; use <code>ESC % @</code> to return.<ref name="15.4"/> |- | <code>ESC % / {{var|F}}</code> || <code>1B 25 2F {{var|F}}</code> || Designate other coding system ("without standard return")<ref name="irdocsslash">{{harvp|ISO-IR|loc=chapter 2.8.2 ("Coding systems without Standard return")|p=21}}</ref> || {{var|F}} selects an 8-bit code; there is no standard way to return.<ref name="15.4" /> |- id="CMD" | <code>ESC d</code> || <code>1B 64</code> || CMD || Coding method delimiter || Denotes the end of an ISO/IEC 2022 coded sequence.<ref name="15.3">{{harvp|ECMA-35|1994|p=41|loc=chapter 15.3}}</ref> |- |} Of particular interest are the sequences which switch to [[ISO/IEC 10646]] ([[Unicode]]) 
formats which do not follow the ISO/IEC 2022 structure. These include UTF-8 (which does not reserve the range 0x80–0x9F for control characters), its predecessor UTF-1 (which mixes GR and GL bytes in multi-byte codes), and UTF-16 and UTF-32 (which use wider coding units).<ref name="irdocs" /><ref name="irdocsslash" /> Several codes were also registered for subsets (levels 1 and 2) of UTF-8, UTF-16 and UTF-32, as well as for three levels of [[UTF-16#History|UCS-2]].<ref name="irdocsslash" /> However, the only codes currently specified by ISO/IEC 10646 are the level-3 codes for UTF-8, UTF-16 and UTF-32 and the unspecified-level code for UTF-8, with the rest being listed as deprecated.<ref name="iso10646docs" /> ISO/IEC 10646 stipulates that the [[big-endian]] formats of UTF-16 and UTF-32 are designated by their escape sequences.<ref>{{harvp|ISO/IEC 10646|2017|loc=chapter 12.1 ("Purpose and context of identification")|pp=18–19}}</ref> {|class=wikitable |- !Unicode Format!!Code(s)!!Hex<ref name="iso10646docs">{{harvp|ISO/IEC 10646|2017|loc=chapter 12.2 ("Identification of a UCS encoding scheme")|p=19}}</ref>!!Deprecated codes!!Deprecated hex<ref name="irdocs" /><ref name="irdocsslash" /><ref name="iso10646docs" /> |- |[[UTF-1]]||colspan=2 style="text-align: center;"|(UTF-1 not in current ISO/IEC 10646.)||<code>ESC % B</code>||<code>1B 25 42</code> |- |[[UTF-8]]||<code>ESC % G</code>, <br /><code>ESC % / I</code>||<code>1B 25 47</code>,<ref name="iso-ir-196">{{harvp|ISO-IR-196|1996}}</ref> <br /><code>1B 25 2F 49</code><ref name="iso-ir-192">{{harvp|ISO-IR-192|1996}}</ref>||<code>ESC % / G</code>, <br /><code>ESC % / H</code>||<code>1B 25 2F 47</code>, <br /><code>1B 25 2F 48</code> |- |[[UTF-16]]||<code>ESC % / L</code>||<code>1B 25 2F 4C</code><ref>{{harvp|ISO-IR-195|1996}}</ref>||<code>ESC % / @</code>, <br /><code>ESC % / C</code>, <br /><code>ESC % / E</code>, <br /><code>ESC % / J</code>, <br /><code>ESC % / K</code>||<code>1B 25 2F 40</code>, <br /><code>1B 25 2F 
43</code>, <br /><code>1B 25 2F 45</code>, <br /><code>1B 25 2F 4A</code>, <br /><code>1B 25 2F 4B</code> |- |[[UTF-32]]||<code>ESC % / F</code>||<code>1B 25 2F 46</code>||<code>ESC % / A</code>, <br /><code>ESC % / D</code>||<code>1B 25 2F 41</code>, <br /><code>1B 25 2F 44</code> |} Of the sequences switching to UTF-8, <code>ESC % G</code> is the one supported by, for example, [[xterm]].<ref name="xtctrlesc">{{cite web |url=https://invisible-island.net/xterm/ctlseqs/ctlseqs.html#h3-Controls-beginning-with-ESC |title=Controls beginning with ESC |work=XTerm Control Sequences |last1=Moy |first1=Edward |last2=Gildea |first2=Stephen |last3=Dickey |first3=Thomas |access-date=2019-10-04 |archive-date=2019-10-10 |archive-url=https://web.archive.org/web/20191010041907/https://invisible-island.net/xterm/ctlseqs/ctlseqs.html#h3-Controls-beginning-with-ESC |url-status=live }}</ref> Although use of a variant of the standard return sequence from UTF-16 and UTF-32 is permitted, the bytes of the escape sequence must be padded to the size of the code unit of the encoding (e.g. <code>001B 0025 0040</code> for UTF-16), so the coding of the standard return sequence does not conform exactly to ISO/IEC 2022. For this reason, the designations for UTF-16 and UTF-32 use a without-standard-return syntax.<ref name="iso10646stdret">{{harvp|ISO/IEC 10646|2017|loc=chapter 12.5 ("Identification of the coding system of ISO/IEC 2022")|p=20}}</ref> For specifying encodings by labels, the [[X Consortium]]'s [[Compound Text]] format defines five private-use DOCS sequences.<ref name="scheiflerdocs"/> === {{anchor|ACS}}Code structure announcements === The sequence "announce code structure" (<code>ESC SP (0x20) {{var|F}}</code>) is used to ''announce'' a specific code structure, or a specific group of ISO 2022 facilities which are used in a particular code version. 
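Judging from the values listed in the table of announcements in this section, the final byte of each announcement sequence equals 0x40 plus the announcement number (announcement 1 is <code>ESC SP A</code>, announcement 28 is <code>ESC SP \</code>). The following Python sketch relies on that observation, which is read off the table here rather than quoted from the standard; the function names <code>announcement_sequence</code> and <code>announcement_number</code> are hypothetical and chosen only for illustration.

```python
ESC, SP = 0x1B, 0x20

# Numbers that the table of announcements leaves unallocated (reserved).
UNALLOCATED = {15, 17, 24, 25}


def _check(number: int) -> None:
    """Reject numbers that have no entry in the table of announcements."""
    if not 1 <= number <= 28 or number in UNALLOCATED:
        raise ValueError(f"no announcement numbered {number}")


def announcement_sequence(number: int) -> bytes:
    """Build the three-byte ESC SP F sequence for an announcement number.

    Assumes F = 0x40 + number, as observed from the tabulated values.
    """
    _check(number)
    return bytes([ESC, SP, 0x40 + number])


def announcement_number(sequence: bytes) -> int:
    """Recover the announcement number from an ESC SP F sequence."""
    if len(sequence) != 3 or sequence[0] != ESC or sequence[1] != SP:
        raise ValueError("not an ESC SP F sequence")
    number = sequence[2] - 0x40
    _check(number)
    return number
```

For instance, <code>announcement_sequence(12)</code> yields the bytes <code>1B 20 4C</code>, the sequence listed for ISO/IEC 4873 level 1.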
Although announcements can be combined, certain contradictory combinations (specifically, using locking shift announcements 16–23 with announcements 1, 3 and 4) are prohibited by the standard, as is using additional announcements on top of [[ISO/IEC 4873]] level announcements 12–14<ref name="15.2"/> (which fully specify the permissible structural features). Announcement sequences are as follows: {| class="wikitable" |- ! Number !! Code !! Hex !! Code version feature announced<ref name="15.2"/> |- | 1 || <code>ESC SP A</code> || <code>1B 20 41</code> || G0 in GL, GR absent or unused, no locking shifts. |- | 2 || <code>ESC SP B</code> || <code>1B 20 42</code> || G0 and G1 invoked to GL by locking shifts, GR absent or unused. |- | 3 || <code>ESC SP C</code> || <code>1B 20 43</code> || G0 in GL, G1 in GR, no locking shifts, requires an 8-bit environment. |- | 4 || <code>ESC SP D</code> || <code>1B 20 44</code> || G0 in GL, G1 in GR if 8-bit, no locking shifts unless in a 7-bit environment. |- | 5 || <code>ESC SP E</code> || <code>1B 20 45</code> || Shift functions preserved during 7-bit/8-bit conversion. |- | 6 || <code>ESC SP F</code> || <code>1B 20 46</code> || C1 controls using escape sequences. |- | 7 || <code>ESC SP G</code> || <code>1B 20 47</code> || C1 controls in CR region in 8-bit environments, as escape sequences otherwise. |- | 8 || <code>ESC SP H</code> || <code>1B 20 48</code> || 94-character graphical sets only. |- | 9 || <code>ESC SP I</code> || <code>1B 20 49</code> || 94-character and/or 96-character graphical sets. |- | 10 || <code>ESC SP J</code> || <code>1B 20 4A</code> || Uses a 7-bit code, even if an eighth bit is available for use. |- | 11 || <code>ESC SP K</code> || <code>1B 20 4B</code> || Requires an 8-bit code. |- | 12 || <code>ESC SP L</code> || <code>1B 20 4C</code> || Complies with [[ISO/IEC 4873]] (ECMA-43) level 1. |- | 13 || <code>ESC SP M</code> || <code>1B 20 4D</code> || Complies with [[ISO/IEC 4873]] (ECMA-43) level 2. 
|- | 14 || <code>ESC SP N</code> || <code>1B 20 4E</code> || Complies with [[ISO/IEC 4873]] (ECMA-43) level 3. |- <!-- 15 unallocated (reserved) --> | 16 || <code>ESC SP P</code> || <code>1B 20 50</code> || SI / LS0 used. |- <!-- 17 unallocated (reserved), would presumably be LS0R if it existed? --> | 18 || <code>ESC SP R</code> || <code>1B 20 52</code> || SO / LS1 used. |- | 19 || <code>ESC SP S</code> || <code>1B 20 53</code> || LS1R used in 8-bit environments, SO used in 7-bit environments. |- | 20 || <code>ESC SP T</code> || <code>1B 20 54</code> || LS2 used. |- | 21 || <code>ESC SP U</code> || <code>1B 20 55</code> || LS2R used in 8-bit environments, LS2 used in 7-bit environments. |- | 22 || <code>ESC SP V</code> || <code>1B 20 56</code> || LS3 used. |- | 23 || <code>ESC SP W</code> || <code>1B 20 57</code> || LS3R used in 8-bit environments, LS3 used in 7-bit environments. |- <!-- 24, 25 unallocated (reserved), would presumably be SS0, SS1 if they existed? --> | 26 || <code>ESC SP Z</code> || <code>1B 20 5A</code> || SS2 used. |- | 27 || <code>ESC SP [</code> || <code>1B 20 5B</code> || SS3 used. |- | 28 || <code>ESC SP \</code> || <code>1B 20 5C</code> || Single-shifts invoke over GR. |} ==ISO/IEC 2022 code versions== [[File:Moz-cjk.png|thumb|Various ISO 2022 and other [[CJK characters|CJK]] encodings supported by [[Mozilla Firefox]] as of 2004. 
(This support has been reduced in later versions to avoid certain [[cross site scripting]] attacks.)|alt=(A screenshot of an old version of Firefox showing Big5, GB 2312, GBK, GB 18030, HZ, ISO-2022-CN, Big5-HKSCS, EUC-TW, EUC-JP, ISO-2022-JP, Shift_JIS, EUC-KR, UHC, Johab and ISO-2022-KR as available encodings under the CJK sub-menu.)]] Six 7-bit ISO 2022 code versions (ISO-2022-CN, ISO-2022-CN-EXT, ISO-2022-JP, ISO-2022-JP-1, ISO-2022-JP-2 and ISO-2022-KR) are defined by [[IETF RFC]]s, of which ISO-2022-JP and ISO-2022-KR have been extensively used in the past.<ref name="lunde2022rfcs">{{harvp|Lunde|2008|pp=229-230|loc=Chapter 4 ("Encoding Methods"), section "ISO-2022 encoding"}} "Those encodings that have been extensively used in the past, or continue to be used today for some purposes, have been highlighted."</ref> A number of other variants are defined by vendors, including [[IBM]].<ref name="ibmacri"/> Although UTF-8 is the preferred encoding in [[HTML5]], legacy content in ISO-2022-JP remains sufficiently widespread that the [[WHATWG]] encoding standard retains support for it,<ref name="whatwg-security"/> in contrast to mapping ISO-2022-KR, ISO-2022-CN and ISO-2022-CN-EXT<ref name="whatwg-replacement-labels"/> entirely to the [[replacement character]],<ref name="whatwg-replacement"/> due to concerns about [[code injection]] attacks such as [[cross-site scripting]].<ref name="whatwg-security"/><ref name="whatwg-replacement"/> 8-bit code versions include [[Extended Unix Code]].<ref name="lundeeuc"/><ref name="lundeeucvs"/> The [[ISO/IEC 8859]] encodings also follow ISO 2022, in a subset stipulated in ISO/IEC 4873.<ref name="8859-10-s1"/><ref name="ecma-144-s1"/> ===Japanese e-mail versions=== ====ISO-2022-JP==== '''{{visible anchor|ISO-2022-JP}}''' is a widely used encoding for Japanese, in particular in [[e-mail]]. 
It was introduced for use on the JUNET network and later codified in [[IETF RFC]] 1468, dated 1993.<ref name="rfc1468">{{harvp|RFC 1468|1993}}</ref> It has an advantage over other [[Japanese language and computers|encodings for Japanese]] in that it does not require [[8-bit clean]] transmission. Microsoft calls it '''Code page 50220'''.<ref name="wdc">{{citation |mode=cs1 |url=https://docs.microsoft.com/en-us/windows/desktop/intl/code-page-identifiers |title=Code Page Identifiers |publisher=Microsoft |work=Windows Dev Center |access-date=2019-09-16 |archive-date=2019-06-16 |archive-url=https://web.archive.org/web/20190616012252/https://docs.microsoft.com/en-us/windows/desktop/intl/code-page-identifiers |url-status=live }}</ref> It starts in ASCII and includes the following escape sequences: * <code>ESC ( B</code> to switch to ASCII (1 byte per character) * <code>ESC ( J</code> to switch to [[JIS X 0201|JIS X 0201-1976]] (ISO/IEC 646:JP) Roman set (1 byte per character) * <code>ESC $ @</code> to switch to [[JIS X 0208|JIS X 0208-1978]] (2 bytes per character) * <code>ESC $ B</code> to switch to [[JIS X 0208|JIS X 0208-1983]] (2 bytes per character) Use of the two characters added in JIS X 0208-1990 is permitted, but without including the IRR sequence, i.e. 
using the same escape sequence as JIS X 0208-1983.<ref name="rfc1468" /> Also, because these sets were registered before multi-byte sets could be designated to any working set other than G0, the escapes for JIS X 0208 do not include the second {{var|I}}-byte {{code|(}}.<ref name="14.3.2"/> The RFC notes that some existing systems did not distinguish <code>ESC ( B</code> from <code>ESC ( J</code>, or did not distinguish <code>ESC $ @</code> from <code>ESC $ B</code>, but stipulates that the escape sequences should not be changed by systems simply relaying messages such as e-mails.<ref name="rfc1468" /> The [[WHATWG]] Encoding Standard referenced by [[HTML5]] handles <code>ESC ( B</code> and <code>ESC ( J</code> distinctly, but treats <code>ESC $ @</code> the same as <code>ESC $ B</code> when decoding, and uses only <code>ESC $ B</code> for JIS X 0208 when encoding.<ref name="whatwgiso2022jp">{{harvp|WHATWG Encoding Standard|loc=[https://encoding.spec.whatwg.org/#iso-2022-jp section 12.2 ("ISO-2022-JP")]}}</ref> The RFC also notes that some past systems had erroneously used the sequence <code>ESC ( H</code>, which is actually registered for [[ISO-IR-11]] (a Swedish variant of [[ISO 646]] and [[World System Teletext]]), to switch away from JIS X 0208.<ref name="rfc1468" />{{efn|See also, for instance, {{citation|mode=cs2<!-- Solely because it's in the middle of a sentence, and cs1 wouldn't therefore be punctuationally appropriate. --> |url=http://printronix.com/wp-content/uploads/manuals/PTX_PRM_OKI_N7_256482A.pdf |title=OKI® Programmer's Reference Manual |page=26 |author=Printronix |year=2012}} for a more recent system which uses <code>ESC ( H</code> to switch to ASCII from a DBCS.}} ====Versions with halfwidth katakana==== {{anchor|ISO-2022-JP-EXT}}Use of <code>ESC ( I</code> to switch to the [[JIS X 0201|JIS X 0201-1976]] Kana set (1 byte per character) is not part of the ISO-2022-JP profile,<ref name="rfc1468" /> but is sometimes used nonetheless. 
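As a rough illustration of how these escape sequences drive a decoder, the following Python sketch tracks the set currently designated to G0 and interprets each byte accordingly. It is a simplified assumption-laden example rather than a conforming implementation: the name <code>decode_sketch</code> is hypothetical, the 1978 and 1983 editions of JIS X 0208 are treated alike, the two-byte codes are looked up by borrowing Python's <code>euc_jp</code> codec (which carries the same JIS X 0208 codes with the high bit set), and the Kana set is mapped to the Unicode halfwidth katakana block starting at U+FF61.

```python
ESC = 0x1B

# Bytes following ESC, mapped to the set they designate to G0.
DESIGNATIONS = {
    b"(B": "ascii",
    b"(J": "jis-roman",
    b"(I": "jis-kana",   # extension; not in the RFC 1468 profile
    b"$@": "jisx0208",   # 1978 edition, treated like 1983 in this sketch
    b"$B": "jisx0208",
}


def decode_sketch(data: bytes) -> str:
    """Decode a 7-bit ISO-2022-JP style byte string (illustrative only)."""
    out = []
    current = "ascii"  # the encoding starts in ASCII
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == ESC:
            name = DESIGNATIONS.get(data[i + 1:i + 3])
            if name is None:
                raise ValueError(f"unsupported escape sequence at offset {i}")
            current = name
            i += 3
        elif byte > 0x7F:
            raise ValueError(f"8-bit byte at offset {i}")
        elif byte < 0x21 or byte == 0x7F:
            out.append(chr(byte))  # controls, space and DEL pass through
            i += 1
        elif current == "jisx0208":
            pair = data[i:i + 2]
            if len(pair) != 2 or not 0x21 <= pair[1] <= 0x7E:
                raise ValueError(f"bad two-byte code at offset {i}")
            # Set the high bits and let the EUC-JP codec do the table lookup.
            out.append(bytes(b | 0x80 for b in pair).decode("euc_jp"))
            i += 2
        elif current == "jis-kana":
            if byte > 0x5F:
                raise ValueError(f"undefined kana code at offset {i}")
            out.append(chr(0xFF61 + byte - 0x21))
            i += 1
        elif current == "jis-roman" and byte == 0x5C:
            out.append("\u00a5")  # yen sign replaces the backslash
            i += 1
        elif current == "jis-roman" and byte == 0x7E:
            out.append("\u203e")  # overline replaces the tilde
            i += 1
        else:
            out.append(chr(byte))
            i += 1
    return "".join(out)


print(decode_sketch(b"\x1b$BF|K\\8l\x1b(B!"))  # two-byte text, then ASCII
print(decode_sketch(b"\x1b(I1\x1b(B"))         # one halfwidth katakana
```

In this sketch the kana designation <code>ESC ( I</code> is the only part that falls outside the RFC 1468 profile.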
[[Python (programming language)|Python]] allows it in a variant which it labels '''ISO-2022-JP-EXT''' (which also incorporates JIS X 0212 as described below, completing coverage of [[EUC-JP]]);<ref>{{citation |mode=cs1 |url=https://github.com/python/cpython/blob/d3faf43f9ba7da0ae504c9186b10d0fa3a8eb300/Modules/cjkcodecs/_codecs_iso2022.c#L1122 |title=Modules/cjkcodecs/_codecs_iso2022.c, line 1122 |work=cPython source tree |last=Chang |first=Hye-Shik |publisher=Python Software Foundation |access-date=2019-09-15 |archive-date=2022-04-30 |archive-url=https://web.archive.org/web/20220430005008/https://github.com/python/cpython/blob/d3faf43f9ba7da0ae504c9186b10d0fa3a8eb300/Modules/cjkcodecs/_codecs_iso2022.c#L1122 |url-status=live }}</ref><ref>{{cite web |title=codecs — Codec registry and base classes § Standard Encodings |work=Python 3.7.4 documentation |publisher=Python Software Foundation |url=https://docs.python.org/3.7/library/codecs.html#standard-encodings |access-date=2019-09-16 |archive-date=2019-07-28 |archive-url=https://web.archive.org/web/20190728163453/https://docs.python.org/3.7/library/codecs.html#standard-encodings |url-status=live }}</ref> this is close in both name and structure to an encoding denoted '''ISO-2022-JPext''' by [[Digital Equipment Corporation|DEC]], which furthermore adds a two-byte [[Private Use Areas#Private-use characters in other character sets|user-defined region]] accessed with <code>ESC $ ( 0</code> to complete the coverage of [[Extended Unix Code#DEC Kanji|Super DEC Kanji]].<ref name="decunix">{{cite web |url=http://www.itec.suny.edu/scsys/unix/doc/V4.0F/docs/html/SUPPDOCS/JAPANDOC/JAPANCH2.HTM |title=2: Codesets and Codeset Conversion |work=DIGITAL UNIX Technical Reference for Using Japanese Features |publisher=[[Digital Equipment Corporation]], [[Compaq]]}}{{dead link|date=April 2022}}</ref> The WHATWG/HTML5 variant permits decoding JIS X 0201 katakana in ISO-2022-JP input, but converts the characters to their JIS X 0208 
equivalents upon encoding.<ref name="whatwgiso2022jp" /> Microsoft's code page for ISO-2022-JP with JIS X 0201 kana additionally permitted is '''Code page 50221'''.<ref name="wdc" /> Other, older variants known as '''JIS7''' and '''JIS8''' build directly on the 7-bit and 8-bit encodings defined by [[JIS X 0201]] and allow use of JIS X 0201 kana from G1 without escape sequences, using [[Shift Out and Shift In characters|Shift Out and Shift In]] or setting the eighth bit (GR-invoked), respectively.<ref name="lundejisenc">{{harvp|Lunde|2008|pp=236-238|loc=Chapter 4 ("Encoding Methods"), section "The predecessor of ISO-2022-JP encoding—JIS encoding"}}</ref> They are not widely used;<ref name="lundejisenc" /> JIS X 0208 support in extended 8-bit JIS X 0201 is more commonly achieved via [[Shift JIS]]. Microsoft's code page for JIS X 0201-based ISO 2022 with single-byte katakana via Shift Out and Shift In is '''Code page 50222'''.<ref name="wdc" /> ====ISO-2022-JP-2==== '''{{visible anchor|ISO-2022-JP-2}}''' is a multilingual extension of ISO-2022-JP, defined in <nowiki>RFC 1554</nowiki> (dated 1993), which permits the following escape sequences in addition to the ISO-2022-JP ones. The [[ISO/IEC 8859]] parts are 96-character sets which cannot be designated to G0, and are accessed from G2 using the 7-bit escape sequence form of the single-shift code SS2:<ref>{{harvp|RFC 1554|1993}}</ref> * <code>ESC $ A</code> to switch to [[GB 2312|GB 2312-1980]] (2 bytes per character) * <code>ESC $ ( C</code> to switch to [[KSX1001|KS X 1001-1992]] (2 bytes per character) * <code>ESC $ ( D</code> to switch to [[JIS X 0212|JIS X 0212-1990]] (2 bytes per character) * <code>ESC . A</code> to switch to [[ISO/IEC 8859-1]] high part, Extended Latin 1 set (1 byte per character) ''[designated to G2]'' * <code>ESC . 
F</code> to switch to [[ISO/IEC 8859-7]] high part, Basic Greek set (1 byte per character) ''[designated to G2]'' {{anchor|ISO-2022-JP-1}}ISO-2022-JP with the ISO-2022-JP-2 representation of JIS X 0212, but not the other extensions, was subsequently dubbed '''ISO-2022-JP-1''' by <nowiki>RFC 2237</nowiki>, dated 1997.<ref>{{harvp|RFC 2237|1997}}</ref> ====IBM Japanese TCP==== [[IBM]] implements nine 7-bit ISO 2022 based encodings for Japanese, each using a different set of escape sequences: IBM-956, IBM-957, IBM-958, IBM-959, IBM-5052, IBM-5053, IBM-5054, IBM-5055 and ISO-2022-JP, which are collectively termed "TCP/IP Japanese coded character sets".<ref>{{cite web |url=https://www.ibm.com/support/pages/apar/PQ02042 |title=PQ02042: New Function to Provide C/370 iconv() Support for Japanese ISO-2022-JP |publisher=[[IBM]] |date=2021-01-19 |access-date=2022-01-04 |archive-date=2022-01-04 |archive-url=https://web.archive.org/web/20220104083401/https://www.ibm.com/support/pages/apar/PQ02042 |url-status=live }}</ref> CCSID 9148 is the standard (RFC 1468) ISO-2022-JP.<ref name="ibm-9148"/> {|class="wikitable collapsible" |+IBM variants of ISO-2022-JP |- ! Code page / CCSID !! ACRI definition number !! 
Escape sequences for ACRI<ref name="ibmacri">{{cite web |archive-url=https://web.archive.org/web/20150107121522/http://www-01.ibm.com/software/globalization/ccsid/acri.html |url=http://www-01.ibm.com/software/globalization/ccsid/acri.html |url-status=dead |archive-date=2015-01-07 |title=Additional Coding-related Required Information |work=IBM Globalization - Coded Character Set Identifiers |publisher=[[IBM]]}}</ref> |- | 956<ref>{{cite web |archive-url=https://web.archive.org/web/20141202004846/http://www-01.ibm.com/software/globalization/ccsid/ccsid956.html |archive-date=2014-12-02 |url=http://www-01.ibm.com/software/globalization/ccsid/ccsid956.html |url-status=dead |title=CCSID 956 |work=IBM Globalization - Coded Character Set Identifiers |publisher=[[IBM]]}}</ref> || TCP-01 || {{plainlist| * {{code|ESC ( J}} (JIS X 0201 Roman) * {{code|ESC $ ( B}} (JIS X 0208, 1983+, long escape sequence) * {{code|ESC ( I}} (JIS X 0201 Katakana) * {{code|ESC $ ( D}} (JIS X 0212) }} |- | 957<ref>{{cite web |archive-url=https://web.archive.org/web/20141130011055/http://www-01.ibm.com/software/globalization/ccsid/ccsid957.html |archive-date=2014-11-30 |url=http://www-01.ibm.com/software/globalization/ccsid/ccsid957.html |url-status=dead |title=CCSID 957 |work=IBM Globalization - Coded Character Set Identifiers |publisher=[[IBM]]}}</ref> || TCP-02 || {{plainlist| * {{code|ESC ( J}} (JIS X 0201 Roman) * {{code|ESC $ ( @}} (JIS X 0208, 1978, long escape sequence) * {{code|ESC ( I}} (JIS X 0201 Katakana) * {{code|ESC $ ( D}} (JIS X 0212) }} |- | 958<ref>{{cite web |archive-url=https://web.archive.org/web/20141201220819/http://www-01.ibm.com/software/globalization/ccsid/ccsid958.html |archive-date=2014-12-01 |url=http://www-01.ibm.com/software/globalization/ccsid/ccsid958.html |url-status=dead |title=CCSID 958 |work=IBM Globalization - Coded Character Set Identifiers |publisher=[[IBM]]}}</ref> || TCP-03 || {{plainlist| * {{code|ESC ( B}} (ASCII) * {{code|ESC $ ( B}} (JIS X 0208, 1983+, long escape 
sequence) * {{code|ESC ( I}} (JIS X 0201 Katakana) * {{code|ESC $ ( D}} (JIS X 0212) }} |- | 959<ref>{{cite web |archive-url=https://web.archive.org/web/20141202010322/http://www-01.ibm.com/software/globalization/ccsid/ccsid959.html |archive-date=2014-12-02 |url=http://www-01.ibm.com/software/globalization/ccsid/ccsid959.html |url-status=dead |title=CCSID 959 |work=IBM Globalization - Coded Character Set Identifiers |publisher=[[IBM]]}}</ref> || TCP-04 || {{plainlist| * {{code|ESC ( B}} (ASCII) * {{code|ESC $ ( @}} (JIS X 0208, 1978, long escape sequence) * {{code|ESC ( I}} (JIS X 0201 Katakana) * {{code|ESC $ ( D}} (JIS X 0212) }} |- | 5052<ref>{{cite web |archive-url=https://web.archive.org/web/20141129224035/http://www-01.ibm.com/software/globalization/ccsid/ccsid5052.html |archive-date=2014-11-29 |url=http://www-01.ibm.com/software/globalization/ccsid/ccsid5052.html |url-status=dead |title=CCSID 5052 |work=IBM Globalization - Coded Character Set Identifiers |publisher=[[IBM]]}}</ref> || TCP-05 || {{plainlist| * {{code|ESC ( J}} (JIS X 0201 Roman) * {{code|ESC $ B}} (JIS X 0208, 1983+) * {{code|ESC ( I}} (JIS X 0201 Katakana) * {{code|ESC $ ( D}} (JIS X 0212) }} |- | 5053<ref>{{cite web |archive-url=https://web.archive.org/web/20141129224035/http://www-01.ibm.com/software/globalization/ccsid/ccsid5052.html |archive-date=2014-11-29 |url=http://www-01.ibm.com/software/globalization/ccsid/ccsid5053.html |url-status=dead |title=CCSID 5053 |work=IBM Globalization - Coded Character Set Identifiers |publisher=[[IBM]]}}</ref> || TCP-06 || {{plainlist| * {{code|ESC ( J}} (JIS X 0201 Roman) * {{code|ESC $ @}} (JIS X 0208, 1978) * {{code|ESC ( I}} (JIS X 0201 Katakana) * {{code|ESC $ ( D}} (JIS X 0212) }} |- | 5054<ref>{{cite web |archive-url=https://web.archive.org/web/20141129223621/http://www-01.ibm.com/software/globalization/ccsid/ccsid5054.html |archive-date=2014-11-29 |url=http://www-01.ibm.com/software/globalization/ccsid/ccsid5054.html |url-status=dead |title=CCSID 
5054 |work=IBM Globalization - Coded Character Set Identifiers |publisher=[[IBM]]}}</ref> || TCP-07 || {{plainlist| * {{code|ESC ( B}} (ASCII) * {{code|ESC $ B}} (JIS X 0208, 1983+) * {{code|ESC ( I}} (JIS X 0201 Katakana) * {{code|ESC $ ( D}} (JIS X 0212) }} |- | 5055<ref>{{cite web |archive-url=https://web.archive.org/web/20141129211812/http://www-01.ibm.com/software/globalization/ccsid/ccsid5055.html |archive-date=2014-11-29 |url=http://www-01.ibm.com/software/globalization/ccsid/ccsid5055.html |url-status=dead |title=CCSID 5055 |work=IBM Globalization - Coded Character Set Identifiers |publisher=[[IBM]]}}</ref> || TCP-08 || {{plainlist| * {{code|ESC ( B}} (ASCII) * {{code|ESC $ @}} (JIS X 0208, 1978) * {{code|ESC ( I}} (JIS X 0201 Katakana) * {{code|ESC $ ( D}} (JIS X 0212) }} |- | 9148<ref name="ibm-9148">{{cite web |archive-url=https://web.archive.org/web/20141129211812/http://www-01.ibm.com/software/globalization/ccsid/ccsid5055.html |archive-date=2014-11-29 |url=http://www-01.ibm.com/software/globalization/ccsid/ccsid9148.html |url-status=dead |title=CCSID 9148 |work=IBM Globalization - Coded Character Set Identifiers |publisher=[[IBM]]}}</ref> || TCP-16 || {{plainlist| * {{code|ESC ( B}} (ASCII) * {{code|ESC ( J}} (JIS X 0201 Roman) * {{code|ESC $ @}} (JIS X 0208, 1978) * {{code|ESC $ B}} (JIS X 0208, 1983+) }} |} ====JIS X 0213==== {{anchor|ISO-2022-JP-3|ISO-2022-JP-2004}}The [[JIS X 0213]] standard, first published in 2000, defines an updated version of ISO-2022-JP, without the ISO-2022-JP-2 extensions, named '''ISO-2022-JP-3'''. The additions made by JIS X 0213 compared to the base JIS X 0208 standard resulted in a new registration being made for the extended JIS plane 1, while the new plane 2 received its own registration. The further additions to plane 1 in the 2004 edition of the standard resulted in an additional registration being added to a further revision of the profile, dubbed '''ISO-2022-JP-2004'''. 
In addition to the basic ISO-2022-JP designation codes, the following designations are recognized: * <code>ESC ( I</code> to switch to [[JIS X 0201|JIS X 0201-1976]] Kana set (1 byte per character) * <code>ESC $ ( O</code> to switch to [[JIS X 0213|JIS X 0213-2000]] Plane 1 (2 bytes per character) * <code>ESC $ ( P</code> to switch to [[JIS X 0213|JIS X 0213-2000]] Plane 2 (2 bytes per character) * <code>ESC $ ( Q</code> to switch to [[JIS X 0213|JIS X 0213-2004]] Plane 1 (2 bytes per character, ISO-2022-JP-2004 only) ===Other 7-bit versions=== '''{{visible anchor|ISO-2022-KR}}''' is defined in <nowiki>RFC 1557</nowiki>, dated 1993.<ref name="rfc1557">{{harvp|RFC 1557|1993}}</ref> It encodes ASCII and the Korean double-byte [[KSX1001|KS X 1001-1992]],<ref name="ksx">{{cite web |url=http://examples.oreilly.com/cjkvinfo/AppL/ksx1001.pdf |title=KS X 1001:1992 |access-date=2007-07-12 |archive-date=2007-09-26 |archive-url=https://web.archive.org/web/20070926100143/http://examples.oreilly.com/cjkvinfo/AppL/ksx1001.pdf |url-status=live }}</ref><ref name="ksc">{{harvp|ISO-IR-149|1988}}</ref> previously named KS C 5601-1987. Unlike ISO-2022-JP-2, it makes use of the [[Shift Out and Shift In characters]] to switch between them, after including <code>ESC $ ) C</code> once at the start of a line to designate KS X 1001 to G1.<ref name="rfc1557" /> '''{{visible anchor|ISO-2022-CN}}''' and '''{{visible anchor|ISO-2022-CN-EXT}}''' are defined in <nowiki>RFC 1922</nowiki>, dated 1996. They are 7-bit encodings making use both of the Shift Out and Shift In functions (to shift between G0 and G1), and of the 7-bit escape code forms of the single-shift functions SS2 and SS3 (to access G2 and G3).<ref name="rfc1922">{{harvp|RFC 1922|1996}}</ref> They support the character sets [[GB 2312]] (for [[simplified Chinese]]) and [[CNS 11643]] (for [[traditional Chinese]]). 
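The ISO-2022-KR shifting scheme described above can be illustrated with a minimal Python sketch. It assumes a simplified reading of the profile: the designation <code>ESC $ ) C</code> must appear before the first Shift Out, every character between Shift Out and Shift In is exactly two bytes, and line handling is ignored. The function name <code>split_iso2022_kr</code> is hypothetical and the code only separates the byte stream into runs; it does not map bytes to characters.

```python
SO, SI = 0x0E, 0x0F                      # Shift Out / Shift In
DESIGNATE_KSX1001_G1 = b"\x1b$)C"        # ESC $ ) C: KS X 1001 to G1


def split_iso2022_kr(data: bytes):
    """Split a byte string into ('ascii', b) and ('ksx1001', bb) items."""
    items = []
    designated = False   # has KS X 1001 been designated to G1 yet?
    shifted = False      # True while G1 is invoked (between SO and SI)
    i = 0
    while i < len(data):
        if data.startswith(DESIGNATE_KSX1001_G1, i):
            designated = True
            i += len(DESIGNATE_KSX1001_G1)
        elif data[i] == SO:
            if not designated:
                raise ValueError("Shift Out before ESC $ ) C")
            shifted = True
            i += 1
        elif data[i] == SI:
            shifted = False
            i += 1
        elif shifted:
            # Two bytes per character while the double-byte set is invoked.
            if i + 2 > len(data):
                raise ValueError("truncated double-byte character")
            items.append(("ksx1001", data[i:i + 2]))
            i += 2
        else:
            items.append(("ascii", data[i:i + 1]))
            i += 1
    return items


if __name__ == "__main__":
    sample = b"\x1b$)CHi \x0eGQ1[\x0f!"
    for kind, raw in split_iso2022_kr(sample):
        print(kind, raw)
```

Because the shift state persists until the next Shift In, the same two bytes (for example <code>GQ</code>) are read either as two ASCII letters or as one double-byte character depending on what preceded them, which is the statefulness that the profile relies on.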
The basic ISO-2022-CN profile uses ASCII as its G0 (shift in) set, and also includes GB 2312 and the first two planes of CNS 11643 (due to these two planes being sufficient to represent all traditional Chinese characters from common [[Big5]], to which the RFC provides a correspondence in an appendix):<ref name="rfc1922" /> * <code>ESC $ ) A</code> to switch to [[GB 2312|GB 2312-1980]] (2 bytes per character) ''[designated to G1]'' * <code>ESC $ ) G</code> to switch to [[CNS 11643|CNS 11643-1992]] Plane 1 (2 bytes per character) ''[designated to G1]'' * <code>ESC $ * H</code> to switch to CNS 11643-1992 Plane 2 (2 bytes per character) ''[designated to G2]'' The ISO-2022-CN-EXT profile permits the following additional sets and planes.<ref name="rfc1922" /> * <code>ESC $ ) E</code> to switch to [[ISO-IR-165]] (2 bytes per character) ''[designated to G1]'' * <code>ESC $ + I</code> to switch to CNS 11643-1992 Plane 3 (2 bytes per character) ''[designated to G3]'' * <code>ESC $ + J</code> to switch to CNS 11643-1992 Plane 4 (2 bytes per character) ''[designated to G3]'' * <code>ESC $ + K</code> to switch to CNS 11643-1992 Plane 5 (2 bytes per character) ''[designated to G3]'' * <code>ESC $ + L</code> to switch to CNS 11643-1992 Plane 6 (2 bytes per character) ''[designated to G3]'' * <code>ESC $ + M</code> to switch to CNS 11643-1992 Plane 7 (2 bytes per character) ''[designated to G3]'' The ISO-2022-CN-EXT profile further lists additional [[Guobiao standard]] graphical sets as being permitted, but conditional on their being assigned registered ISO 2022 escape sequences:<ref name="rfc1922" /> * [[GB 12345]] in G1 * GB 7589 or GB 13131 in G2 * GB 7590 or GB 13132 in G3 The character after the <code>ESC</code> (for single-byte character sets) or <code>ESC $</code> (for multi-byte character sets) specifies the type of character set and working set that is designated to. 
In the above examples, the character <code>(</code> (0x28) designates a 94-character set to the G0 character set, whereas <code>)</code>, <code>*</code> or <code>+</code> (0x29–0x2B) designates to the G1–G3 character sets. {{anchor|Replacement encoding}}ISO-2022-KR and ISO-2022-CN are used less frequently than ISO-2022-JP, and are sometimes deliberately not supported due to security concerns. Notably, the [[WHATWG]] Encoding Standard used by [[HTML5]] maps ISO-2022-KR, ISO-2022-CN and ISO-2022-CN-EXT (as well as [[HZ-GB-2312]]) to the "replacement" decoder,<ref name="whatwg-replacement-labels">{{harvp|WHATWG Encoding Standard|loc=chapter 4.2 ("Names and labels"), [https://encoding.spec.whatwg.org/#ref-for-replacement%E2%91%A1 anchor "replacement"]}}</ref> which maps all input to the [[replacement character]] (�), in order to prevent certain [[cross-site scripting]] and related attacks, which utilize a difference in encoding support between the client and server.<ref name="whatwg-replacement">{{harvp|WHATWG Encoding Standard|loc=[https://encoding.spec.whatwg.org/#replacement section 14.1 ("replacement")]}}</ref> Although the same security concern (allowing sequences of ASCII bytes to be interpreted differently) also applies to ISO-2022-JP and [[UTF-16]], they could not be given this treatment due to being much more frequently used in deployed content.<ref name="whatwg-security">{{harvp|WHATWG Encoding Standard|loc=[https://encoding.spec.whatwg.org/#security-background section 2 ("Security background")]}}</ref> In April 2024, a security flaw<ref name="gconv-vulnerability">{{cite web |url=https://nvd.nist.gov/vuln/detail/CVE-2024-2961 |title=CVE-2024-2961}}</ref> was found in the implementation of ISO-2022-CN-EXT in [[glibc]], which led to recommendations to disable the encoding entirely on Linux systems.<ref name="gconv-workaround">{{cite web |title=GLIBC Vulnerability on Servers Serving PHP |url=https://rockylinux.org/news/glibc-vulnerability-april-2024/}}</ref> 
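The designation escapes used by the 7-bit profiles in this section follow a regular pattern, which the following Python sketch tries to make explicit. It is an illustration under stated assumptions, not a full ISO 2022 parser: the function name <code>parse_designation</code> and the returned tuple layout are hypothetical, only single-letter final bytes are handled, and the intermediate bytes <code>-</code>, <code>.</code> and <code>/</code> for 96-character sets are taken from ISO/IEC 2022 itself (the text above shows only <code>ESC . A</code> and <code>ESC . F</code>). The short forms <code>ESC $ @</code>, <code>ESC $ A</code> and <code>ESC $ B</code> are treated as designating to G0, as used by ISO-2022-JP and ISO-2022-JP-2.

```python
ESC = 0x1B
# Intermediate byte -> working set, for 94-character and 96-character sets.
TARGET_94 = {0x28: "G0", 0x29: "G1", 0x2A: "G2", 0x2B: "G3"}   # ( ) * +
TARGET_96 = {0x2D: "G1", 0x2E: "G2", 0x2F: "G3"}               # - . /


def parse_designation(seq: bytes):
    """Return (working_set, set_type, final_byte) for one designation escape.

    set_type is '94', '96', or the same with '^n' appended for multi-byte sets.
    """
    if len(seq) < 3 or seq[0] != ESC:
        raise ValueError("not a designation escape sequence")
    body = seq[1:]
    multibyte = body[0] == 0x24            # '$' marks a multi-byte set
    if multibyte:
        body = body[1:]
        # Legacy short form without the second I-byte: always designates to G0.
        if body in (b"@", b"A", b"B"):
            return ("G0", "94^n", chr(body[0]))
    if len(body) != 2:
        raise ValueError("unsupported designation form")
    i_byte, final = body
    if i_byte in TARGET_94:
        target, size = TARGET_94[i_byte], "94"
    elif i_byte in TARGET_96:
        target, size = TARGET_96[i_byte], "96"
    else:
        raise ValueError("unknown intermediate byte")
    return (target, size + ("^n" if multibyte else ""), chr(final))


if __name__ == "__main__":
    for seq in (b"\x1b(B", b"\x1b$B", b"\x1b$)C", b"\x1b$+I", b"\x1b.A"):
        print(seq, parse_designation(seq))
```

Which character set a given final byte selects is a matter of registration, so a real decoder would still need a lookup table from the pair of set type and final byte to the registered set.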
=== ISO/IEC 4873 === [[File:Ecma43 versus EUC.svg|thumb|right|Relationship between ECMA-43 (ISO/IEC 4873) editions and levels, and [[#Extended Unix Code|EUC]].]] A subset of ISO 2022 applied to 8-bit single-byte encodings is defined by '''ISO/IEC 4873''', also published by [[Ecma International]] as ECMA-43. [[ISO/IEC 8859]] defines 8-bit codes for ISO/IEC 4873 (or ECMA-43) level 1.<ref name="8859-10-s1">{{harvp|ISO/IEC FDIS 8859-10|1998|loc=chapter 1 ("Scope")|p=1}}</ref><ref name="ecma-144-s1">{{harvp|ECMA-144|2000|loc=chapter 1 ("Scope")|p=1}}</ref> ISO/IEC 4873 / ECMA-43 defines three levels of encoding:<ref name="ecma-43-8">{{harvp|ECMA-43|1991|loc=chapter 8 ("Levels")|pp=9-10}}</ref> *Level 1, which includes a C0 set, the ASCII G0 set, an optional C1 set and an optional single-byte (94-character or 96-character) G1 set. G0 is invoked over GL, and G1 is invoked over GR. Use of shift functions is not permitted. *Level 2, which includes a (94-character or 96-character) single-byte G2 and/or G3 set in addition to a mandatory G1 set. Only the single-shift functions SS2 and SS3 are permitted (i.e. locking shifts are forbidden), and they invoke over the GL region (including [[Hexadecimal|0x]]20 and 0x7F in the case of a 96-set). SS2 and SS3 must be available in C1 at 0x8E and 0x8F respectively. This minimal required C1 set for ISO 4873 is registered as ISO-IR-105.<ref name="harvp|ISO-IR-105|1985"/> *Level 3, which permits the GR locking-shift functions LS1R, LS2R and LS3R in addition to the single shifts, but otherwise has the same restrictions as level 2. 
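As a rough illustration of how the three levels differ in the shift functions they allow, the following Python sketch reports the lowest level whose permitted shift functions cover a given 8-bit byte stream. The function name <code>minimum_level</code> is hypothetical, and the escape-sequence codings of LS1R, LS2R and LS3R (<code>ESC ~</code>, <code>ESC }</code> and <code>ESC |</code>) are taken from ISO/IEC 2022 rather than from the list above. The sketch looks only at shift functions; it does not check which graphical sets are actually designated, so it should be read as a simplification.

```python
ESC = 0x1B
SS2, SS3 = 0x8E, 0x8F                       # single shifts in C1
# Final bytes of the right-hand locking shifts (assumed from ISO/IEC 2022).
RIGHT_LOCKING_SHIFT_FINALS = {0x7E: "LS1R", 0x7D: "LS2R", 0x7C: "LS3R"}


def minimum_level(data: bytes) -> int:
    """Return 1, 2 or 3 depending on the shift functions found in data."""
    level = 1
    for i, byte in enumerate(data):
        if byte in (SS2, SS3):
            # Single shifts are first permitted at level 2.
            level = max(level, 2)
        elif (byte == ESC and i + 1 < len(data)
              and data[i + 1] in RIGHT_LOCKING_SHIFT_FINALS):
            # GR locking shifts are only permitted at level 3.
            return 3
    return level


if __name__ == "__main__":
    print(minimum_level(b"caf\xe9"))      # no shift functions
    print(minimum_level(b"a\x8eA"))       # SS2 followed by a GL byte
    print(minimum_level(b"\x1b~abc"))     # LS1R
```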
Earlier editions of the standard permitted non-ASCII assignments in the G0 set, provided that the [[ISO/IEC 646]] invariant positions were preserved, that the other positions were assigned to spacing (not combining) characters, that 0x23 was assigned to either [[£]] or [[Number sign|#]], and that 0x24 was assigned to either [[Dollar sign|$]] or [[¤]].<ref>{{harvp|ECMA-43|1985|loc=chapter 7.3 ("The G0 set")|pp=7-11}}</ref> For instance, the 8-bit encoding of [[JIS X 0201]] is compliant with earlier editions. This was subsequently changed to fully specify the ISO/IEC 646:1991 IRV / ISO-IR No. 6 set (ASCII).<ref name="ecma-43-7.4">{{harvp|ECMA-43|1991|loc=chapter 7.4 ("G0 set")|pp=6-8}}</ref><ref name="ecma-43-10.3">{{harvp|ECMA-43|1991|loc=chapter 10.3 ("Identification of a version")|p=11}}</ref><ref name="ecma-43-annexE" /> The use of the [[ISO/IEC 646]] IRV (synchronised with ASCII since 1991) at ISO/IEC 4873 Level 1 with no C1 or G1 set, i.e. using the IRV in an 8-bit environment in which shift codes are not used and the high bit is always zero, is known as '''ISO 4873 DV''', in which DV stands for "Default Version".<ref name="iptc7901">{{citation |mode=cs1 |url=https://www.iptc.org/std/IPTC7901/1.0/specification/7901V5.pdf |title=The IPTC Recommended Message Format |id=IPTC TEC 7901 |edition=5th |date=1995 |author=IPTC |author-link=IPTC |access-date=2020-01-14 |archive-date=2022-01-25 |archive-url=https://web.archive.org/web/20220125080439/https://www.iptc.org/std/IPTC7901/1.0/specification/7901V5.pdf |url-status=live }}</ref> In cases where duplicate characters are available in different sets, the current edition of ISO/IEC 4873 / ECMA-43 only permits using these characters in the lowest numbered working set which they appear in.<ref name="ecma-43-9.2">{{harvp|ECMA-43|1991|loc=chapter 9.2 ("Unique coding of characters")|pp=10}}</ref> For instance, if a character appears in both the G1 set and the G3 set, it must be used from the G1 set. 
However, use from other sets is noted as having been permitted in earlier editions.<ref name="ecma-43-annexE">{{harvp|ECMA-43|1991|loc=annex E ("Main differences between the second edition (1985) and the present (third) edition of this ECMA Standard")|p=23}}</ref> [[ISO/IEC 8859]] defines complete encodings at level 1 of ISO/IEC 4873, and does not allow for use of multiple ISO/IEC 8859 parts together. It stipulates that [[ISO/IEC 10367]] should be used instead for levels 2 and 3 of ISO/IEC 4873.<ref name="8859-10-s1"/><ref name="ecma-144-s1"/> ISO/IEC 10367:1991 includes G0 and G1 sets matching those used by the first 9 parts of ISO/IEC 8859 (i.e. those which existed as of 1991, when it was published), and some supplementary sets.<ref name="vanWingen">{{cite web |url=https://www.terena.org/activities/multiling/euroml/section08.html |title=8. Code Extension, ISO 2022 and 2375, ISO 4873 and 10367 |last=van Wingen |first=Johan W |work=Character sets. Letters, tokens and codes |year=1999 |publisher=Terena |access-date=2019-10-02 |archive-date=2020-08-01 |archive-url=https://web.archive.org/web/20200801214714/https://www.terena.org/activities/multiling/euroml/section08.html |url-status=live }}</ref> Character set designation escape sequences are used for identifying or switching between versions during information interchange only if required by a further protocol, in which case the standard requires an ISO/IEC 2022 announcer sequence specifying the ISO/IEC 4873 level, followed by a complete set of escapes specifying the character set designations for C0, C1, G0, G1, G2 and G3 respectively (but omitting G2 and G3 designations for level 1), with an {{var|F}}-byte of 0x7E denoting an empty set. Each ISO/IEC 4873 level has its own single ISO/IEC 2022 announcer sequence, which are as follows:<ref name="ecma-43-10">{{harvp|ECMA-43|1991|loc=chapter 10 ("Identification of version and level")|pp=10-11}}</ref> {| class="wikitable" |- ! Code !! Hex !! 
Announcement |- | <code>ESC SP L</code> || <code>1B 20 4C</code> || ISO 4873 Level 1 |- | <code>ESC SP M</code> || <code>1B 20 4D</code> || ISO 4873 Level 2 |- | <code>ESC SP N</code> || <code>1B 20 4E</code> || ISO 4873 Level 3 |- |} ===Extended Unix Code=== {{main|Extended Unix Code}} Extended Unix Code (EUC) is an 8-bit variable-width [[character encoding]] system used primarily for [[Japanese language|Japanese]], [[Korean language|Korean]], and [[simplified Chinese]]. It is based on ISO 2022, and only character sets which conform to the ISO 2022 structure can have EUC forms. Up to four coded character sets can be represented (in G0, G1, G2 and G3). The G0 set is invoked over GL, the G1 set is invoked over GR, and the G2 and G3 sets are (if present) invoked using the single shifts SS2 and SS3, which are used as CR bytes (i.e. 0x8E and 0x8F respectively) and invoke over GR (not GL).<ref name="lundeeuc" /> Locking shift codes are not used.<ref name="lundeeucvs">{{harvp|Lunde|2008|pp=253-255|loc=Chapter 4 ("Encoding Methods"), section "EUC versus ISO-2022 encodings"}}.</ref> The code assigned to the G0 set is ASCII, or the country's national [[ISO 646]] character set such as KS-Roman (KS X 1003) or [[JIS-Roman]] (the lower half of [[JIS X 0201]]).<ref name="lundeeuc" /> Hence, 0x5C ([[backslash]] in US-ASCII) is used to represent a [[Yen sign]] in some versions of EUC-JP and a [[Won sign]] in some versions of EUC-KR. G1 is used for a 94x94 coded character set represented in two bytes. The [[EUC-CN]] form of {{nowrap|GB 2312}} and [[EUC-KR]] are examples of such two-byte EUC codes. [[EUC-JP]] includes characters represented by up to three bytes (i.e. SS3 plus two bytes) whereas a single character in [[EUC-TW]] can take up to four bytes (i.e. SS2 plus three bytes). 
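The byte-level structure described above can be sketched in Python as follows. This is a minimal illustration, not a decoder: the function name <code>split_euc</code> and its parameters are hypothetical, and the default widths (two bytes for G1, one byte after SS2, two bytes after SS3) are an assumption modelled on EUC-JP. It only classifies bytes by working set and does not map them to characters.

```python
SS2, SS3 = 0x8E, 0x8F   # single shifts, used as C1 bytes in EUC


def split_euc(data: bytes, g1_width=2, g2_width=1, g3_width=2):
    """Split an EUC byte string into (working_set, payload) pairs.

    The payload excludes the single-shift byte itself.
    """
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # GL byte: a G0 character (or a C0 control code).
            name, start, width = "G0", i, 1
        elif byte == SS2:
            name, start, width = "G2", i + 1, g2_width
        elif byte == SS3:
            name, start, width = "G3", i + 1, g3_width
        elif 0xA1 <= byte <= 0xFE:
            name, start, width = "G1", i, g1_width
        else:
            raise ValueError(f"unexpected byte 0x{byte:02X} at offset {i}")
        payload = data[start:start + width]
        if len(payload) != width:
            raise ValueError("truncated character")
        if name != "G0" and not all(0xA1 <= b <= 0xFE for b in payload):
            raise ValueError("G1-G3 payload bytes must lie in GR")
        out.append((name, payload))
        i = start + width
    return out


if __name__ == "__main__":
    # One character from each of G0, G1, G2 (after SS2) and G3 (after SS3).
    print(split_euc(b"A\xb4\xc1\x8e\xb1\x8f\xb0\xa1"))
    # EUC-TW style: SS2 followed by three bytes.
    print(split_euc(b"\x8e\xa2\xa1\xa1", g2_width=3))
```

Since every G1, G2 and G3 byte has its high bit set and no locking shifts are used, a byte below 0x80 is always a G0 character or a control code, which is why EUC avoids the masking problems of the 7-bit profiles.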
The EUC code itself does not make use of the announcer or designation sequences from ISO 2022; however, it corresponds to the following sequence of four announcer sequences, with meanings breaking down as follows.<ref name="cdra">{{cite web |url=https://www.ibm.com/downloads/cas/G01BQVRV#page=157 |pages=157–162 |title=Character Data Representation Architecture (CDRA) |author=IBM |website=[[IBM]] |author-link=IBM |access-date=2020-06-18 |archive-date=2019-06-23 |archive-url=https://web.archive.org/web/20190623065058/https://www.ibm.com/downloads/cas/G01BQVRV#page=157 |url-status=live }}</ref> {|class=wikitable |- !Individual sequence!!Hexadecimal!!Feature of EUC denoted |- |<code>ESC SP C</code>||<code>1B 20 43</code>||ISO-8 (8-bit, G0 in GL, G1 in GR) |- |<code>ESC SP Z</code>||<code>1B 20 5A</code>||G2 accessed using SS2 |- |<code>ESC SP [</code>||<code>1B 20 5B</code>||G3 accessed using SS3 |- |<code>ESC SP \</code>||<code>1B 20 5C</code>||Single-shifts invoke over GR |} ===Compound Text (X11)=== The [[X Consortium]] defined an ISO 2022 profile named Compound Text as an interchange format in 1989.<ref>{{harvp|Scheifler|1989}}</ref> This uses only four control codes: {{nowrap|{{Ctrl|HT}} ({{code|0x09}}),}} NL (newline, coded as {{Ctrl|LF}}, {{nowrap|{{code|0x0A}})}}, {{Ctrl|ESC}} {{nowrap|({{code|0x1B}})}} and {{Ctrl|CSI}} (in its 8-bit representation {{code|0x9B}}),<ref>{{harvp|Scheifler|1989|loc=[https://www.x.org/releases/current/doc/xorg-docs/ctext/ctext.html#Control_Characters § Control Characters]}}</ref> with the SDS {{nowrap|({{code|CSI … ]}})}} CSI sequence being used for bidirectional text control.<ref>{{harvp|Scheifler|1989|loc=[https://www.x.org/releases/current/doc/xorg-docs/ctext/ctext.html#Directionality § Directionality]}}</ref> It is an 8-bit code using G0 and G1 for GL and GR, and follows [[ISO-8859-1]] in its initial 
state.<ref>{{harvp|Scheifler|1989|loc=[https://www.x.org/releases/current/doc/xorg-docs/ctext/ctext.html#Standard_Character_Set_Encodings § Standard Character Set Encodings]}}</ref> The following F-bytes are used: {|class="wikitable collapsible" |+ISO 2022 designation sequences used in X11 Compound Text<ref>{{harvp|Scheifler|1989|loc=[https://www.x.org/releases/current/doc/xorg-docs/ctext/ctext.html#Approved_Standard_Encodings § Approved Standard Encodings]}}</ref> |- !Escape sequence type!!Final byte!!Graphical set |- |rowspan=3|GZD4, G1D4 (for 94-character sets)||{{code|B}} ({{code|0x42}})||[[ASCII]] |- |{{code|I}} ({{code|0x49}})||[[JIS X 0201]] katakana |- |{{code|J}} ({{code|0x4A}})||[[JISCII|JIS X 0201 Roman]] |- |rowspan=9|G1D6 (for 96-character sets)||{{code|A}} ({{code|0x41}})||[[ISO-8859-1]] high part |- |{{code|B}} ({{code|0x42}})||[[ISO-8859-2]] high part |- |{{code|C}} ({{code|0x43}})||[[ISO-8859-3]] high part |- |{{code|D}} ({{code|0x44}})||[[ISO-8859-4]] high part |- |{{code|F}} ({{code|0x46}})||[[ISO-8859-7]] high part |- |{{code|G}} ({{code|0x47}})||[[ISO-8859-6]] high part |- |{{code|H}} ({{code|0x48}})||[[ISO-8859-8]] high part |- |{{code|L}} ({{code|0x4C}})||[[ISO-8859-5]] high part |- |{{code|M}} ({{code|0x4D}})||[[ISO-8859-9]] high part |- |rowspan=3|GZDM4, G1DM4 (for 2-byte sets)||{{code|A}} ({{code|0x41}})||[[GB 2312]] |- |{{code|B}} ({{code|0x42}})||[[JIS X 0208]] |- |{{code|C}} ({{code|0x43}})||[[KS C 5601]] |} For specifying encodings by labels, X11 Compound Text defines five private-use DOCS sequences: {{nowrap|{{code|ESC % / 0}}}} ({{code|1B 25 2F 30}}) for variable-length encodings, and {{nowrap|{{code|ESC % / 1}}}} through {{nowrap|{{code|ESC % / 4}}}} for fixed-length encodings using one through four bytes respectively. Rather than using another escape sequence to return to {{nowrap|ISO 2022}}, the two bytes following the initial escape sequence specify the remaining length in bytes, coded in base-128 using bytes {{code|0x80–FF}}. 
The encoding label is included in [[ISO 8859-1]] before the encoded text, and terminated with {{Ctrl|STX}} ({{code|0x02}}).<ref name="scheiflerdocs">{{harvp|Scheifler|1989|loc=[https://www.x.org/releases/current/doc/xorg-docs/ctext/ctext.html#Non_Standard_Character_Set_Encodings § Non-Standard Character Set Encodings]}}</ref> ==Comparison with other encodings== ===Advantages=== * As ISO/IEC 2022's entire range of graphical character encodings can be invoked over GL, the available glyphs are not significantly limited by an inability to represent GR and C1, such as in a system limited to 7-bit encodings. It accordingly enables the representation of a large set of characters in such a system. In practice, this 7-bit compatibility matters mainly for backward compatibility with older systems, since the vast majority of modern computers use 8-bit bytes. * As compared to Unicode, ISO/IEC 2022 sidesteps [[Han unification]] by using escape sequences to switch between discrete encodings for different East Asian languages. This avoids the issues{{citation needed|date=January 2013}} associated with unification, such as difficulty supporting multiple [[CJK characters|CJK]] languages with their associated character variants in a single document and font. ===Disadvantages=== * Since ISO/IEC 2022 is a stateful encoding, a program cannot jump into the middle of a block of text to search, insert or delete characters. This makes manipulation of the text cumbersome and slow when compared to non-stateful encodings. Any jump into the middle of the text may require backing up to the previous escape sequence before the bytes following the escape sequence can be interpreted. * Due to the stateful nature of ISO/IEC 2022, an identical and equivalent character may be encoded in different character sets, which may be designated to any of G0 through G3, which may be invoked using single shifts or by using locking shifts to GL or GR. 
Consequently, characters can be represented in multiple ways, meaning that two visually identical and equivalent strings cannot be reliably compared for equality. * Some systems, like [[DICOM]] and several e-mail clients, use a variant of ISO-2022 (e.g. "ISO 2022 IR 100"<ref>{{Cite web |url=http://dicom.nema.org/medical/dicom/2016d/output/chtml/part02/sect_D.6.2.html |title=DICOM PS3.2 2016d - Conformance; D.6.2 Character Sets; D.6 Support of Character Sets |access-date=2020-05-21 |archive-date=2020-02-16 |archive-url=https://web.archive.org/web/20200216125138/http://dicom.nema.org/medical/dicom/2016d/output/chtml/part02/sect_D.6.2.html |url-status=live }}</ref>) in addition to supporting several other encodings.<ref name="DICOM">{{cite web |url=http://sourceforge.net/mailarchive/message.php?msg_id=23D1952C4CB5164E9B9AA2905736DBEE06A10C5F%40BANMLVEM04.e2k.ad.ge.com |title=DICOM ISO 2022 variation |access-date=2009-07-25 |archive-date=2013-04-30 |archive-url=https://web.archive.org/web/20130430124636/http://sourceforge.net/mailarchive/message.php?msg_id=23D1952C4CB5164E9B9AA2905736DBEE06A10C5F%40BANMLVEM04.e2k.ad.ge.com |url-status=dead }}</ref> This type of variation makes it difficult to portably transfer text between computer systems. * [[UTF-1]], the multi-byte [[Unicode]] transformation format compatible with ISO/IEC 2022's representation of 8-bit control characters, has various disadvantages in comparison with [[UTF-8]], and switching from or to other charsets, as supported by ISO/IEC 2022, is typically unnecessary in Unicode documents. 
* Because of its escape sequences, it is possible to construct attack byte sequences in which a malicious string (such as [[cross-site scripting]]) is masked until it is decoded to Unicode, which may allow it to bypass sanitization.<ref name="sivonen2018"/> Use of this encoding is thus treated as suspicious by malware protection suites,<ref>{{Cite web|url=https://bugzilla.mozilla.org/show_bug.cgi?id=935453|title=935453 - Gather telemetry about HZ and other encodings we might try to remove|access-date=2018-06-18|archive-date=2017-05-19|archive-url=https://web.archive.org/web/20170519153033/https://bugzilla.mozilla.org/show_bug.cgi?id=935453|url-status=live}}</ref>{{better source needed|date=September 2015}} and 7-bit ISO 2022 data (except for ISO-2022-JP) is mapped in its entirety to the [[replacement character]] in [[HTML5]] to prevent attacks.<ref name="whatwg-replacement-labels"/><ref name="whatwg-replacement"/> Restricted ISO 2022 8-bit code versions which do not use designation escapes or locking shift codes, such as [[Extended Unix Code]], do not share this problem. * Concatenation can pose issues. Profiles such as ISO-2022-JP specify that the stream starts in the ASCII state and must end in the ASCII state.<ref name="rfc1468" /> This is necessary to ensure that characters in concatenated ISO-2022-JP and/or ASCII streams will be interpreted in the correct set. This has the consequence that if a stream that ends in a multi-byte character is concatenated with one that starts with a multi-byte character, a pair of escape codes is generated, switching to ASCII and immediately away from it. 
However, as stipulated in Unicode Technical Report #36 ("Unicode Security Considerations"), pairs of ISO 2022 escape sequences with no characters between them should generate a [[replacement character]] ("�") to prevent them from being used to mask malicious sequences such as [[cross-site scripting]].<ref>{{cite web|url=https://www.unicode.org/reports/tr36/tr36-15.html#Some_Output_For_All_Input|title=3.6.2 Some Output For All Input|work=Unicode Technical Report #36: Unicode Security Considerations (revision 15)|date=2014-09-19|first1=Mark|last1=Davis|first2=Michel|last2=Suignard|publisher=Unicode Consortium|access-date=2019-02-21|archive-date=2019-02-22|archive-url=https://web.archive.org/web/20190222025553/http://www.unicode.org/reports/tr36/tr36-15.html#Some_Output_For_All_Input|url-status=live}}</ref> Implementing this measure, e.g. in [[Mozilla Thunderbird]], has led to interoperability issues, with unexpected "�" characters being generated where two ISO-2022-JP streams have been concatenated.<ref name="sivonen2018">{{cite web|url=https://hsivonen.fi/unicode-feedback/empty-iso-2022-jp-draft.pdf|first=Henri|last=Sivonen|date=2018-12-17|title=(UNSUBMITTED DRAFT) No U+FFFD Generation for Zero-Length ASCII-State Content between ISO-2022-JP Escape Sequences|access-date=2019-02-21|archive-date=2019-02-21|archive-url=https://web.archive.org/web/20190221224112/https://hsivonen.fi/unicode-feedback/empty-iso-2022-jp-draft.pdf|url-status=live}}</ref> ==See also== * [[ISO 2709]] * [[ISO/IEC 646]] * [[Iso-ir-102|ISO-IR-102]] * [[C0 and C1 control codes]] * [[CJK characters]] * [[MARC standards]] * [[Mojibake]] * [[luit]] * [[ISO/IEC JTC 1/SC 2]] ==Footnotes== {{notelist}} ==References== {{Reflist}} {{sfn whitelist |CITEREFISO-IR-1961996 |CITEREFISO-IR-141975 |CITEREFISO-IR-2081999 |CITEREFISO-IR-1551990 |CITEREFISO-IR-1641992 |CITEREFISO-IR-1041985 |CITEREFISO-IR-11975 |CITEREFISO-IR-1061985 |CITEREFISO-IR-71975 |CITEREFISO-IR-261976 |CITEREFISO-IR-361977 
|CITEREFISO-IR-1051985 |CITEREFISO-IR-1921996 |CITEREFISO-IR-1951996 |CITEREFISO-IR-1491988 |CITEREFISO-IR-1401987}}

===Standards and registry indices cited===
* {{cite book |ref={{harvid|ARIB STD-B24|2008}} |title=ARIB STD-B24: Data Coding and Transmission Specification for Digital Broadcasting |version=5.2-E1 |language=English |type=ARIB Standard |volume=1 |date=2008 |author=ARIB |author-link=Association of Radio Industries and Businesses|url=http://www.arib.or.jp/english/html/overview/doc/6-STD-B24v5_2-1p3-E1.pdf |access-date=2017-07-10 |url-status = live|archive-url=https://web.archive.org/web/20170710120018/http://www.arib.or.jp/english/html/overview/doc/6-STD-B24v5_2-1p3-E1.pdf |archive-date=2017-07-10}}
* {{cite book |ref={{harvid|ECMA-35|1980}} |title=ECMA-35: Extension of the 7-bit Coded Character Set |edition=2nd |type=ECMA Standard |date=1980 |author=ECMA |author-link=Ecma International |url=https://www.ecma-international.org/wp-content/uploads/ECMA-35_2nd_edition_january_1980.pdf}}
* {{cite book |ref={{harvid|ECMA-35|1994}} |title=ECMA-35: Character Code Structure and Extension Techniques |edition=6th |type=ECMA Standard |date=1994 |author=ECMA |author-link=Ecma International |url=https://www.ecma-international.org/wp-content/uploads/ECMA-35_6th_edition_december_1994.pdf}}
* {{cite book |ref={{harvid|ECMA-43|1985}} |title=ECMA-43: 8-Bit Coded Character Set Structure and Rules |author=ECMA |author-link=Ecma International |date=1985 |edition=2nd |type=ECMA Standard |url=https://www.ecma-international.org/wp-content/uploads/ECMA-43_2nd_edition_december_1985.pdf}}
* {{cite book |ref={{harvid|ECMA-43|1991}} |title=ECMA-43: 8-Bit Coded Character Set Structure and Rules |author=ECMA |author-link=Ecma International |date=1991 |edition=3rd |type=ECMA Standard |url=https://www.ecma-international.org/wp-content/uploads/ECMA-43_3rd_edition_december_1991.pdf}}
* {{cite book |ref={{harvid|ECMA-48|1991}} |title=ECMA-48: Control Functions for Coded Character Sets
|author=ECMA |author-link=Ecma International |date=1991 |edition=5th |type=ECMA Standard |url=https://www.ecma-international.org/wp-content/uploads/ECMA-48_5th_edition_june_1991.pdf}}
* {{cite book |ref={{harvid|ECMA-144|2000}} |title=ECMA-144: 8-Bit Single-Byte Coded Graphic Character sets: Latin Alphabet No. 6 |author=ECMA |author-link=Ecma International |date=2000 |edition=3rd |type=ECMA Standard |url=https://www.ecma-international.org/wp-content/uploads/ECMA-144_3rd_edition_december_2000.pdf}}
* {{cite book |ref={{harvid|ETS 300 706|1997}} |title=ETS 300 706: Enhanced Teletext specification |author=European Broadcasting Union |author-link=European Broadcasting Union |publisher=[[ETSI]] |date=1997 |type=European Telecommunications Standards |url=https://www.etsi.org/deliver/etsi_i_ets/300700_300799/300706/01_60/ets_300706e01p.pdf}}
* {{cite book |ref={{harvid|ISO/IEC 2375|2003}} |url=https://www.iso.org/standard/32184.html |title=ISO/IEC 2375:2003: Information technology — Procedure for registration of escape sequences and coded character sets |institution=[[ISO]] |author=ISO/IEC JTC 1/SC 2 |author-link=ISO/IEC JTC 1/SC 2 |date=2003}}
* {{cite book |ref={{harvid|ISO/IEC FDIS 8859-10|1998}} |title=ISO/IEC FDIS 8859-10: Information Technology — 8-bit single-byte coded graphic character sets — Part 10: Latin alphabet No. 6 |date=1998-02-12 |author=ISO/IEC JTC 1/SC 2 |author-link=ISO/IEC JTC 1/SC 2 |url=http://www.open-std.org/JTC1/SC2/WG3/docs/n415.pdf |type=Final Draft International Standard}}
* {{cite book |ref={{harvid|ISO/IEC 10646|2017}} |title=ISO/IEC 10646: Information technology — Universal Coded Character Set (UCS) |author=ISO/IEC JTC 1/SC 2 |author-link=ISO/IEC JTC 1/SC 2 |publisher=[[ISO]] |edition=5th |year=2017 |url=https://standards.iso.org/ittf/PubliclyAvailableStandards/c069119_ISO_IEC_10646_2017.zip |type=ISO Standard}}
* {{cite book |ref={{harvid|ISO-IR}} |title=ISO-IR: ISO/IEC International Register of Coded Character Sets To Be Used With Escape Sequences |publisher=ITSCJ/[[Information Processing Society of Japan|IPSJ]] |url=https://itscj.ipsj.or.jp/english/vbcqpr00000004qn-att/ISO-IR.pdf |type=Registry Index}}
* {{citation|mode=cs1 |url=https://www.x.org/releases/current/doc/xorg-docs/ctext/ctext.html |last=Scheifler |first=Robert W. |title=Compound Text Encoding |type=X Consortium Standard |publisher=[[X Consortium]] |year=1989}}
* {{citation|mode=cs1 |ref={{harvid|WHATWG Encoding Standard}} |url=https://encoding.spec.whatwg.org/ |last=van Kesteren |first=Anne |author-link=Anne van Kesteren |title=WHATWG Encoding Standard |publisher=[[WHATWG]] |type=WHATWG Living Standard}}

===Registered code sets cited===
* {{cite iso-ir |number=1 |date=1975-12-01 |sponsor=ISO/TC 97/SC 2 |sponsor-link=ISO/IEC JTC 1/SC 2#History |title=The set of control characters of the ISO 646 |id-in-title=1}}
* {{cite iso-ir |number=7 |sponsor=Sveriges Standardiseringskommission |title=NATS Control set for newspaper text transmission |date=1975-12-01 |id-in-title=1}}
* {{cite iso-ir |number=14 |title=The Japanese Roman graphic set of characters |date=1975-12-01 |sponsor=Japanese Industrial Standards Committee |sponsor-link=Japanese Industrial Standards Committee |id-in-title=1}}
* {{cite iso-ir |number=26 |sponsor=IPTC |sponsor-link=International Press Telecommunications Council
|title=Control set for newspaper text transmission |date=1976-03-25 |id-in-title=1}}
* {{cite iso-ir |number=36 |sponsor=ISO/TC 97/SC 2 |sponsor-link=ISO/IEC JTC 1/SC 2#History |title=The set of control characters of ISO 646, with IS4 replaced by Single Shift for G2 (SS2) |date=1977-10-15 |id-in-title=1}}
* {{cite iso-ir |number=104 |sponsor1=ISO/TC97/SC2/WG-7 |sponsor-link1=ISO/IEC JTC 1/SC 2#History |sponsor2=ECMA |sponsor-link2=Ecma International |title=Minimum C0 set for ISO 4873 |date=1985-08-01 |id-in-title=1}}
* {{cite iso-ir |number=105 |title=Minimum C1 Set for ISO 4873 |date=1985-08-01 |sponsor1=ISO/TC97/SC2/WG-7 |sponsor-link1=ISO/IEC JTC 1/SC 2#History |sponsor2=ECMA |sponsor-link2=Ecma International |id-in-title=1}}
* {{cite iso-ir |number=106 |sponsor=ITU |sponsor-link=International Telecommunication Union |title=Teletex Primary Set of Control Functions |date=1985-08-01 |id-in-title=1}}
* {{cite iso-ir |number=140 |sponsor=Úřad pro normalizaci a měřeni |title=The C0 Set of Control Characters of ISO 646, with EM replaced by SS2 |date=1987-07-31 |id-in-title=1}}
* {{cite iso-ir |number=149 |title=Korean Graphic Character Set for Information Interchange (KS C 5601:1987) |date=1988-10-01 |sponsor=Korea Bureau of Standards |id-in-title=1}}
* {{cite iso-ir |number=155 |title=Basic Box-Drawings Set |date=1990-04-16 |sponsor=ISO/IEC/JTC1/SC2/WG3 |sponsor-link=ISO/IEC JTC 1/SC 2 |id-in-title=1}}
* {{cite iso-ir |number=164 |title=Hebrew Supplementary Set of Graphic Characters |date=1992-07-13 |sponsor=CCITT |sponsor-link=CCITT |id-in-title=1}}
* {{cite iso-ir |number=192 |title=UCS Transformation Format (UTF-8), implementation level 3, without standard return |sponsor=ECMA |sponsor-link=Ecma International |date=1996-04-22 |id-in-title=1}}
* {{cite iso-ir |number=195 |title=UCS Transformation Format (UTF-16), implementation level 3, without standard return |sponsor=ECMA |sponsor-link=Ecma International |date=1996-04-22 |id-in-title=1}}
* {{cite iso-ir
|number=196 |title=UCS Transformation Format (UTF-8), with standard return |sponsor=ECMA |sponsor-link=Ecma International |date=1996-04-22 |id-in-title=1}}
* {{cite iso-ir |number=208 |title=Ogham coded character set for information interchange. |sponsor=National Standards Authority of Ireland |sponsor-link=National Standards Authority of Ireland |date=1999-12-07 |id-in-title=1}}

=== Internet Requests For Comment cited ===
* {{citation|mode=cs1 |ref={{harvid|RFC 1468|1993}} |title=RFC 1468: Japanese Character Encoding for Internet Messages |url=https://tools.ietf.org/html/rfc1468 |first1=J. |last1=Murai |first2=M. |last2=Crispin |first3=E. |last3=van der Poel |publisher=[[IETF]] |work=Requests for Comments |date=1993 |doi=10.17487/rfc1468|doi-access=free |url-access=subscription }}
* {{citation|mode=cs1 |ref={{harvid|RFC 1554|1993}} |title=RFC 1554: ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP |url=https://tools.ietf.org/html/rfc1554 |first1=M. |last1=Ohta |first2=K. |last2=Handa |publisher=[[IETF]] |work=Requests for Comments |date=1993 |doi=10.17487/rfc1554|doi-access=free }}
* {{citation|mode=cs1 |ref={{harvid|RFC 1557|1993}} |title=RFC 1557: Korean Character Encoding for Internet Messages |url=https://tools.ietf.org/html/rfc1557 |first1=U. |last1=Choi |first2=K. |last2=Chon |first3=H. |last3=Park |publisher=[[IETF]] |work=Requests for Comments |date=1993 |doi=10.17487/rfc1557|doi-access=free |url-access=subscription }}
* {{citation|mode=cs1 |ref={{harvid|RFC 1922|1996}} |title=RFC 1922: Chinese Character Encoding for Internet Messages |url=https://tools.ietf.org/html/rfc1922 |first1=HF. |last1=Zhu |first2=DY. |last2=Hu |first3=ZG. |last3=Wang |first4=TC. |last4=Kao |first5=WCH. |last5=Chang |first6=M.
|last6=Crispin |publisher=[[IETF]] |work=Requests for Comments |date=1996 |doi=10.17487/rfc1922|doi-access=free |url-access=subscription }}
* {{citation|mode=cs1 |ref={{harvid|RFC 2237|1997}} |title=RFC 2237: Japanese Character Encoding for Internet Messages |url=https://tools.ietf.org/html/rfc2237 |first1=K. |last1=Tamaru |publisher=[[IETF]] |work=Requests for Comments |date=1997 |doi=10.17487/rfc2237|doi-access=free }}

=== Other published works cited ===
* {{cite book |last=Lunde |first=Ken |author-link=Ken Lunde |title=CJKV Information Processing |edition=2nd |date=2008 |publisher=[[O'Reilly Media]] |isbn=9780596514471}}

== Further reading ==
* {{cite book |last=Lunde |first=Ken |author-link=Ken Lunde |title=CJKV Information Processing |location=Cambridge, Massachusetts |publisher=[[O'Reilly Media|O'Reilly & Associates]] |date=1998 |isbn=1-56592-224-7 |url-access=registration |url=https://archive.org/details/cjkvinformationp00lund }}

== External links ==
* [http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=22747 ISO/IEC 2022:1994]
* [http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=31104 ISO/IEC 2022:1994/Cor 1:1999]
* [https://ecma-international.org/publications-and-standards/standards/ecma-35/ ECMA-35], equivalent to ISO/IEC 2022 and freely downloadable.
* [https://web.archive.org/web/20240218150849/https://itscj.ipsj.or.jp/english/vbcqpr00000004qn-att/ISO-IR.pdf International Register of Coded Character Sets to be Used with Escape Sequences], a full list of assigned character sets and their escape sequences
* [https://web.archive.org/web/20001216022100/http://tronweb.super-nova.co.jp/characcodehist.html History of Character Codes in North America, Europe, and East Asia from 1999, rev. 2004]
* [[Ken Lunde]]'s [http://users.monash.edu/~jwb/cjk.inf CJK.INF]: a document on encoding Chinese, Japanese, and Korean (CJK) languages, including a discussion of the various variants of ISO/IEC 2022.<!-- slightly older version (1.9 not 2.1) presumably uploaded by Lunde himself: https://blogs.adobe.com/CCJKType/files/2013/09/cjk_inf.txt -->

{{Character encoding|state=uncollapsed}}
{{Ecma International Standards}}
{{ISO standards}}
{{List of International Electrotechnical Commission standards}}

{{DEFAULTSORT:ISO IEC 2022}}
[[Category:Character sets]]
[[Category:Ecma standards]]
[[Category:ISO/IEC standards|#02022]]
[[Category:Encodings of Asian languages]]