==<span class="anchor" id="Variants"></span>Variants and derivations==
As computer technology spread throughout the world, different [[Standardization|standards bodies]] and corporations developed many variations of ASCII to facilitate the expression of non-English languages that used Roman-based alphabets. One could class some of these variations as "[[ASCII extension]]s", although some misuse that term to represent all variants, including those that do not preserve ASCII's character-map in the 7-bit range. Furthermore, the ASCII extensions have also been mislabelled as ASCII.

===<span class="anchor" id="7-bit"></span>7-bit codes===
{{Main|ISO/IEC 646|ITU T.50}}{{See also|UTF-7}}
From early in its development,<ref>"Specific Criteria", attachment to memo from R. W. Reach, "X3-2 Meeting – September 14 and 15", September 18, 1961</ref> ASCII was intended to be just one of several national variants of an international character code standard.

<!-- ITU-T ITU T.50
International Reference Alphabet (IRA)
International Alphabet No. 5 (IA5) -->
Other international standards bodies have ratified character encodings such as [[ISO 646]] (1967) that are identical or nearly identical to ASCII, with extensions for characters outside the English [[alphabet]] and symbols used outside the United States, such as the symbol for the United Kingdom's [[pound sterling]] (£), for example in [[code page 1104]]. Almost every country needed an adapted version of ASCII, since ASCII suited the needs of only the US and a few other countries. For example, Canada had its own version that supported French characters.

Many other countries developed variants of ASCII to include non-English letters (e.g. [[é]], [[ñ]], [[ß]], [[Ł]]), currency symbols (e.g. [[£]], [[¥]]), etc. See also [[YUSCII]] (Yugoslavia).

Each national variant would share most characters with the others, but assign other locally useful characters to several [[code point]]s reserved for "national use". However, the four years that elapsed between the publication of ASCII-1963 and ISO's first acceptance of an international recommendation during 1967<ref name="Maréchal_1967">{{citation |author-last=Maréchal |author-first=R. |title=ISO/TC 97 – Computers and Information Processing: Acceptance of Draft ISO Recommendation No. 1052 |date=1967-12-22}}</ref> made ASCII's choices for the national use characters seem to be ''de facto'' standards for the world, causing confusion and incompatibility once other countries did begin to make their own assignments to these code points.

ISO/IEC 646, like ASCII, is a 7-bit character set. It does not make any additional codes available, so the same code points encoded different characters in different countries. Escape codes were defined to indicate which national variant applied to a piece of text, but they were rarely used, so it was often impossible to know which variant applied and, therefore, which character a code represented. In any case, text-processing systems could generally cope with only one variant.

Because the bracket and brace characters of ASCII were assigned to "national use" code points that were used for accented letters in other national variants of ISO/IEC 646, a programmer in Germany, France, Sweden, or elsewhere who used their national variant of ISO/IEC 646, rather than ASCII, had to write, and thus read, something such as

<code>ä aÄiÜ = 'Ön'; ü</code>

instead of

<code>{ a[i] = '\n'; }</code>

[[C trigraph]]s were created to solve this problem for [[ANSI C]], although their late introduction and inconsistent implementation in compilers limited their use. Many programmers kept their computers on ASCII, so plain text in Swedish, German, etc. (for example, in e-mail or [[Usenet]]) contained "{", "}" and similar characters in the middle of words, something those programmers got used to. For example, a Swedish programmer e-mailing another programmer to ask whether they should go for lunch could get "N{ jag har sm|rg}sar" as the answer, which should read "Nä jag har smörgåsar", meaning "No, I've got sandwiches".
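The substitutions described above can be modelled as a simple character translation. The following Python sketch is illustrative only: the tables cover just the six bracket, backslash, bar, and brace positions (5B–5D and 7B–7D hex) of the German and Swedish variants, and the names <code>GERMAN</code>, <code>SWEDISH</code>, and <code>render</code> are hypothetical.

```python
# Partial, illustrative tables: only the code points 0x5B-0x5D and 0x7B-0x7D
# are modelled. Full national variants reassign other positions as well.
ASCII_CHARS = "[\\]{|}"
GERMAN = str.maketrans(ASCII_CHARS, "ÄÖÜäöü")
SWEDISH = str.maketrans(ASCII_CHARS, "ÄÖÅäöå")


def render(text: str, variant: dict) -> str:
    """Show how ASCII-coded text appears on a device using a national variant."""
    return text.translate(variant)


# C source written for ASCII, displayed on a German-variant terminal.
print(render("{ a[i] = '\\n'; }", GERMAN))      # ä aÄiÜ = 'Ön'; ü

# Swedish text typed on an ASCII machine, as the sender intended it to read.
print(render("N{ jag har sm|rg}sar", SWEDISH))  # Nä jag har smörgåsar
```

The stored 7-bit codes are the same in both directions; only the interpretation applied by the displaying device changes.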

{{As of|2021|alt=As of the 2020s|post=,|df=US}} a variation of ASCII is still used in Japan and Korea, in which the [[backslash]] (5C hex) is rendered as ¥ (a [[Yen sign]], in Japan) or ₩ (a [[Won sign]], in Korea). This means that, for example, the file path C:\Users\Smith is shown as C:¥Users¥Smith (in Japan) or C:₩Users₩Smith (in Korea).
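A minimal Python sketch of this effect follows. It assumes a hypothetical <code>display</code> helper that stands in for a font or terminal choosing a glyph for byte 5C hex; the stored bytes are identical in every case and only the rendering differs.

```python
# Hypothetical model: the byte 0x5C is stored unchanged; only its glyph varies.
PATH_BYTES = r"C:\Users\Smith".encode("ascii")


def display(raw: bytes, glyph_for_5c: str) -> str:
    """Render 7-bit bytes as text, drawing byte 0x5C with the given glyph."""
    return "".join(glyph_for_5c if b == 0x5C else chr(b) for b in raw)


print(display(PATH_BYTES, "\\"))  # C:\Users\Smith
print(display(PATH_BYTES, "¥"))   # C:¥Users¥Smith
print(display(PATH_BYTES, "₩"))   # C:₩Users₩Smith
```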

In Europe, [[teletext character set]]s, which are variants of ASCII, are used for broadcast TV subtitles, defined by [[World System Teletext]] and broadcast using the DVB-TXT standard for embedding teletext into DVB transmissions.<ref>{{Cite web |url=https://dvb.org/?standard=specification-for-conveying-itu-r-system-b-teletext-in-dvb-bitstreams |title=DVB-TXT (Teletext) Specification for conveying ITU-R System B Teletext in DVB bitstreams}}</ref> Where subtitles were initially authored for teletext and later converted, the derived subtitle formats are constrained to the same character sets.

===<span class="anchor" id="8-bit"></span>8-bit codes===
{{Main|Extended ASCII}}{{See also|ISO/IEC 8859|UTF-8}}
<!-- to be mentioned [[USASCII-8]] -->
Eventually, as 8-, [[16-bit computing|16-]], and [[32-bit computing|32-bit]] (and later [[64-bit computing|64-bit]]) computers began to replace [[12-bit computing|12-]], [[18-bit computing|18-]], and [[36-bit computing|36-bit]] computers as the norm, it became common to use an 8-bit byte to store each character in memory, providing an opportunity for extended, 8-bit relatives of ASCII. In most cases these developed as true extensions of ASCII, leaving the original character-mapping intact, but adding character definitions after the first 128 (i.e., 7-bit) characters. ASCII itself remained a seven-bit code: the term "extended ASCII" has no official status.
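One way to check the "true extension" property is to decode the 128 seven-bit values with several 8-bit codecs and compare the results with ASCII. The sketch below uses codec names from Python's standard library; the choice of codecs and the sample byte E9 hex are illustrative assumptions, and <code>preserves_ascii</code> is a hypothetical helper.

```python
# All 128 seven-bit values, 0x00 through 0x7F.
SEVEN_BIT = bytes(range(128))

# A few 8-bit encodings shipped with Python, chosen for illustration.
CODECS = ["latin-1", "cp437", "cp1252", "mac_roman"]


def preserves_ascii(codec: str) -> bool:
    """True if the codec decodes every 7-bit byte exactly as ASCII does."""
    return SEVEN_BIT.decode(codec) == SEVEN_BIT.decode("ascii")


for codec in CODECS:
    # The lower half agrees with ASCII; the meaning of a byte in the
    # upper half (here 0xE9) depends on which extension is assumed.
    print(codec, preserves_ascii(codec), repr(b"\xe9".decode(codec)))
```

Python's <code>cp437</code> codec maps bytes 00–1F hex to control characters, so this check does not model display hardware that draws graphic symbols in those positions.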

For some countries, 8-bit extensions of ASCII were developed that included support for characters used in local languages; for example, [[ISCII]] for India and [[VISCII]] for Vietnam. [[Kaypro]] [[CP/M]] computers used the "upper" 128 characters for the Greek alphabet.{{citation needed|date=November 2023}}

Even for markets where it was not necessary to add many characters to support additional languages, manufacturers of early home computer systems often developed their own 8-bit extensions of ASCII to include additional characters, such as [[box-drawing characters]], [[semigraphics]], and [[Sprite (computer graphics)|video game sprites]]. Often, these additions also replaced control characters (index 0 to 31, as well as index 127) with even more platform-specific extensions. In other cases, the extra bit was used for some other purpose, such as toggling [[inverse video]]; this approach was used by [[ATASCII]], an extension of ASCII developed by [[Atari]].
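The use of the eighth bit as a display flag, rather than as part of the character code, can be sketched as a bit mask. This is a simplified illustration and not a description of any particular machine's full character set; the function name <code>split_inverse</code> is hypothetical.

```python
def split_inverse(byte: int) -> tuple[int, bool]:
    """Split an 8-bit value into a 7-bit character code and an inverse-video flag."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0-255")
    # Low seven bits select the character; the high bit toggles inverse video.
    return byte & 0x7F, bool(byte & 0x80)


print(split_inverse(0x41))  # (65, False)
print(split_inverse(0xC1))  # (65, True)
```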

Most ASCII extensions are based on ASCII-1967 (the current standard), but some extensions are instead based on the earlier ASCII-1963. For example, [[PETSCII]], which was developed by [[Commodore International]] for their [[8-bit computing|8-bit]] systems, is based on ASCII-1963. Likewise, many [[Sharp MZ character set]]s are based on ASCII-1963.

IBM defined [[code page 437]] for the [[IBM PC]], replacing the control characters with graphic symbols such as [[Emoticon|smiley faces]], and mapping additional graphic characters to the upper 128 positions.<ref>{{cite book |url=http://www.bitsavers.org/pdf/ibm/pc/pc/6025008_PC_Technical_Reference_Aug81.pdf |title=Technical Reference |at=Appendix C. Of Characters Keystrokes and Color |edition=First |date=August 1981 |series=Personal Computer Hardware Reference Library |publisher=IBM}}</ref> [[Digital Equipment Corporation]] developed the [[Multinational Character Set]] (DEC-MCS) for use in the popular [[VT220]] [[computer terminal|terminal]] as one of the first extensions designed more for international languages than for block graphics. [[Apple Inc.|Apple]] defined [[Mac OS Roman]] for the Macintosh and [[Adobe Inc.|Adobe]] defined the [[PostScript Standard Encoding]] for [[PostScript]]; both sets contained "international" letters, typographic symbols and punctuation marks instead of graphics, more like modern character sets.

The [[ISO/IEC 8859]] standard (derived from the DEC-MCS) provided an encoding that most systems copied exactly or at least used as a basis. A popular further extension designed by Microsoft, [[Windows-1252]] (often mislabeled as [[ISO-8859-1]]), added the typographic punctuation marks needed for traditional text printing. ISO-8859-1, Windows-1252, and the original 7-bit ASCII were the most common character encoding methods on the [[World Wide Web]] until 2008, when [[UTF-8]] overtook them.<ref name="UTF-8_2008"/>
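The practical consequence of mislabeling Windows-1252 as ISO-8859-1 can be shown by decoding the same bytes both ways. This sketch uses Python's standard <code>cp1252</code> and <code>latin-1</code> codecs; the sample bytes are an illustrative assumption. The two encodings agree except in the 80–9F hexadecimal range, where Windows-1252 places printable punctuation.

```python
# Curly quotes (0x93, 0x94) around "café" as Windows-1252 bytes.
SAMPLE = b"\x93caf\xe9\x94"

as_cp1252 = SAMPLE.decode("cp1252")   # quotes decode as typographic marks
as_latin1 = SAMPLE.decode("latin-1")  # same bytes decode as control codes

print(as_cp1252)        # “café”
print(repr(as_latin1))  # '\x93café\x94'
```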

[[ISO/IEC 4873]] introduced 32 additional control codes defined in the 80–9F [[hexadecimal]] range, as part of extending the 7-bit ASCII encoding to become an 8-bit system.<ref name="Unicode-5.0_2006">{{cite book |author=The Unicode Consortium |editor-first=Julie D. |editor-last=Allen |title=The Unicode standard, Version 5.0 |date=2006-10-27 |publisher=[[Addison-Wesley Professional]] |location=Upper Saddle River, New Jersey, US |isbn=978-0-321-48091-0 |chapter-url=http://unicode.org/book/ch13.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://unicode.org/book/ch13.pdf |archive-date=2022-10-09 |url-status=live |access-date=2015-03-13 |chapter=Chapter 13: Special Areas and Format Characters |page=314}}</ref>

===Unicode===
{{Main|Unicode|ISO/IEC 10646}}{{See also|Basic Latin (Unicode block)}}
[[Unicode]] and the ISO/IEC 10646 [[Universal Character Set]] (UCS) have a much wider array of characters and their various encoding forms have begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments. While ASCII is limited to 128 characters, Unicode and the UCS support more characters by separating the concepts of unique identification (using [[natural number]]s called ''code points'') and encoding (to 8-, 16-, or 32-bit binary formats, called [[UTF-8]], [[UTF-16]], and [[UTF-32]], respectively).
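The separation between a code point and its encoded form can be seen by encoding the same characters three ways. The following is a small Python sketch using the standard codecs; the helper name <code>encodings</code> and the sample characters are illustrative assumptions, and the big-endian variants are used only to avoid a byte-order mark.

```python
def encodings(ch: str) -> dict:
    """Return the code point of one character and its bytes in three encoding forms."""
    return {
        "code point": f"U+{ord(ch):04X}",
        "UTF-8": ch.encode("utf-8").hex(" "),
        "UTF-16": ch.encode("utf-16-be").hex(" "),
        "UTF-32": ch.encode("utf-32-be").hex(" "),
    }


for ch in ("A", "é", "€"):
    print(ch, encodings(ch))
# "A" is U+0041: one byte (41) in UTF-8, two in UTF-16, four in UTF-32.
# "é" is U+00E9: two bytes (c3 a9) in UTF-8.
# "€" is U+20AC: three bytes (e2 82 ac) in UTF-8.
```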

ASCII was incorporated into the Unicode (1991) character set as the first 128 symbols, so the 7-bit ASCII characters have the same numeric codes in both sets. This allows [[UTF-8]] to be [[Backward compatibility|backward compatible]] with 7-bit ASCII, as a UTF-8 file containing only ASCII characters is identical to an ASCII file containing the same sequence of characters.  Even more importantly, [[forward compatibility]] is ensured as software that recognizes only 7-bit ASCII characters as special and does not alter bytes with the highest bit set (as is often done to support 8-bit ASCII extensions such as ISO-8859-1) will preserve UTF-8 data unchanged.<ref>{{cite web |title=utf-8(7)&nbsp;– Linux manual page |publisher=Man7.org |date=2014-02-26 |url=http://man7.org/linux/man-pages/man7/utf-8.7.html |access-date=2014-04-21 |archive-url=https://web.archive.org/web/20140422232059/http://man7.org/linux/man-pages/man7/utf-8.7.html |archive-date=April 22, 2014 |url-status=live }}</ref>
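Both properties can be demonstrated in a few lines. In this Python sketch, <code>ascii_upper</code> is a hypothetical stand-in for software that treats only 7-bit ASCII bytes as special and passes all other bytes through untouched.

```python
def ascii_upper(data: bytes) -> bytes:
    """Uppercase ASCII letters a-z; pass every other byte through unchanged."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


# Backward compatibility: ASCII-only text has identical bytes in both encodings.
assert "plain text".encode("ascii") == "plain text".encode("utf-8")

# Forward compatibility: every byte of a multi-byte UTF-8 sequence has the
# high bit set, so ASCII-only processing leaves those sequences intact.
original = "naïve café".encode("utf-8")
print(ascii_upper(original).decode("utf-8"))  # NAïVE CAFé
```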