Editing
ASCII
(section)
==Character groups== ===<span class="anchor" id="ASCII control characters"></span>Control characters=== [[File:US ASCII Control Character Symbols.png|thumb|right|Early symbols assigned to the 32 control characters, space and delete characters. ([[ISO 2047]], MIL-STD-188-100, 1972)]] {{Main|C0 control codes}} ASCII reserves the first 32 [[code point]]s (numbers 0–31 decimal) and the last one (number 127 decimal) for [[control character]]s. These are codes intended to control [[peripheral device]]s (such as [[computer printer|printers]]), or to provide [[Metadata|meta-information]] about data streams, such as those stored on magnetic tape. Despite their name, these code points do not represent printable characters (i.e. they are not characters at all, but signals). For debugging purposes, "placeholder" symbols (such as those given in [[ISO 2047]] and its predecessors) are assigned to them. For example, character 0x0A represents the "line feed" function (which causes a printer to advance its paper), and character 8 represents "[[backspace]]". {{IETF RFC|2822}} refers to control characters that do not include carriage return, line feed or [[Whitespace (computer science)|white space]] as non-whitespace control characters.<ref name="RFC-2822">{{cite IETF |title=Internet Message Format |editor-first1=Peter W. |editor-last1=Resnick |date=April 2001 |rfc=2822 |access-date=2016-06-13}} (NB. NO-WS-CTL.)</ref> Except for the control characters that prescribe elementary line-oriented formatting, ASCII does not define any mechanism for describing the structure or appearance of text within a document. Other schemes, such as [[markup language]]s, address page and document layout and formatting. The original ASCII standard used only short descriptive phrases for each control character. 
The ambiguity this caused was sometimes intentional, for example where a character would be used slightly differently on a terminal link than on a [[data stream]], and sometimes accidental, for example the standard is unclear about the meaning of "delete". Probably the most influential single device affecting the interpretation of these characters was the [[Teletype Model 33]] ASR, which was a printing terminal with an available [[punched tape|paper tape]] reader/punch option. Paper tape was a very popular medium for long-term program storage until the 1980s, less costly and in some ways less fragile than magnetic tape. In particular, the Teletype Model 33 machine assignments for codes 17 (control-Q, DC1, also known as XON), 19 (control-S, DC3, also known as XOFF), and 127 ([[Delete key|delete]]) became ''de facto'' standards. The Model 33 was also notable for taking the description of control-G (code 7, BEL, meaning audibly alert the operator) literally, as the unit contained an actual bell which it rang when it received a BEL character. Because the keytop for the O key also showed a left-arrow symbol (from ASCII-1963, which had this character instead of [[underscore]]), a noncompliant use of code 15 (control-O, shift in) interpreted as "delete previous character" was also adopted by many early timesharing systems but eventually became neglected. When a Teletype 33 ASR equipped with the automatic paper tape reader received a control-S (XOFF, an abbreviation for transmit off), it caused the tape reader to stop; receiving control-Q (XON, transmit on) caused the tape reader to resume. This so-called [[Flow control (data)|flow control]] technique became adopted by several early computer operating systems as a "handshaking" signal warning a sender to stop transmission because of impending [[buffer overflow]]; it persists to this day in many systems as a manual output control technique. 
On some systems, control-S retains its meaning, but control-Q is replaced by a second control-S to resume output. The 33 ASR also could be configured to employ control-R (DC2) and control-T (DC4) to start and stop the tape punch; on some units equipped with this function, the corresponding control character lettering on the keycap above the letter was TAPE and <s>TAPE</s> respectively.<ref name="McConnell">{{cite web |title=Understanding ASCII Codes |author-last1=McConnell |author-first1=Robert |author-last2=Haynes |author-first2=James |author-last3=Warren |author-first3=Richard |url=http://www.nadcomm.com/ascii_code.htm |access-date=2014-05-11 |archive-url=https://web.archive.org/web/20140227190425/http://www.nadcomm.com/ascii_code.htm |archive-date=February 27, 2014 |url-status=dead}}</ref> ====Delete vs backspace==== The Teletype could not move its typehead backwards, so it did not have a key on its keyboard to send a BS (backspace). Instead, there was a key marked {{keypress|RUB OUT}} that sent code 127 (DEL). 
The purpose of this key was to erase mistakes in a manually-input paper tape: the operator had to push a button on the tape punch to back it up, then type the rubout, which punched all holes and replaced the mistake with a character that was intended to be ignored.<ref>{{cite mailing list |url=http://lists.gnu.org/archive/html/help-gnu-emacs/2014-05/msg00448.html |title=Re: editor and word processor history (was: Re: RTF for emacs) |author=Barry Margolin |mailing-list=help-gnu-emacs |date=May 29, 2014 |access-date=July 11, 2014 |archive-url=https://web.archive.org/web/20140714133149/http://lists.gnu.org/archive/html/help-gnu-emacs/2014-05/msg00448.html |archive-date=July 14, 2014 |url-status=live }}</ref> Teletypes were commonly used with the less-expensive computers from [[Digital Equipment Corporation]] (DEC); these systems had to use what keys were available, and thus the DEL character was assigned to erase the previous character.<ref name="pdp-6-monitor-manual">{{cite web |url=http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-EX-SYS-UM-IP-PRE00_Multiprogramming_System_Manual_1965.pdf |title=PDP-6 Multiprogramming System Manual |page=43 |publisher=[[Digital Equipment Corporation]] (DEC) |date=1965 |access-date=July 10, 2014 |archive-url=https://web.archive.org/web/20140714140253/http://bitsavers.trailing-edge.com/pdf/dec/pdp6/DEC-6-0-EX-SYS-UM-IP-PRE00_Multiprogramming_System_Manual_1965.pdf |archive-date=July 14, 2014 |url-status=live }}</ref><ref name="pdp-10-monitor-manual">{{cite web |url=http://bitsavers.org/pdf/dec/pdp10/1970_PDP-10_Ref/1970PDP10Ref_Part3.pdf |title=PDP-10 Reference Handbook, Book 3, Communicating with the Monitor |at=p. 
5-5 |publisher=[[Digital Equipment Corporation]] (DEC) |date=1969 |access-date=July 10, 2014 |archive-url=https://web.archive.org/web/20111115083418/http://www.bitsavers.org/pdf/dec/pdp10/1970_PDP-10_Ref/1970PDP10Ref_Part3.pdf |archive-date=November 15, 2011 |url-status=live }}</ref> Because of this, DEC video terminals (by default) sent the DEL character for the key marked "Backspace" while the separate key marked "Delete" sent an [[escape sequence]]; many other competing terminals sent a BS character for the backspace key. The early Unix tty drivers, unlike some modern implementations, allowed only one character to be set to erase the previous character in canonical input processing (where a very simple line editor is available); this could be set to BS ''or'' DEL, but not both, resulting in recurring situations of ambiguity where users had to decide depending on what terminal they were using ([[Shell (computing)|shells]] that allow line editing, such as [[KornShell|ksh]], [[Bash (Unix shell)|bash]], and [[Z shell|zsh]], understand both). The assumption that no key sent a BS character allowed Ctrl+H to be used for other purposes, such as the "help" prefix command in [[GNU Emacs]].<ref>{{cite web|url=https://www.gnu.org/software/emacs/manual/html_node/emacs/Help.html|title=Help - GNU Emacs Manual|access-date=July 11, 2018|archive-url=https://web.archive.org/web/20180711223750/https://www.gnu.org/software/emacs/manual/html_node/emacs/Help.html|archive-date=July 11, 2018|url-status=live}}</ref> ====Escape==== Many more of the control characters have been assigned meanings quite different from their original ones. The "escape" character (ESC, code 27), for example, was intended originally to allow sending of other control characters as literals instead of invoking their meaning, an "escape sequence". 
This is the same meaning of "escape" encountered in URL encodings, [[C (programming language)|C language]] strings, and other systems where certain characters have a reserved meaning. Over time this interpretation has been co-opted and has eventually been changed. In modern usage, an ESC sent ''to'' the terminal usually indicates the start of a command sequence, which can be used to address the cursor, scroll a region, set/query various terminal properties, and more. They are usually in the form of a so-called "[[ANSI escape code]]" (often starting with a "[[Control Sequence Introducer]]", "CSI", "{{Mono|ESC [}}") from ECMA-48 (1972) and its successors. Some escape sequences do not have introducers, like the "Reset to Initial State", "RIS" command "{{Mono|ESC c}}".<ref>{{cite web|url=https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub86.pdf|title=ANSI X3.64-1979|access-date=October 27, 2024}}</ref> In contrast, an ESC read ''from'' the terminal is most often used as an [[out-of-band data|out-of-band]] character used to terminate an operation or special mode, as in the [[Text Editor and Corrector|TECO]] and [[Vi (text editor)|vi]] [[text editor]]s. In [[graphical user interface]] (GUI) and [[window (computing)|windowing]] systems, ESC generally causes an application to abort its current operation or to [[exit (system call)|exit]] (terminate) altogether. ====End of line==== The inherent ambiguity of many control characters, combined with their historical usage, created problems when transferring "plain text" files between systems. The best example of this is the [[newline]] problem on various [[operating system]]s. Teletype machines required that a line of text be terminated with both "carriage return" (which moves the printhead to the beginning of the line) and "line feed" (which advances the paper one line without moving the printhead). 
The name "carriage return" comes from the fact that on a manual [[typewriter]] the carriage holding the paper moves while the typebars that strike the ribbon remain stationary. The entire carriage had to be pushed (returned) to the right in order to position the paper for the next line. DEC operating systems ([[OS/8]], [[RT-11]], [[RSX-11]], [[RSTS/E|RSTS]], [[TOPS-10]], etc.) used both characters to mark the end of a line so that the console device (originally Teletype machines) would work. By the time so-called "glass TTYs" (later called CRTs or "dumb terminals") came along, the convention was so well established that [[backward compatibility]] necessitated continuing to follow it. When [[Gary Kildall]] created [[CP/M]], he was inspired by some of the command line interface conventions used in DEC's RT-11 operating system. Until the introduction of PC DOS in 1981, [[IBM]] had no influence in this because their 1970s operating systems used EBCDIC encoding instead of ASCII, and they were oriented toward punch-card input and line printer output on which the concept of "carriage return" was meaningless. IBM's PC DOS (also marketed as [[MS-DOS]] by Microsoft) inherited the convention by virtue of being loosely based on CP/M,<ref>{{cite web|url=http://dosmandrivel.blogspot.com/2007/08/is-dos-rip-off-of-cpm.html|title=Is DOS a Rip-Off of CP/M?|author=Tim Paterson|date=August 8, 2007|website=DosMan Drivel|author-link=Tim Paterson|access-date=April 19, 2018|archive-url=https://web.archive.org/web/20180420075137/http://dosmandrivel.blogspot.com/2007/08/is-dos-rip-off-of-cpm.html|archive-date=April 20, 2018|url-status=live}}</ref> and [[Windows]] in turn inherited it from MS-DOS. Requiring two characters to mark the end of a line introduces unnecessary complexity and ambiguity as to how to interpret each character when encountered by itself. 
To simplify matters, [[plain text]] data streams, including files, on [[Multics]] used line feed (LF) alone as a line terminator.<ref>{{cite conference |url=http://www.multicians.org/jhs-jfo-terminals.pdf |title=Technical and human engineering problems in connecting terminals to a time-sharing system |author-last1=Ossanna |author-first1=J. F. |author-link1=Joe Ossanna |author-last2=Saltzer |author-first2=J. H. |author-link2=Jerry Saltzer |date=November 17–19, 1970 |publisher=[[AFIPS]] Press |book-title=Proceedings of the November 17–19, 1970, [[Fall Joint Computer Conference]] (FJCC) |pages=355–362 |quote=Using a "new-line" function (combined carriage-return and line-feed) is simpler for both man and machine than requiring both functions for starting a new line; the American National Standard X3.4-1968 permits the line-feed code to carry the new-line meaning. |access-date=January 29, 2013 |archive-url=https://web.archive.org/web/20120819085101/http://www.multicians.org/jhs-jfo-terminals.pdf |archive-date=August 19, 2012 |url-status=live }}</ref>{{rp|357}} The tty driver would handle the LF to CRLF conversion on output so that files could be printed directly to a terminal, and NL (newline) is often used to refer to CRLF in [[UNIX]] documents. [[Unix]] and [[Unix-like]] systems, and [[Amiga]] systems, adopted this convention from Multics. On the other hand, the original [[Macintosh OS]], [[Apple DOS]], and [[ProDOS]] used carriage return (CR) alone as a line terminator; however, since Apple later replaced these obsolete operating systems with its Unix-based [[macOS]] (formerly named OS X) operating system, Apple systems now use line feed (LF) as well. The Radio Shack [[TRS-80]] also used a lone CR to terminate lines. 
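The three line-terminator conventions described above (CR-LF, lone LF, and lone CR) can be illustrated with a minimal sketch. The function names <code>normalize_newlines</code> and <code>to_crlf</code> are hypothetical and chosen only for this example; they do not correspond to any particular system's implementation.

```python
CR = "\r"  # carriage return, code 13
LF = "\n"  # line feed, code 10


def normalize_newlines(text: str) -> str:
    """Convert CR-LF pairs and lone CRs to a single LF."""
    # Replace CR-LF first so each pair becomes one LF rather than two.
    return text.replace(CR + LF, LF).replace(CR, LF)


def to_crlf(text: str) -> str:
    """Convert any of the three conventions to CR-LF."""
    # Normalizing first avoids turning an existing CR-LF into CR-CR-LF.
    return normalize_newlines(text).replace(LF, CR + LF)


mixed = "DEC style\r\nUnix style\nclassic Mac style\r"
print(repr(normalize_newlines(mixed)))
print(repr(to_crlf(mixed)))
```

The order of the two replacements matters: handling the two-character sequence before the lone CR is what keeps a CR-LF pair from being counted as two line endings.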
Computers attached to the [[ARPANET]] included machines running operating systems such as TOPS-10 and [[TENEX (operating system)|TENEX]] using CR-LF line endings; machines running operating systems such as Multics using LF line endings; and machines running operating systems such as [[OS/360]] that represented lines as a character count followed by the characters of the line and which used EBCDIC rather than ASCII encoding. The [[Telnet]] protocol defined an ASCII "Network Virtual Terminal" (NVT), so that connections between hosts with different line-ending conventions and character sets could be supported by transmitting a standard text format over the network. Telnet used ASCII along with CR-LF line endings, and software using other conventions would translate between the local conventions and the NVT.<ref name="RFC-158">{{cite IETF |title=TELNET Protocol |rfc=158 |pages=4–5 |author-first=T. |author-last=O'Sullivan |date=1971-05-19 |publisher=[[Internet Engineering Task Force]] (IETF) |access-date=2013-01-28}}</ref> The [[File Transfer Protocol]] adopted the Telnet protocol, including use of the Network Virtual Terminal, for use when transmitting commands and transferring data in the default ASCII mode.<ref name="RFC-542">{{cite IETF |title=File Transfer Protocol |rfc=542 |author-first=Nancy J. 
|author-last=Neigus |date=1973-08-12 |publisher=[[Internet Engineering Task Force]] (IETF) |access-date=2013-01-28}}</ref><ref name="RFC-765">{{cite IETF |title=File Transfer Protocol |rfc=765 |author-first=Jon |author-last=Postel |author-link=Jon Postel |date=June 1980 |publisher=[[Internet Engineering Task Force]] (IETF) |access-date=2013-01-28}}</ref> This adds complexity to implementations of those protocols, and to other network protocols, such as those used for E-mail and the World Wide Web, on systems not using the NVT's CR-LF line-ending convention.<ref>{{cite web |url=https://www.mercurial-scm.org/wiki/EOLTranslationPlan |title=EOL translation plan for Mercurial |publisher=Mercurial |access-date=2017-06-24 |archive-url=https://web.archive.org/web/20160616235536/https://www.mercurial-scm.org/wiki/EOLTranslationPlan |archive-date=June 16, 2016 |url-status=live }}</ref><ref>{{cite web |title=Bare LFs in SMTP |url=http://cr.yp.to/docs/smtplf.html |author-first=Daniel J. |author-last=Bernstein |author-link=Daniel J. Bernstein |access-date=2013-01-28 |archive-url=https://web.archive.org/web/20111029013105/http://cr.yp.to/docs/smtplf.html |archive-date=October 29, 2011 |url-status=live }}</ref> ====End of file/stream==== The PDP-6 monitor,<ref name="pdp-6-monitor-manual"/> and its PDP-10 successor TOPS-10,<ref name="pdp-10-monitor-manual"/> used control-Z (SUB) as an end-of-file indication for input from a terminal. 
Some operating systems such as CP/M tracked file length only in units of disk blocks, and used control-Z to mark the end of the actual text in the file.<ref>{{cite book |url=http://www.bitsavers.org/pdf/digitalResearch/cpm/1.4/CPM_1.4_Interface_Guide_1978.pdf |title=CP/M 1.4 Interface Guide |date=1978 |page=10 |publisher=[[Digital Research]] |access-date=October 7, 2017 |archive-url=https://web.archive.org/web/20190529055800/http://bitsavers.org/pdf/digitalResearch/cpm/1.4/CPM_1.4_Interface_Guide_1978.pdf |archive-date=May 29, 2019 |url-status=live }}</ref> For these reasons, EOF, or [[end-of-file]], was used colloquially and conventionally as a [[three-letter acronym]] for control-Z instead of SUBstitute. The end-of-text character ([[End-of-text character|ETX]]), also known as [[control-C]], was inappropriate for a variety of reasons, while using control-Z as the control character to end a file is analogous to the letter Z's position at the end of the alphabet, and serves as a very convenient [[Mnemonic device|mnemonic aid]]. A historically common and still prevalent convention uses the ETX character to interrupt and halt a program via an input data stream, usually from a keyboard. The Unix terminal driver uses the end-of-transmission character ([[End-of-Transmission character|EOT]]), also known as control-D, to indicate the end of a data stream. In the [[C programming language]], and in Unix conventions, the [[null character]] is used to terminate text [[string (computer science)|strings]]; such [[null-terminated string]]s are sometimes abbreviated as ASCIZ or ASCIIZ, where the Z stands for "zero".
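
The two in-band end markers described above can be sketched as follows: control-Z (SUB, code 26) marking the end of text inside a file stored in whole disk blocks, and the null character (code 0) terminating a string. This is an illustration only; the helper names <code>text_before_sub</code> and <code>asciz_to_str</code> are hypothetical, and the 128-byte record size is an assumption made for the example rather than something stated in this section.

```python
SUB = 0x1A  # control-Z, used by CP/M to mark the end of text
NUL = 0x00  # null character, terminates ASCIZ strings


def text_before_sub(data: bytes) -> bytes:
    """Return the bytes that precede the first control-Z, if any."""
    end = data.find(bytes([SUB]))
    return data if end == -1 else data[:end]


def asciz_to_str(buffer: bytes) -> str:
    """Decode a null-terminated ASCII string from a larger buffer."""
    end = buffer.find(bytes([NUL]))
    raw = buffer if end == -1 else buffer[:end]
    return raw.decode("ascii")


# A file padded to a full record (assumed 128 bytes here) with control-Z.
record = b"HELLO\r\n".ljust(128, bytes([SUB]))
print(text_before_sub(record))

# A fixed-size buffer holding a short null-terminated string.
print(asciz_to_str(b"abc\x00leftover bytes"))
```

Both helpers return the input unchanged when no marker is present, which mirrors the case of a file or buffer that is completely filled with data.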