ASCII
==Design considerations==
===Bit width===
The X3.2 subcommittee designed ASCII based on the earlier teleprinter encoding systems. Like other [[character encoding]]s, ASCII specifies a correspondence between digital bit patterns and [[character (computing)|character]] symbols (i.e. [[grapheme]]s and [[control character]]s). This allows [[Digital data|digital]] devices to communicate with each other and to process, store, and communicate character-oriented information such as written language.

Before ASCII was developed, the encodings in use included 26 [[English alphabet|alphabetic]] characters, 10 [[numerical digit]]s, and from 11 to 25 special graphic symbols. To include all these, and control characters compatible with the [[CCITT|Comité Consultatif International Téléphonique et Télégraphique]] (CCITT) [[International Telegraph Alphabet No. 2]] (ITA2) standard of 1932,<ref>{{cite web |url=http://handle.itu.int/11.1004/020.1000/4.5.43.en.101 |title=Telegraph Regulations and Final Protocol (Madrid, 1932) |access-date=9 Jun 2024 |archive-url=https://web.archive.org/web/20230821020920/https://search.itu.int/history/HistoryDigitalCollectionDocLibrary/4.5.43.en.101.pdf |archive-date=21 August 2023}}</ref><ref name="bdcode">{{cite web |author-last=Smith |author-first=Gil |title=Teletype Communication Codes |publisher=Baudot.net |date=2001 |url=http://www.baudot.net/docs/smith--teletype-codes.pdf |access-date=2008-07-11 |archive-url=https://web.archive.org/web/20080820043949/http://www.baudot.net/docs/smith--teletype-codes.pdf |archive-date=August 20, 2008 |url-status=live }}</ref> [[FIELDATA]] (1956{{citation needed|date=June 2016|reason=My sources state 1957 rather than 1956, but Wikipedia states 1956 in various places. This needs to be sorted out with better sources.}}), and early [[EBCDIC]] (1963), more than 64 codes were required for ASCII.
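The arithmetic behind that conclusion can be sketched as follows. This is only an illustration: the figure of 32 control codes is an assumption (it is the number of positions ASCII eventually reserved for them, not a count taken from the earlier codes), and the name <code>bits_needed</code> is hypothetical.

```python
def bits_needed(code_count: int) -> int:
    """Return the smallest bit width that can represent code_count distinct codes."""
    if code_count < 1:
        raise ValueError("code_count must be positive")
    # n codes fit in k bits when n <= 2**k; (n - 1).bit_length() gives that k.
    return max(1, (code_count - 1).bit_length())


LETTERS = 26
DIGITS = 10
SPECIALS_MIN, SPECIALS_MAX = 11, 25  # range of special graphic symbols in earlier codes
CONTROLS = 32  # assumption: the number of control positions ASCII later reserved

graphics_min = LETTERS + DIGITS + SPECIALS_MIN  # 47 graphic characters
graphics_max = LETTERS + DIGITS + SPECIALS_MAX  # 61 graphic characters

for graphics in (graphics_min, graphics_max):
    total = graphics + CONTROLS
    print(f"{graphics} graphics + {CONTROLS} controls = {total} codes "
          f"-> {bits_needed(total)} bits")
```

Under these assumptions both ends of the range exceed the 64 codes available in six bits, so an unshifted code needs seven.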
ITA2 was in turn based on [[Baudot code]], the 5-bit telegraph code Émile Baudot invented in 1870 and patented in 1874.<ref name="bdcode" />

The committee debated the possibility of a [[Shift code|shift]] function (as in [[ITA2]]), which would allow more than 64 codes to be represented by a [[six-bit character code|six-bit code]]. In a shifted code, some character codes determine choices between options for the following character codes. This allows compact encoding, but is less reliable for [[data transmission]], as an error in transmitting the shift code typically makes a long part of the transmission unreadable. The standards committee decided against shifting, and so ASCII required at least a seven-bit code.<ref name="Mackenzie_1980"/>{{rp|pages=215 §13.6, 236 §4}}

The committee considered an eight-bit code, since eight bits ([[octet (computing)|octet]]s) would allow two four-bit patterns to efficiently encode two digits with [[binary-coded decimal]]. However, it would require all data transmission to send eight bits when seven could suffice. The committee voted to use a seven-bit code to minimize costs associated with data transmission. Since perforated tape at the time could record eight bits in one position, it also allowed for a [[parity bit]] for [[error checking]] if desired.<ref name="Mackenzie_1980"/>{{rp|pages=217 §c, 236 §5}} [[8-bit computing|Eight-bit]] machines (with octets as the native data type) that did not use parity checking typically set the eighth bit to 0.<ref name="Sawyer_1995">{{cite book |author-first1=Stanley A. |author-last1=Sawyer |author-first2=Steven George |author-last2=Krantz |title=A TeX Primer for Scientists |url=https://books.google.com/books?id=bXLDwmIJNkUC&pg=PA13 |date=1995 |publisher=[[CRC Press]] |isbn=978-0-8493-7159-2 |page=13 |bibcode=1995tps..book.....S |access-date=October 29, 2016 |archive-url=https://web.archive.org/web/20161222151907/https://books.google.com/books?id=bXLDwmIJNkUC&pg=PA13 |archive-date=December 22, 2016 |url-status=live }}</ref>

===Internal organization===
The code itself was patterned so that most control codes were together and all graphic codes were together, for ease of identification. The first two so-called ''ASCII sticks''{{Efn|name="NB_Stick"}}<ref name="Bemer_1980_Inside"/> (32 positions) were reserved for control characters.<ref name="Mackenzie_1980"/>{{rp|220, 236 §8,9}} The [[Space (punctuation)|"space" character]] had to come before graphics to make [[sorting algorithm|sorting]] easier, so it became position 20<sub>[[hexadecimal|hex]]</sub>;<ref name="Mackenzie_1980"/>{{rp|237 §10}} for the same reason, many special signs commonly used as separators were placed before digits. The committee decided it was important to support uppercase [[sixbit code pages|64-character alphabets]], and chose to pattern ASCII so it could be reduced easily to a usable 64-character set of graphic codes,<ref name="Mackenzie_1980"/>{{rp|228, 237 §14}} as was done in the [[DEC SIXBIT]] code (1963). [[Lower case|Lowercase]] letters were therefore not interleaved with [[uppercase]].
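A minimal sketch of such a reduction, assuming the DEC SIXBIT convention of mapping ASCII positions 20<sub>hex</sub>–5F<sub>hex</sub> onto the values 0–63 by subtracting 20<sub>hex</sub>. The function name <code>to_sixbit</code> is hypothetical, and folding lowercase letters to uppercase first is an illustrative choice rather than part of any standard.

```python
def to_sixbit(ch: str) -> int:
    """Map one ASCII character onto a 64-character uppercase set (values 0-63)."""
    code = ord(ch)
    if 0x61 <= code <= 0x7A:
        # Lowercase a-z differ from uppercase only in bit 0x20, because the two
        # alphabets occupy separate blocks instead of being interleaved.
        code &= ~0x20
    if not 0x20 <= code <= 0x5F:
        raise ValueError(f"{ch!r} has no equivalent in the 64-character set")
    return code - 0x20


print(to_sixbit(" "))  # space is the first graphic position
print(to_sixbit("A"), to_sixbit("a"))  # both letters fold to the same value
print(to_sixbit("_"))  # last position of the 64-character set
```

Because the uppercase letters, digits, and common punctuation sit in one contiguous block, the reduction is a single subtraction; interleaving the two cases would have required a lookup table instead.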
To keep options available for lowercase letters and other graphics, the special and numeric codes were arranged before the letters, and the letter ''A'' was placed in position 41<sub>[[hexadecimal|hex]]</sub> to match the draft of the corresponding British standard.<ref name="Mackenzie_1980"/>{{rp|238 §18}} The digits 0–9 are prefixed with 011, but the remaining [[Nibble|4 bits]] correspond to their respective values in binary, making conversion with [[binary-coded decimal]] straightforward (for example, 5 is encoded as 011''0101'', where 5 is ''0101'' in binary).

Many of the non-alphanumeric characters were positioned to correspond to their shifted position on typewriters; an important subtlety is that these were based on ''mechanical'' typewriters, not ''electric'' typewriters.<ref name="Savard">{{cite web |title=Computer Keyboards |url=http://www.quadibloc.com/comp/kybint.htm |author-first=John J. G. |author-last=Savard |access-date=2014-08-24 |archive-url=https://web.archive.org/web/20140924183236/http://www.quadibloc.com/comp/kybint.htm |archive-date=September 24, 2014 |url-status=live }}</ref> Mechanical typewriters followed the [[de facto standard|''de facto'' standard]] set by the [[Remington No. 2]] (1878), the first typewriter with a shift key, and the shifted values of <code>23456789-</code> were <code>"#$%_&'()</code>{{snd}} early typewriters omitted ''0'' and ''1'', using ''O'' (capital letter ''o'') and ''l'' (lowercase letter ''L'') instead, but <code>1!</code> and <code>0)</code> pairs became standard once 0 and 1 became common. Thus, in ASCII <code>!"#$%</code> were placed in the second stick,{{Efn|name="NB_Stick"}}<ref name="Bemer_1980_Inside"/> positions 1–5, corresponding to the digits 1–5 in the adjacent stick.{{Efn|name="NB_Stick"}}<ref name="Bemer_1980_Inside"/> The parentheses could not correspond to ''9'' and ''0'', however, because the place corresponding to ''0'' was taken by the space character.
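Both relationships can be illustrated with simple bit operations. The sketch below is not taken from the standard; the function names <code>digit_value</code> and <code>paired_symbol</code> are hypothetical. It relies only on the digits occupying positions 30<sub>hex</sub>–39<sub>hex</sub> and their paired symbols sitting one stick earlier, so that each pair differs in the single bit 10<sub>hex</sub>.

```python
def digit_value(ch: str) -> int:
    """Return the numeric value of an ASCII digit by keeping its low four bits."""
    code = ord(ch)
    if not 0x30 <= code <= 0x39:
        raise ValueError(f"{ch!r} is not an ASCII digit")
    return code & 0x0F  # drop the 011 prefix, leaving the BCD nibble


def paired_symbol(digit: str) -> str:
    """Return the character in the same row of the preceding stick."""
    code = ord(digit)
    if not 0x30 <= code <= 0x39:
        raise ValueError(f"{digit!r} is not an ASCII digit")
    return chr(code ^ 0x10)  # flip the one bit that separates the two sticks


print(format(ord("5"), "07b"), digit_value("5"))
print("".join(paired_symbol(d) for d in "123456789"))
print(repr(paired_symbol("0")))
```

Under this rule the digit ''0'' pairs with the space character, which is why the parentheses could not be placed above ''9'' and ''0''.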
This was accommodated by removing <code>_</code> (underscore) from ''6'' and shifting the remaining characters, which corresponded to many European typewriters that placed the parentheses with ''8'' and ''9''. This discrepancy from typewriters led to [[bit-paired keyboard]]s, notably the [[Teletype Model 33]], which used the left-shifted layout corresponding to ASCII, unlike traditional mechanical typewriters.

Electric typewriters, notably the [[IBM Selectric]] (1961), used a somewhat different layout that has become the ''de facto'' standard on computers{{snd}} following the [[IBM PC]] (1981), especially [[Model M]] (1984){{snd}} and thus shift values for symbols on modern keyboards do not correspond as closely to the ASCII table as earlier keyboards did. The <code>/?</code> pair also dates to the No. 2, and the <code>,< .></code> pairs were used on some keyboards (others, including the No. 2, did not shift <code>,</code> (comma) or <code>.</code> (full stop) so they could be used in uppercase without unshifting). However, ASCII split the <code>;:</code> pair (dating to No. 2), and rearranged mathematical symbols (varied conventions, commonly <code>-* =+</code>) to <code>:* ;+ -=</code>.

Some then-common typewriter characters were not included, notably <code>½ ¼ ¢</code>, while <code>^ ` ~</code> were included as diacritics for international use, and <code>< ></code> for mathematical use, together with the simple line characters <code>\ |</code> (in addition to common <code>/</code>). The ''@'' symbol was not used in continental Europe and the committee expected it would be replaced by an accented ''À'' in the French variation, so the ''@'' was placed in position 40<sub>[[hexadecimal|hex]]</sub>, right before the letter A.<ref name="Mackenzie_1980"/>{{rp|243}}

The control codes considered essential for data transmission were the start of message (SOM), end of address (EOA), [[end of message]] (EOM), end of transmission (EOT), "who are you?" (WRU), "are you?"
(RU), a reserved device control (DC0), synchronous idle (SYNC), and acknowledge (ACK). These were positioned to maximize the [[Hamming distance]] between their bit patterns.<ref name="Mackenzie_1980"/>{{rp|243–245}}

===<span class="anchor" id="Order"></span>Character order===
ASCII-code order is also called ''ASCIIbetical'' order.<ref>{{cite magazine |url=https://www.pcmag.com/encyclopedia_term/0,2542,t=ASCIIbetical&i=38025,00.asp |title=ASCIIbetical definition |magazine=[[PC Magazine]] |access-date=2008-04-14 |archive-url=https://web.archive.org/web/20130309183509/http://www.pcmag.com/encyclopedia_term/0%2C2542%2Ct%3DASCIIbetical%26i%3D38025%2C00.asp |archive-date=March 9, 2013 |url-status=live }}</ref> [[Collation]] of data is sometimes done in this order rather than "standard" alphabetical order ([[collating sequence]]). The main deviations in ASCII order are:
* All uppercase letters come before lowercase letters; for example, "Z" precedes "a"
* Digits and many punctuation marks come before letters

An intermediate order converts uppercase letters to lowercase before comparing ASCII values.
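The difference between the two orders can be seen in a short sketch. It assumes Python's built-in string comparison, which compares code points and therefore matches ASCII order for ASCII-only strings; the function names and sample words are illustrative.

```python
def ascii_order(words: list[str]) -> list[str]:
    """Sort by raw character codes (ASCIIbetical order for ASCII-only input)."""
    return sorted(words)


def case_folded_order(words: list[str]) -> list[str]:
    """Sort after converting uppercase letters to lowercase (the intermediate order)."""
    return sorted(words, key=str.lower)


words = ["apple", "Zebra", "10", "Banana"]
print(ascii_order(words))
print(case_folded_order(words))
```

In the first result "Zebra" precedes "apple" because every uppercase letter has a lower code than every lowercase letter; in both results "10" comes first because digits precede letters.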