===Assembler===<!-- This section is linked from [[Computer software]] -->
An '''assembler''' program creates [[object code]] by [[translator (computing)|translating]] combinations of [[mnemonic]]s and [[Syntax (programming languages)|syntax]] for operations and addressing modes into their numerical equivalents. This representation typically includes an ''operation code'' ("[[opcode]]") as well as other control [[bit]]s and data. The assembler also calculates constant expressions and resolves [[identifier|symbolic names]] for memory locations and other entities.<ref name="Salomon_1992"/> The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include [[Macro (computer science)|macro]] facilities for performing textual substitution – e.g., to generate common short sequences of instructions as [[inline expansion|inline]] code, instead of ''called'' [[subroutine]]s.

Some assemblers may also be able to perform simple [[instruction set architecture|instruction set]]-specific [[compiler optimization|optimization]]s. One concrete example is the ubiquitous [[x86]] assemblers from various vendors. In a process called [[jump-sizing]],<ref name="Salomon_1992"/> most of them are able to perform jump-instruction replacements (long jumps replaced by short or relative jumps) in any number of passes, on request. Others may even do simple rearrangement or insertion of instructions, such as some assemblers for [[RISC architectures]] that can help optimize [[instruction scheduling]] to exploit the [[CPU pipeline]] as efficiently as possible.<ref>{{cite conference |url=https://www.researchgate.net/publication/262389375 |doi=10.1145/2465554.2465559 |title=Improving processor efficiency by statically pipelining instructions |book-title=Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems |year=2013 |last1=Finlayson |first1=Ian |last2=Davis |first2=Brandon |last3=Gavin |first3=Peter |last4=Uh |first4=Gang-Ryung |last5=Whalley |first5=David |last6=Själander |first6=Magnus |last7=Tyson |first7=Gary |pages=33–44 |isbn=9781450320856 |s2cid=8015812}}</ref>

Assemblers have been available since the 1950s, as the first step above machine language and before [[high-level programming language]]s such as [[Fortran]], [[ALGOL|Algol]], [[COBOL]] and [[Lisp (programming language)|Lisp]]. There have also been several classes of translators and semi-automatic [[code generation (compiler)|code generators]] with properties similar to both assembly and high-level languages, with [[Speedcode]] as perhaps one of the better-known examples.

There may be several assemblers with different [[Syntax (programming languages)|syntax]] for a particular [[Central processing unit|CPU]] or [[instruction set architecture]]. For instance, an instruction to add memory data to a register in an [[x86]]-family processor might be <code>add eax,[ebx]</code> in the original ''[[Intel syntax]]'', whereas this would be written <code>addl (%ebx),%eax</code> in the ''[[AT&T syntax]]'' used by the [[GNU Assembler]]. Despite different appearances, the different syntactic forms generally generate the same numeric [[machine code]].
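For illustration, the following fragment (a sketch; the registers and displacement are arbitrary) shows the same short load-and-add sequence in each notation, together with each dialect's comment style:

 ; Intel syntax (e.g. NASM or MASM)
 mov  eax, [ebx]        ; load the doubleword at the address in EBX
 add  eax, [ebx+4]      ; add the following doubleword to it

 # AT&T syntax (GNU Assembler)
 movl (%ebx), %eax      # load the doubleword at the address in EBX
 addl 4(%ebx), %eax     # add the following doubleword to it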
A single assembler may also have different modes in order to support variations in syntactic forms as well as their exact semantic interpretations (such as [[FASM]]-syntax, [[TASM]]-syntax, ideal mode, etc., in the special case of [[x86 assembly language|x86 assembly]] programming).

==== {{Anchor|Two-pass assembler}} Number of passes====
There are two types of assemblers based on how many passes through the source are needed (how many times the assembler reads the source) to produce the object file.
* '''One-pass assemblers''' process the source code once. For symbols used before they are defined, the assembler will emit [[Erratum|"errata"]] after the eventual definition, telling the [[linker (computing)|linker]] or the loader to patch the locations where the as yet undefined symbols had been used.
* '''Multi-pass assemblers''' create a table with all symbols and their values in the first passes, then use the table in later passes to generate code.

In both cases, the assembler must be able to determine the size of each instruction on the initial passes in order to calculate the addresses of subsequent symbols. This means that if the size of an operation referring to an operand defined later depends on the type or distance of the operand, the assembler will make a pessimistic estimate when first encountering the operation, and if necessary, pad it with one or more "[[NOP (code)|no-operation]]" instructions in a later pass or the errata. In an assembler with [[peephole optimization]], addresses may be recalculated between passes to allow replacing pessimistic code with code tailored to the exact distance from the target.

The original reason for the use of one-pass assemblers was memory size and speed of assembly – often a second pass would require storing the symbol table in memory (to handle [[forward reference]]s), rewinding and rereading the program source on [[magnetic-tape data storage|tape]], or rereading a deck of [[punched card|cards]] or [[punched tape|punched paper tape]]. Later computers with much larger memories (especially disc storage) had the space to perform all necessary processing without such re-reading. The advantage of the multi-pass assembler is that the absence of errata makes the [[linker (computing)|linking process]] (or the [[loader (computing)|program load]] if the assembler directly produces executable code) faster.<ref name="Beck_1996"/>

'''Example:''' in the following code snippet, a one-pass assembler would be able to determine the address of the backward reference <var>BKWD</var> when assembling statement <var>S2</var>, but would not be able to determine the address of the forward reference <var>FWD</var> when assembling the branch statement <var>S1</var>; indeed, <var>FWD</var> may be undefined. A two-pass assembler would determine both addresses in pass 1, so they would be known when generating code in pass 2.

 {{var|S1}}   B    {{var|FWD}}
  ...
 {{var|FWD}}  EQU  *
  ...
 {{var|BKWD}} EQU  *
  ...
 {{var|S2}}   B    {{var|BKWD}}
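The pessimistic estimate described above can be illustrated in x86 terms (a sketch in Intel syntax; the label name is illustrative, and the byte counts refer to the common <code>jne</code> encodings):

          jne  fwd_target   ; forward reference: the target is not yet defined,
                            ; so space is reserved for the 6-byte near form
          ...               ; intervening code
 fwd_target:                ; if the target proves to lie within -128..+127 bytes,
                            ; a later pass may emit the 2-byte short form and pad
                            ; the remaining 4 bytes with no-operation instructions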
====High-level assemblers====
More sophisticated [[high-level assembler]]s provide language abstractions such as:
* High-level procedure/function declarations and invocations
* Advanced control structures (IF/THEN/ELSE, SWITCH), as in the sketch below
* High-level abstract data types, including structures/records, unions, classes, and sets
* Sophisticated macro processing (although available on ordinary assemblers since the late 1950s for, e.g., the [[IBM 700/7000 series|IBM 700 series]] and [[IBM 700/7000 series|IBM 7000 series]], and since the 1960s for [[IBM System/360]] (S/360), amongst other machines)
* [[Object-oriented programming]] features such as [[class (computer programming)|class]]es, [[Object (computer science)|object]]s, [[Abstraction (computer science)|abstraction]], [[Polymorphism (computer science)|polymorphism]], and [[inheritance (object-oriented programming)|inheritance]]<ref name="Hyde_2003"/>

See [[#Language design|Language design]] below for more details.
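As an example of such control-structure abstractions, some assemblers accept high-level directives that the assembler itself expands into comparisons and conditional jumps; the following is a sketch assuming MASM 6.x-style <code>.IF</code> directives:

 .IF eax == 0          ; high-level directive: the assembler generates the
         mov  ebx, 1   ; comparison and conditional jumps itself
 .ELSE
         mov  ebx, 2
 .ENDIF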