Code generation (compiler)
{{Short description|Converting computer code into a machine readable form}}
{{Refimprove|Code generation intro|date=November 2006}}
{{distinguish|text = Code generation in the context of [[Vibe coding]]}}

In [[computing]], '''code generation''' is the part of a [[compiler]]'s process chain in which an [[intermediate representation]] of [[source code]] is converted into a form (e.g., [[machine code]]) that can be readily executed by the target system.

Sophisticated compilers typically perform [[multipass compiler|multiple passes]] over various intermediate forms. This multi-stage process is used because many [[algorithm]]s for [[code optimization]] are easier to apply one at a time, or because the input to one optimization relies on the completed processing performed by another optimization. This organization also facilitates the creation of a single compiler that can target multiple architectures, as only the last of the code generation stages (the ''backend'') needs to change from target to target. (For more information on compiler design, see [[Compiler]].)

The input to the code generator typically consists of a [[parse tree]] or an [[abstract syntax tree]].<ref name="MuchnickAssociates1997">{{cite book|author1=Steven Muchnick|author2=Muchnick and Associates|title=Advanced Compiler Design and Implementation|url=https://archive.org/details/advancedcompiler00much|url-access=registration|quote=code generation.|date=15 August 1997|publisher=Morgan Kaufmann|isbn=978-1-55860-320-2}}</ref> The tree is converted into a linear sequence of instructions, usually in an [[intermediate language]] such as [[three-address code]]. Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program. (For example, a [[peephole optimization]] pass would not likely be called "code generation", although a code generator might incorporate a peephole optimization pass.)
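The conversion of a tree into a linear instruction sequence can be illustrated with a minimal sketch. It is not taken from any particular compiler: the node classes <code>Var</code> and <code>BinOp</code>, the helper functions <code>lower</code> and <code>generate</code>, the opcode names, and the <code>dest := OP a, b</code> notation are all assumptions made for illustration. The sketch walks a small expression tree in post-order and emits one three-address instruction per operator.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, List, Union


@dataclass
class Var:
    """Leaf node: a named variable (hypothetical AST class)."""
    name: str


@dataclass
class BinOp:
    """Interior node: a binary operator applied to two subtrees."""
    op: str  # illustrative opcode name such as "ADD" or "MUL"
    left: "Node"
    right: "Node"


Node = Union[Var, BinOp]


def lower(node: Node, code: List[str], temps: Iterator[int]) -> str:
    """Post-order walk: emit code for both operands, then for the operator.

    Returns the name (variable or temporary) that holds the node's value.
    """
    if isinstance(node, Var):
        return node.name
    left = lower(node.left, code, temps)
    right = lower(node.right, code, temps)
    dest = f"t{next(temps)}"  # fresh temporary for this operator's result
    code.append(f"{dest} := {node.op} {left}, {right}")
    return dest


def generate(target: str, tree: Node) -> List[str]:
    """Lower an assignment `target := tree` to a list of instructions."""
    code: List[str] = []
    result = lower(tree, code, count(1))
    code.append(f"{target} := {result}")
    return code


# W := X + Y * Z, written as a tree
tree = BinOp("ADD", Var("X"), BinOp("MUL", Var("Y"), Var("Z")))
three_address_code = generate("W", tree)
print("\n".join(three_address_code))
# t1 := MUL Y, Z
# t2 := ADD X, t1
# W := t2
```

Because operands are visited before their operator, every temporary is defined before it is used. The final copy into <code>W</code> is the kind of redundancy a later optimization pass could remove.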

==Major tasks==
In addition to the basic conversion from an intermediate representation into a linear sequence of machine instructions, a typical code generator tries to optimize the generated code in some way.

Tasks that are typically part of a sophisticated compiler's "code generation" phase include:
* [[Instruction selection]]: which instructions to use.
* [[Instruction scheduling]]: in which order to put those instructions. Scheduling is a speed optimization that can have a critical effect on [[instruction pipeline|pipeline]]d machines.
* [[Register allocation]]: the allocation of [[Variable (programming)|variables]] to [[processor register]]s.<ref name=ASU>{{Cite book|title=Compilers: Principles, Techniques, and Tools|last=Aho|first=Alfred V. |author2=Ravi Sethi |author3=Jeffrey D. Ullman|year=1987|publisher=Addison-Wesley|isbn=0-201-10088-6|page=15}}<!--|access-date=June 15, 2012--></ref>
* [[Debugging data format|Debug data]] generation, if required, so that the code can be [[Debugging|debugged]].

Instruction selection is typically carried out by doing a [[recursion|recursive]] [[postorder traversal]] on the abstract syntax tree, matching particular tree configurations against templates; for example, the tree <code>W := ADD(X,MUL(Y,Z))</code> might be transformed into a linear sequence of instructions by recursively generating the sequences for <code>t1 := X</code> and <code>t2 := MUL(Y,Z)</code>, and then emitting the instruction <code>ADD W, t1, t2</code>.

In a compiler that uses an intermediate language, there may be two instruction selection stages: one to convert the parse tree into intermediate code, and a second, much later, to convert the intermediate code into instructions from the [[instruction set]] of the target machine. This second phase does not require a tree traversal; it can be done linearly, and typically involves a simple replacement of intermediate-language operations with their corresponding [[opcode]]s.
However, if the compiler is actually a [[Transcompiler|language translator]] (for example, one that converts [[Java (programming language)|Java]] to [[C++]]), then the second code-generation phase may involve ''building'' a tree from the linear intermediate code.

==Runtime code generation==
When code generation occurs at [[Run time (program lifecycle phase)|runtime]], as in [[just-in-time compilation]] (JIT), it is important that the entire process be [[Algorithmic efficiency|efficient]] with respect to space and time. For example, when [[regular expression]]s are interpreted and used to generate code at runtime, a non-deterministic [[finite-state machine]] is often generated instead of a deterministic one, because the former can usually be created more quickly and occupies less memory than the latter. Although JIT code generation generally produces less efficient code, it can take advantage of [[Profiling (computer programming)|profiling]] information that is available only at runtime.

==Related concepts==
The fundamental task of taking input in one language and producing output in a non-trivially different language can be understood in terms of the core [[Transformational grammar|transformational]] operations of [[formal language theory]]. Consequently, some techniques that were originally developed for use in compilers have come to be employed in other ways as well. For example, [[YACC]] (Yet Another [[compiler-compiler|Compiler-Compiler]]) takes input in [[Backus–Naur form]] and converts it to a parser in [[C (programming language)|C]]. Though it was originally created for automatic generation of a parser for a compiler, yacc is also often used to automate writing code that needs to be modified each time specifications are changed.<ref>[http://www.artima.com/weblogs/viewpost.jsp?thread=152273 Code Generation: The Real Lesson of Rails]. Artima.com (2006-03-16).
Retrieved on 2013-08-10.</ref>

Many [[integrated development environment]]s (IDEs) support some form of automatic [[source-code generation]], often using algorithms in common with compiler code generators, although commonly less complicated. (See also: [[Program transformation]], [[Data transformation]].)

===Reflection===
In general, a syntax and semantic analyzer tries to retrieve the structure of the program from the source code, while a code generator uses this structural information (e.g., [[data type]]s) to produce code. In other words, the former ''adds'' information while the latter ''loses'' some of it. One consequence of this information loss is that [[Reflection (computer science)|reflection]] becomes difficult or even impossible. To counter this problem, code generators often embed syntactic and semantic information in addition to the code necessary for execution.

==See also==
* [[Automatic programming]]
* [[Comparison of code generation tools]]
* [[Source-to-source compiler|Source-to-source compilation]]: automatic translation of a computer program from one programming language to another

==References==
{{Reflist}}

{{Authority control}}

[[Category:Machine code]]
[[Category:Compiler construction]]