==== Front end ====
[[File:Xxx Scanner and parser example for C.gif|thumb|right|400px|[[Lexical analysis|Lexer]] and [[Parsing|parser]] example for [[C (programming language)|C]]. Starting from the sequence of characters "<code>if(net>0.0)total+=net*(1.0+tax/100.0);</code>", the scanner composes a sequence of [[Lexical analysis#token|tokens]] and categorizes each of them, for example as {{color|#600000|identifier}}, {{color|#606000|reserved word}}, {{color|#006000|number literal}}, or {{color|#000060|operator}}. The latter sequence is transformed by the parser into a [[abstract syntax tree|syntax tree]], which is then processed by the remaining compiler phases. The scanner and parser handle the [[regular grammar|regular]] and properly [[context-free grammar|context-free]] parts of the [[C syntax|grammar for C]], respectively.]]

The front end analyzes the source code to build an internal representation of the program, called the [[intermediate representation]] (IR). It also manages the [[symbol table]], a data structure mapping each symbol in the source code to associated information such as location, type and scope.

While the front end can be a single monolithic function or program, as in a [[scannerless parser]], it is traditionally implemented and analyzed as several phases, which may execute sequentially or concurrently. This method is favored for its modularity and [[separation of concerns]]. Most commonly, the front end is broken into three phases: [[lexical analysis]] (also known as lexing or scanning), [[syntax analysis]] (also known as parsing), and [[Semantic analysis (compilers)|semantic analysis]]. Lexing and parsing comprise the syntactic analysis (word syntax and phrase syntax, respectively), and in simple cases these modules (the lexer and parser) can be automatically generated from a grammar for the language, though in more complex cases they require manual modification. The lexical grammar and phrase grammar are usually [[context-free grammar]]s, which simplifies analysis significantly, with context-sensitivity handled at the semantic analysis phase. The semantic analysis phase is generally more complex and written by hand, but it can be partially or fully automated using [[attribute grammar]]s. These phases themselves can be further broken down: lexing as scanning and evaluating, and parsing as building a [[Parse tree|concrete syntax tree]] (CST, parse tree) and then transforming it into an [[abstract syntax tree]] (AST, syntax tree). In some cases additional phases are used, notably ''line reconstruction'' and ''preprocessing'', but these are rare.
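The scanner's behavior in the caption can be made concrete with a short sketch. The following Python fragment scans the caption's character sequence into categorized (name, value) pairs; the token names and patterns are illustrative assumptions for this one example, not a complete C lexical grammar.

<syntaxhighlight lang="python">
import re

# Token categories for the caption's example; names and patterns are
# illustrative assumptions, not a complete C lexical grammar. A real
# scanner would also skip whitespace and comments.
TOKEN_SPEC = [
    ("RESERVED",   r"\bif\b"),                   # reserved word
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),   # identifier
    ("NUMBER",     r"\d+(?:\.\d+)?"),            # number literal
    ("OPERATOR",   r"\+=|[-+*/<>();]"),          # operator or separator
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Scanning segments the text into categorized lexemes; evaluating
    converts number lexemes into processed numeric values."""
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        value = float(lexeme) if kind == "NUMBER" else lexeme
        yield (kind, value)   # a token: (token name, token value)

tokens = list(scan("if(net>0.0)total+=net*(1.0+tax/100.0);"))
# First few tokens: ('RESERVED', 'if'), ('OPERATOR', '('),
# ('IDENTIFIER', 'net'), ('OPERATOR', '>'), ('NUMBER', 0.0), ...
</syntaxhighlight>

Running the sketch categorizes <code>if</code> as a reserved word, <code>net</code> and <code>total</code> as identifiers, and <code>0.0</code> as a number literal whose lexeme is evaluated to a numeric value, matching the categories in the caption.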
The main phases of the front end include the following:

* ''{{visible anchor|Line reconstruction}}'' converts the input character sequence to a canonical form ready for the parser. Languages which [[stropping (syntax)|strop]] their keywords or allow arbitrary spaces within identifiers require this phase. The [[top-down parsing|top-down]], [[recursive descent parser|recursive-descent]], table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. [[Atlas Autocode]] and [[Edinburgh IMP|Imp]] (and some implementations of [[ALGOL]] and [[Coral 66]]) are examples of stropped languages whose compilers would have a ''line reconstruction'' phase.
* ''[[Preprocessor|Preprocessing]]'' supports [[Macro (computer science)|macro]] substitution and [[conditional compilation]]. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms (a token-level sketch follows this list). However, some languages such as [[Scheme (programming language)|Scheme]] support macro substitutions based on syntactic forms.
* ''[[Lexical analysis]]'' (also known as ''lexing'' or ''tokenization'') breaks the source code text into a sequence of small pieces called ''lexical tokens'',<ref>Aho, Lam, Sethi, Ullman 2007, pp. 5–6, 109–189.</ref> as in the scanner sketch above. This phase can be divided into two stages: ''scanning'', which segments the input text into syntactic units called ''lexemes'' and assigns each a category, and ''evaluating'', which converts lexemes into processed values. A token is a pair consisting of a ''token name'' and an optional ''token value''.<ref>Aho, Lam, Sethi, Ullman 2007, p. 111.</ref> Common token categories include identifiers, keywords, separators, operators, literals and comments, although the set of token categories varies between [[programming language]]s. The lexeme syntax is typically a [[regular language]], so a [[finite-state automaton]] constructed from a [[regular expression]] can be used to recognize it. The software doing lexical analysis is called a [[lexical analyzer]]. This may not be a separate step; it can be combined with the parsing step in [[scannerless parsing]], in which case parsing is done at the character level, not the token level.
* ''[[Syntax analysis]]'' (also known as ''parsing'') involves [[parsing]] the token sequence to identify the syntactic structure of the program. This phase typically builds a [[parse tree]], which replaces the linear sequence of tokens with a tree structure built according to the rules of a [[formal grammar]] which define the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler<ref>Aho, Lam, Sethi, Ullman 2007, pp. 8, 191–300.</ref> (a recursive-descent sketch follows this list).
* ''[[Semantic analysis (compilers)|Semantic analysis]]'' adds semantic information to the [[parse tree]] and builds the [[symbol table]]. This phase performs semantic checks such as [[type checking]] (checking for type errors), [[object binding]] (associating variable and function references with their definitions), or [[definite assignment analysis|definite assignment]] (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings (see the type-checking sketch after this list). Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the [[parsing]] phase and logically precedes the [[code generation (compiler)|code generation]] phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.
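As noted in the preprocessing item above, a C-style preprocessor rewrites lexical tokens rather than syntactic forms. A minimal sketch of object-like macro substitution over the token stream produced by the scanner above might look like the following; the macro table and token format are assumptions for illustration.

<syntaxhighlight lang="python">
# An assumed macro table, as if the source contained "#define TAX_RATE 17.5".
# Real preprocessors also rescan replacements for further macros and support
# function-like macros and conditional compilation.
MACROS = {"TAX_RATE": [("NUMBER", 17.5)]}

def preprocess(tokens):
    """Splice replacement tokens in place of macro names, token by token."""
    out = []
    for kind, value in tokens:
        if kind == "IDENTIFIER" and value in MACROS:
            out.extend(MACROS[value])   # substitution happens on tokens,
        else:                           # before any parsing takes place
            out.append((kind, value))
    return out
</syntaxhighlight>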
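The syntax-analysis phase can be sketched in the same spirit. The following recursive-descent parser, one common hand-written technique, builds an abstract syntax tree for arithmetic expressions from the tokens produced above; the grammar and the tuple-shaped nodes are simplified assumptions, not the structure of any particular compiler.

<syntaxhighlight lang="python">
class Parser:
    """Recursive descent over (name, value) tokens; assumes well-formed
    input ends exactly at the end of an expression."""

    def __init__(self, tokens):
        self.tokens, self.pos = list(tokens), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def eat(self, expected=None):
        kind, value = self.tokens[self.pos]
        if expected is not None and value != expected:
            raise SyntaxError(f"expected {expected!r}, got {value!r}")
        self.pos += 1
        return kind, value

    def expression(self):          # expression := term (('+' | '-') term)*
        node = self.term()
        while self.peek()[1] in ("+", "-"):
            op = self.eat()[1]
            node = ("binop", op, node, self.term())
        return node

    def term(self):                # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek()[1] in ("*", "/"):
            op = self.eat()[1]
            node = ("binop", op, node, self.factor())
        return node

    def factor(self):              # factor := NUMBER | IDENTIFIER | '(' expression ')'
        kind, value = self.eat()
        if kind == "NUMBER":
            return ("literal", value)
        if kind == "IDENTIFIER":
            return ("var", value)
        if value == "(":
            node = self.expression()
            self.eat(")")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

# The right-hand side of the caption's example, "net*(1.0+tax/100.0)",
# parses to: ('binop', '*', ('var', 'net'),
#             ('binop', '+', ('literal', 1.0),
#              ('binop', '/', ('var', 'tax'), ('literal', 100.0))))
</syntaxhighlight>

The grammar's layering (term inside expression) is what gives <code>*</code> and <code>/</code> higher precedence than <code>+</code> and <code>-</code> in the resulting tree.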
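Finally, the type-checking sketch below walks the AST nodes defined above against a symbol table; the table's contents and the int-to-float promotion rule are simplified assumptions in the style of C.

<syntaxhighlight lang="python">
# An assumed symbol table mapping each name to its declared type; a real
# compiler's table also records location, scope and other attributes.
SYMBOLS = {"net": "float", "total": "float", "tax": "float"}

def check(node):
    """Return the type of an expression node, or raise on a semantic error."""
    kind = node[0]
    if kind == "literal":
        return "float" if isinstance(node[1], float) else "int"
    if kind == "var":
        if node[1] not in SYMBOLS:
            # object binding: every reference must resolve to a definition
            raise TypeError(f"undeclared identifier {node[1]!r}")
        return SYMBOLS[node[1]]
    if kind == "binop":
        left, right = check(node[2]), check(node[3])
        # simplified C-like rule: mixing int and float promotes to float
        return "float" if "float" in (left, right) else "int"
    raise TypeError(f"unknown node kind {kind!r}")
</syntaxhighlight>

Because the check recurses over a finished tree, it illustrates why semantic analysis logically follows parsing, even when an implementation folds both into a single pass.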