Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Parsing
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Parser === {{Redirect|Parser|the scripting language named Parser|Parser (programming language)}} A '''parser''' is a software component that takes input data (typically text) and builds a [[data structure]] β often some kind of [[parse tree]], [[abstract syntax tree]] or other hierarchical structure, giving a structural representation of the input while checking for correct syntax. The parsing may be preceded or followed by other steps, or these may be combined into a single step. The parser is often preceded by a separate [[Lexical analysis|lexical analyser]], which creates tokens from the sequence of input characters; alternatively, these can be combined in [[scannerless parsing]]. Parsers may be programmed by hand or may be automatically or semi-automatically generated by a [[parser generator]]. Parsing is complementary to [[templating language|templating]], which produces formatted ''output.'' These may be applied to different domains, but often appear together, such as the [[scanf]]/[[printf]] pair, or the input (front end parsing) and output (back end code generation) stages of a [[compiler]]. The input to a parser is typically text in some [[computer language]], but may also be text in a natural language or less structured textual data, in which case generally only certain parts of the text are extracted, rather than a parse tree being constructed. Parsers range from very simple functions such as [[scanf]], to complex programs such as the frontend of a [[C++ compiler]] or the [[HTML]] parser of a [[web browser]]. An important class of simple parsing is done using [[regular expression]]s, in which a group of regular expressions defines a [[regular language]] and a regular expression engine automatically generating a parser for that language, allowing [[pattern matching]] and extraction of text. In other contexts regular expressions are instead used prior to parsing, as the lexing step whose output is then used by the parser. The use of parsers varies by input. In the case of data languages, a parser is often found as the file reading facility of a program, such as reading in HTML or [[XML]] text; these examples are [[markup language]]s. In the case of [[programming language]]s, a parser is a component of a [[compiler]] or [[Interpreter (computing)|interpreter]], which parses the [[source code]] of a [[computer programming language]] to create some form of internal representation; the parser is a key step in the [[compiler frontend]]. Programming languages tend to be specified in terms of a [[deterministic context-free grammar]] because fast and efficient parsers can be written for them. For compilers, the parsing itself can be done in one pass or multiple passes β see [[one-pass compiler]] and [[multi-pass compiler]]. The implied disadvantages of a one-pass compiler can largely be overcome by adding [[Relocation (computing)|fix-ups]], where provision is made for code relocation during the forward pass, and the fix-ups are applied backwards when the current program segment has been recognized as having been completed. An example where such a fix-up mechanism would be useful would be a forward GOTO statement, where the target of the GOTO is unknown until the program segment is completed. In this case, the application of the fix-up would be delayed until the target of the GOTO was recognized. Conversely, a backward GOTO does not require a fix-up, as the location will already be known. Context-free grammars are limited in the extent to which they can express all of the requirements of a language. Informally, the reason is that the memory of such a language is limited. The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently. Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); later, the unwanted constructs can be filtered out at the [[Semantic analysis (compilers)|semantic analysis]] (contextual analysis) step. For example, in [[Python (programming language)|Python]] the following is syntactically valid code: <syntaxhighlight lang="python"> x = 1 print(x) </syntaxhighlight> The following code, however, is syntactically valid in terms of the context-free grammar, yielding a syntax tree with the same structure as the previous, but violates the semantic rule requiring variables to be initialized before use: <syntaxhighlight lang="python"> x = 1 print(y) </syntaxhighlight>
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)