Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Parse tree
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|Tree in formal language theory}} [[File:Parse-tree.svg|thumb|Parse tree of the string "aab" with respect to the production rules <math>\begin{matrix}\text{S}&\to&\text{AB}\\ \text{A}&\to&\text{aA}\\ \text{A}&\to&\varepsilon\\ \text{B}&\to&\text{b}\end{matrix}</math>]] A '''parse tree''' or '''parsing tree'''<ref>See Chiswell and Hodges 2007: 34.</ref> (also known as a '''derivation tree''' or '''concrete syntax tree''') is an ordered, rooted [[tree (data structure)|tree]] that represents the [[syntax|syntactic]] structure of a [[string (computer science)|string]] according to some [[context-free grammar]]. The term ''parse tree'' itself is used primarily in [[computational linguistics]]; in theoretical syntax, the term ''syntax tree'' is more common. Concrete syntax trees reflect the syntax of the input language, making them distinct from the [[abstract syntax tree]]s used in computer programming. Unlike [[sentence diagram#Reed-Kellogg system|Reed-Kellogg sentence diagram]]s used for teaching grammar, parse trees do not use distinct symbol shapes for different types of [[Constituent (linguistics)|constituents]]. Parse trees are usually constructed based on either the constituency relation of constituency grammars ([[phrase structure grammar]]s) or the dependency relation of [[dependency grammar]]s. Parse trees may be generated for [[sentence (linguistics)|sentence]]s in [[natural language]]s (see [[natural language processing]]), as well as during [[compiler|processing]] of computer languages, such as [[programming language]]s. A related concept is that of '''phrase marker''' or '''P-marker''', as used in [[transformational generative grammar]]. A phrase marker is a linguistic expression marked as to its phrase structure. This may be presented in the form of a tree, or as a bracketed expression. Phrase markers are generated by applying [[phrase structure rules]], and themselves are subject to further transformational rules.<ref name="Chomsky2014">{{cite book|author=Noam Chomsky|title=Aspects of the Theory of Syntax|url=https://books.google.com/books?id=ljFkBgAAQBAJ&q=%22phrase+marker%22|date=26 December 2014|publisher=MIT Press|isbn=978-0-262-52740-8}}</ref> A set of possible parse trees for a [[syntactically ambiguous]] sentence is called a "parse forest".<ref>Billot, Sylvie, and Bernard Lang. "[https://hal.inria.fr/inria-00075520/document The structure of shared forests in ambiguous parsing]."</ref> ==Nomenclature== [[Image:parseTree.svg|right|150px|thumb|A simple parse tree]] A parse tree is made up of nodes and branches.<ref>{{Cite web|url=https://www1.essex.ac.uk/linguistics/external/clmt/latex4ling/trees/parsetree/|title=The parsetree Package for Drawing Trees in LaTeX|publisher=University of Essex}}</ref> In the picture the parse tree is the entire structure, starting from S and ending in each of the leaf nodes (John, ball, the, hit). In a parse tree, each node is either a ''root'' node, a ''branch'' node, or a ''leaf'' node. In the above example, S is a root node, NP and VP are branch nodes, while John, ball, the, and hit are all leaf nodes. Nodes can also be referred to as parent nodes and child nodes. A ''parent'' node is one which has at least one other node linked by a branch under it. In the example, S is a parent of both NP and VP. A ''child'' node is one which has at least one node directly above it to which it is linked by a branch of the tree. Again from our example, hit is a child node of V. A '''nonterminal function''' is a function (node) which is either a root or a branch in that tree whereas a '''terminal function''' is a function (node) in a parse tree which is a leaf. For binary trees (where each parent node has two immediate child nodes), the number of possible parse trees for a sentence with ''n'' words is given by the [[Catalan number]] <math>C_n</math>. ==Constituency-based parse trees== The constituency-based parse trees of constituency grammars ([[phrase structure grammar]]s) distinguish between terminal and non-terminal nodes. The [[interior node]]s are labeled by [[nonterminal|non-terminal]] categories of the grammar, while the [[leaf node]]s are labeled by [[terminal symbol|terminal]] categories. The image below represents a constituency-based parse tree; it shows the syntactic structure of the [[English language|English]] sentence ''John hit the ball'': [[File:Parse tree 1.jpg|Parse tree PSG]] The parse tree is the entire structure, starting from S and ending in each of the leaf nodes (''John'', ''hit'', ''the'', ''ball''). The following abbreviations are used in the tree: ::* S for [[sentence (linguistics)|sentence]], the top-level structure in this example. ::* NP for [[noun phrase]]. The first (leftmost) NP, a single noun ''John'', serves as the [[subject (grammar)|subject]] of the sentence. The second one is the [[object (grammar)|object]] of the sentence. ::* VP for [[verb phrase]], which serves as the [[predicate (grammar)|predicate]]. ::* V for [[verb]]; in this case, it's the [[transitive verb]] ''hit''. ::* D for [[determiner (class)|determiner]]; in this instance the [[article (grammar)|definite article]] ''the''. ::* N for [[noun]]; in this case ''ball''. Each node in the tree is either a <em>root</em> node, a <em>branch</em> node, or a <em>leaf</em> node.<ref>See Carnie (2013:118ff.) for an introduction to the basic concepts of syntax trees (e.g. root node, terminal node, non-terminal node, etc.).</ref> A root node is a node that does not have any branches on top of it. Within a sentence, there is only ever one root node. A branch node is a parent node that connects to two or more child nodes. A leaf node, however, is a terminal node that does not dominate other nodes in the tree. S is the root node, NP and VP are branch nodes, and ''John'' (N), ''hit'' (V), ''the'' (D), and ''ball'' (N) are all leaf nodes. The leaves are the [[Lexical analysis|lexical tokens]] of the sentence. A parent node is one that has at least one other node linked by a branch under it. In the example, S is a parent of both N and VP. A child node is one that has at least one node directly above it to which it is linked by a branch of a tree. From the example, ''hit'' is a child node of V. The terms ''mother'' and ''daughter'' are also sometimes used for this relationship. ==Dependency-based parse trees== The dependency-based parse trees of [[dependency grammar]]s<ref>See for example Ágel et al. 2003/2006.</ref> see all nodes as terminal, which means they do not acknowledge the distinction between terminal and non-terminal categories. They are simpler on average than constituency-based parse trees because they contain fewer nodes. The dependency-based parse tree for the example sentence above is as follows: :::[[File:Parse2.jpg|Parse tree DG]] This parse tree lacks the phrasal categories (S, VP, and NP) seen in the constituency-based counterpart above. Like the constituency-based tree, [[constituent (linguistics)|constituent]] structure is acknowledged. Any complete sub-tree of the tree is a constituent. Thus this dependency-based parse tree acknowledges the subject noun ''John'' and the object noun phrase ''the ball'' as constituents just like the constituency-based parse tree does. The constituency vs. dependency distinction is far-reaching. Whether the additional syntactic structure associated with constituency-based parse trees is necessary or beneficial is a matter of debate. ==Phrase markers== Phrase markers, or P-markers, were introduced in early [[transformational generative grammar]], as developed by [[Noam Chomsky]] and others. A phrase marker representing the [[deep structure]] of a sentence is generated by applying [[phrase structure rule]]s. Then, this application may undergo further transformations. Phrase markers may be presented in the form of [[Tree (data structure)|trees]] (as in the above section on [[#Constituency-based parse trees|constituency-based parse trees]]), but are often given instead in the form of "bracketed expressions", which occupy less space in the memory. For example, a bracketed expression corresponding to the constituency-based tree given above may be something like: :<math>[_S\ [_\mathit{N}\ \text{John}]\ [_\mathit{VP}\ [_V\ \text{hit}]\ [_\mathit{NP}\ [_\mathit{D}\ \text{the}]\ [_N\ \text{ball}]]]]</math> As with trees, the precise construction of such expressions and the amount of detail shown can depend on the theory being applied and on the points that the query author wishes to illustrate. ==See also== {{div col|colwidth=22em}} *[[Abstract syntax tree]] *[[Constituent (linguistics)]] *[[Dependency grammar]] *[[Computational linguistics]] *[[Parsing]] (syntax analysis) *[[Phrase structure grammar]] *[[Sentence diagram]] *[[Terminal and nonterminal symbols]] {{div col end}} ==Notes== {{reflist}} ==References== {{div col|colwidth=30em}} *[[Vilmos Ágel|Ágel, V.]], Ludwig Eichinger, Hans-Werner Eroms, Peter Hellwig, Hans Heringer, and Hennig Lobin (eds.) 2003/6. [https://books.google.com/books?id=GWhp8IJ20X4C Dependency and valency: An international handbook of contemporary research]. Berlin: Walter de Gruyter. *Carnie, A. 2013. [https://books.google.com/books?id=or-Y3c9dY4UC&q=tree Syntax: A generative introduction], 3rd edition. Malden, MA: Wiley-Blackwell. *Chiswell, Ian and Wilfrid Hodges 2007. Mathematical logic. Oxford: Oxford University Press. *Aho, A. V., Sethi, R., and Ullman, J. D. 1986. [[Compilers: Principles, techniques, & tools]]. Reading, MA: Addison-Wesley. {{div col end}} ==External links== * [http://www.ductape.net/~eppie/tree/ Syntax Tree Editor] * [http://ltc.sourceforge.net/ Linguistic Tree Constructor] * [http://www.ironcreek.net/phpsyntaxtree/ phpSyntaxTree] – Online parse tree drawing site * [http://lrv.bplaced.net/syntaxtree/ phpSyntaxTree (Unicode)] – Online parse tree drawing site (improved version that supports Unicode) * [http://yohasebe.com/rsyntaxtree/ rSyntaxTree] Enhanced version of phpSyntaxTree in Ruby with Unicode and Vectorized graphics * [http://www.ling.upenn.edu/advice/latex/qtree/ Qtree] – [[LaTeX]] package for drawing parse trees * [http://www.mapsofspeech.com/2017/10/02/treeform/ TreeForm Syntax Tree Drawing Software] * [http://trimc-nlp.blogspot.com/2013/05/phrase-structure-and-dependency-parsing.html Visual Introduction to Parse Trees] Introduction and Transformation * [https://www.youtube.com/watch?v=UTnHwzVAIOo OpenCourseOnline] Dependency Parse Introduction (Christopher Manning) * [http://www.surdeanu.info/mihai/teaching/ista555-fall13/readings/PennTreebankConstituents.html#VBZ Penn Treebank II Constituent Tags] {{Parsers}} [[Category:Syntax]] [[Category:Generative syntax]] [[Category:Trees (data structures)]] [[Category:Mathematical linguistics]] [[Category:Computational linguistics]]
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Cite book
(
edit
)
Template:Cite web
(
edit
)
Template:Div col
(
edit
)
Template:Div col end
(
edit
)
Template:Parsers
(
edit
)
Template:Reflist
(
edit
)
Template:Short description
(
edit
)