Editing Tree-adjoining grammar

{{Short description|Grammar formalism}}
'''Tree-adjoining grammar''' ('''TAG''') is a [[grammar formalism]] defined by [[Aravind Joshi]].  Tree-adjoining grammars are somewhat similar to [[context-free grammar]]s, but the elementary unit of rewriting is the tree rather than the symbol. Whereas context-free grammars have rules for rewriting symbols as strings of other symbols, tree-adjoining grammars have rules for rewriting the nodes of trees as other trees (see [[tree (graph theory)]] and [[tree (data structure)]]).

==History==
TAG originated in investigations by Joshi and his students into the family of adjunction grammars (AG),<ref name="JoshiKosarajuYamada1969">{{cite document
 | last = Joshi
 | first = Aravind
 |author2=S. R. Kosaraju
 |author3=H. Yamada
 | title = String Adjunct Grammars
 | year = 1969
 | publisher = Proceedings Tenth Annual Symposium on Automata Theory, Waterloo, Canada }}
{{Citation
 | last1 =Joshi
 | first1 =Aravind K.
 | last2 =Kosaraju
 | first2 =S. Rao
 | last3 =Yamada
 | first3 =H. M.
 | title =String Adjunct Grammars: I. Local and Distributed Adjunction
 | journal =Information and Control
 | volume =21
 | issue =2
 | pages =93–116
 | year =1972
 | doi =10.1016/S0019-9958(72)90051-4
 | doi-access =free
 }}
{{Citation
 | last1 =Joshi
 | first1 =Aravind K.
 | last2 =Kosaraju
 | first2 =S. Rao
 | last3 =Yamada
 | first3 =H. M.
 | title =String Adjunct Grammars: II. Equational Representation, Null Symbols, and Linguistic Relevance
 | journal =Information and Control
 | volume =21
 | issue =3
 | pages =235–260
 | year =1972
 | doi =10.1016/S0019-9958(72)80005-6
 | doi-access =free
 }}
</ref> 
the "string grammar" of [[Zellig Harris]].<ref>{{cite book
 | last =Harris
 | first =Zellig S.
 | title =String analysis of sentence structure
 | publisher =Mouton & Co.
 | series =Papers on Formal Linguistics
 | volume =1
 | date =1962
 | location =The Hague
 }}</ref> AGs handle [[exocentric]] properties of language in a natural and effective way, but do not have a good characterization of [[endocentric]] constructions; the converse is true of [[rewrite rule|rewrite grammars]], or [[phrase-structure grammar]] (PSG). In 1969, Joshi introduced a family of grammars that exploits this complementarity by mixing the two types of rules. A few very simple rewrite rules suffice to generate the vocabulary of strings for adjunction rules. This family is distinct from the [[Chomsky hierarchy|Chomsky-Schützenberger hierarchy]] but intersects it in interesting and linguistically relevant ways.<ref name="Joshi1969">{{cite document
 | last = Joshi
 | first = Aravind
 | title = Properties of Formal Grammars with Mixed Types of Rules and Their Linguistic Relevance
 | year = 1969
 | publisher = Proceedings Third International Symposium on Computational Linguistics, Stockholm, Sweden }}</ref> The center strings and adjunct strings can also be generated by a [[dependency grammar]], avoiding the limitations of rewrite systems entirely.<ref name="joshi-rambow2003">{{cite conference
 | last = Joshi
 | first = Aravind
 |author2=Owen Rambow
  | title = A Formalism for Dependency Grammar Based on Tree Adjoining Grammar
 | year = 2003
 | book-title = Proceedings of the Conference on Meaning-Text Theory
 | url = http://www1.cs.columbia.edu/~rambow/papers/joshi-rambow-2003.pdf}}</ref>
<ref name="xtagenglish">{{cite web
  | title = A Lexicalized Tree Adjoining Grammar for English
 | url = http://www.cis.upenn.edu/~xtag/tech-report/}}</ref>

==Description==
The rules in a TAG are trees with a special leaf node known as the ''foot node'', which is anchored to a word. 
There are two types of basic trees in TAG: ''initial'' trees (often represented as '<math>\alpha</math>') and ''auxiliary'' trees ('<math>\beta</math>').  Initial trees represent basic valency relations, while auxiliary trees allow for recursion.<ref name="jurafsky-martin2000">{{cite book
 | last = Jurafsky
 | first = Daniel
 | author-link = Daniel Jurafsky
 |author2=James H. Martin
  | title = Speech and Language Processing
 | year = 2000
 | pages = 354
 | publisher = Prentice Hall
 | location = Upper Saddle River, NJ }}</ref>
Auxiliary trees have the root (top) node and foot node labeled with the same symbol.
A derivation starts with an initial tree, combining via either ''substitution'' or ''adjunction''.   Substitution replaces a frontier node with another tree whose top node has the same label. The root/foot label of the auxiliary tree must match the label of the node at which it adjoins. Adjunction can thus have the effect of inserting an auxiliary tree into the center of another tree.<ref name="joshi-rambow2003"/>

Other variants of TAG allow [[multi-component tree adjoining grammars|multi-component trees]], trees with multiple foot nodes, and other extensions.

==Complexity and application==

Tree-adjoining grammars are more powerful (in terms of [[weak generative capacity]]) than [[context-free grammar]]s, but less powerful than [[linear context-free rewriting system]]s,<ref>Kallmeyer, Laura (2010). Parsing Beyond Context-Free Grammars. Springer. Here: p.215-216</ref> [[indexed grammar|indexed]]<ref group=note>since for each tree-adjoining grammar, a linear indexed grammar can be found producing the same language, see [[#Equivalences|below]], and for the latter, a weakly equivalent (proper) indexed grammar can be found, in turn, see [[Indexed grammar#Computational Power]]</ref> or [[context-sensitive grammar|context-sensitive]] grammars.

A TAG can describe the language of squares (in which some arbitrary string is repeated), and the language <math>\{a^n b^n c^n d^n | 1 \le n \}</math>. This type of processing can be represented by an [[embedded pushdown automaton]].
Languages with cubes (i.e. triplicated strings) or with more than four distinct character strings of equal length cannot be generated by tree-adjoining grammars.

For these reasons, tree-adjoining grammars are often described as [[Mildly context-sensitive grammar|mildly context-sensitive]].
These grammar classes are conjectured to be powerful enough to model [[natural language]]s while remaining efficiently [[parser|parsable]] in the general case.<ref name="joshi1985">{{cite book
 | last = Joshi
 | first = Aravind
 | chapter = How much context-sensitivity is necessary for characterizing structural descriptions
 | year = 1985
 | publisher = Cambridge University Press
 | pages = [https://archive.org/details/naturallanguagep00dowt_276/page/n220 206]–250
 | title = Natural Language Processing: Theoretical, Computational, and Psychological Perspectives
 | url = https://archive.org/details/naturallanguagep00dowt_276
 | url-access = limited
 | editor = D. Dowty |editor2=L. Karttunen |editor3=A. Zwicky
 | location = New York, NY | isbn = 9780521262033
 }}</ref>

===Equivalences===

Vijay-Shanker and Weir (1994)<ref name="vijayshankarAndWeir1995">Vijay-Shanker, K. and Weir, David J. 1994. ''The Equivalence of Four Extensions of Context-Free Grammars''. Mathematical Systems Theory 27(6): 511–546.</ref> demonstrate that [[Indexed grammar#Linear indexed grammars|linear indexed grammars]], [[combinatory categorial grammar]], tree-adjoining grammars, and [[head grammar]]s are [[Weak equivalence (formal languages)|weakly equivalent]] formalisms, in that they all define the same string languages.

==Lexicalized ==
Lexicalized tree-adjoining grammars (LTAG) are a variant of TAG in which each elementary tree (initial or auxiliary) is associated with a lexical item.  A lexicalized grammar for English has been developed by the XTAG Research Group of the Institute for Research in Cognitive Science at the University of Pennsylvania.<ref name="xtagenglish"/>

==See also==
* [[String grammar]]

==Notes==
{{reflist|group=note}}

==References==
{{Reflist}}

==External links==
*[http://www.cis.upenn.edu/~xtag/ The XTAG project], which uses a TAG for natural language processing.
*[http://www.let.rug.nl/~vannoord/papers/diss/diss/node59.html A tutorial on TAG]
*[https://web.archive.org/web/20061123095945/http://wiki.loria.fr/wiki/SemConst/Documentation#Background SemConst Documentation] A quick survey on Syntax and Semantic Interface problematic within the TAG framework.
*[http://sourcesup.cru.fr/tulipa/ The TuLiPa project] The Tübingen Linguistic Parsing Architecture (TuLiPA) is a multi-formalism syntactic (and semantic) parsing environment, designed mainly for [[multi-component tree adjoining grammar]]s with [[tree tuple]]s
*[http://mgkit.gforge.inria.fr/ The Metagrammar Toolkit] which provides several tools to edit and compile [[MetaGrammars]] into TAGs. It also include a wide coverage French Metagrammars.
*[http://www.loria.fr/~azim/LLP2/help/fr/index.html LLP2] A [[lexicalized tree adjoining grammar]] parser which provides an  easy to use graphical environment (page in French)

{{Formal languages and grammars}}

{{DEFAULTSORT:Tree-Adjoining Grammar}}
[[Category:Generative linguistics]]
[[Category:Grammar frameworks]]