
=== Computational methods ===
{{main|Syntactic parsing (computational linguistics)}}
{{more citations needed section|date=February 2013}}
In some [[machine translation]] and [[natural language processing]] systems, written texts in human languages are parsed by computer programs.<ref name="ManningManning1999">{{cite book|author1=Christopher D. Manning|author2=Hinrich Schütze|title=Foundations of Statistical Natural Language Processing|url=https://books.google.com/books?id=YiFDxbEX3SUC&q=parsing|year=1999|publisher=MIT Press|isbn=978-0-262-13360-9}}</ref> Human sentences are not easily parsed by programs, because the structure of human language is substantially [[syntactic ambiguity|ambiguous]]: language is used to convey meaning (or [[semantics]]) from a potentially unlimited range of possibilities, only some of which are germane to the particular case.<ref>{{Cite journal | doi=10.1207/s15516709cog2002_1|title = A Probabilistic Model of Lexical and Syntactic Access and Disambiguation| journal=Cognitive Science| volume=20| issue=2| pages=137–194|year = 1996|last1 = Jurafsky|first1 = Daniel| citeseerx=10.1.1.150.5711}}</ref> For example, the English utterances "Man bites dog" and "Dog bites man" are distinguished by word order, whereas another language might express either as "Man dog bites" and rely on the larger context to distinguish the two possibilities, if indeed the difference is of concern. It is difficult to prepare formal rules to describe informal behaviour even though it is clear that some rules are being followed.{{citation needed|date=February 2018}}

To parse natural language data, researchers must first agree on the [[grammar]] to be used. The choice of syntax is affected by both [[language|linguistic]] and computational concerns; for instance, some parsing systems use [[lexical functional grammar]], but in general, parsing for grammars of this type is known to be [[NP-complete]]. [[Head-driven phrase structure grammar]] is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms such as the one used in the Penn [[Treebank]]. [[Shallow parsing]] aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is [[dependency grammar]] parsing.
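
The idea of shallow parsing can be illustrated with a minimal sketch. It assumes that the input has already been tagged with Penn Treebank-style part-of-speech tags, and it uses a single hypothetical pattern (an optional determiner, any number of adjectives, then one or more nouns) to mark noun-phrase boundaries. The function name <code>np_chunks</code> and the pattern are illustrative assumptions, not a description of any particular system.

```python
def np_chunks(tagged):
    """Return (start, end) token spans matching the pattern DT? JJ* NN+.

    `tagged` is a list of (word, tag) pairs; `end` is exclusive.
    Only boundaries are found, no internal structure is built.
    """
    spans, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        # optional determiner
        if tagged[j][1] == "DT":
            j += 1
        # any number of adjectives
        while j < n and tagged[j][1] == "JJ":
            j += 1
        # one or more nouns (NN, NNS, NNP, ...)
        k = j
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:          # at least one noun was found: emit a chunk
            spans.append((i, k))
            i = k
        else:              # no noun phrase starts here
            i += 1
    return spans


sentence = [("the", "DT"), ("big", "JJ"), ("dog", "NN"),
            ("bites", "VBZ"), ("a", "DT"), ("man", "NN")]
chunks = np_chunks(sentence)
```

Here <code>chunks</code> holds the token spans for "the big dog" and "a man"; the verb is left outside any chunk, which is the sense in which the analysis is shallow.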

Most modern parsers are at least partly statistical; that is, they rely on a [[text corpus|corpus]] of training data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. ''(See [[machine learning]].)'' Approaches which have been used include straightforward [[PCFG]]s (probabilistic context-free grammars),<ref>Klein, Dan, and Christopher D. Manning. "[https://www.aclweb.org/anthology/P03-1054 Accurate unlexicalized parsing]." Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1. Association for Computational Linguistics, 2003.</ref> [[maximum entropy classifier|maximum entropy]],<ref>Charniak, Eugene. "[https://aclanthology.info/pdf/A/A00/A00-2018.pdf A maximum-entropy-inspired parser] {{Webarchive|url=https://web.archive.org/web/20190401145141/https://aclanthology.info/pdf/A/A00/A00-2018.pdf |date=2019-04-01 }}." Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference. Association for Computational Linguistics, 2000.</ref> and [[neural net]]s.<ref>Chen, Danqi, and Christopher Manning. "[http://www.aclweb.org/anthology/D14-1082 A fast and accurate dependency parser using neural networks]." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.</ref> Most of the more successful systems use ''lexical'' statistics (that is, they consider the identities of the words involved, as well as their [[part of speech]]). However, such systems are vulnerable to [[overfitting]] and require some kind of [[smoothing]] to be effective.{{Citation needed|date=May 2008}}
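
The role of smoothing in lexical statistics can be sketched as follows. The example estimates the probability of a word given its part-of-speech tag from a tiny hand-made list of (tag, word) pairs, first by relative frequency and then with add-''k'' smoothing over the vocabulary plus an unknown-word token. The data, the <code>&lt;unk&gt;</code> token and the function name <code>lexical_probs</code> are assumptions made for illustration, and add-''k'' is only one simple smoothing scheme, not necessarily the one used by the systems cited above.

```python
from collections import Counter, defaultdict

UNK = "<unk>"  # hypothetical placeholder for words never seen in training

# Toy annotated data: (tag, word) pairs as read off a hand-parsed corpus.
TAGGED = [("N", "dog"), ("N", "man"), ("N", "dog"),
          ("V", "bites"), ("V", "bites"), ("V", "sees")]


def lexical_probs(tagged, k):
    """Estimate P(word | tag) with add-k smoothing; k = 0 is relative frequency."""
    counts = defaultdict(Counter)
    for tag, word in tagged:
        counts[tag][word] += 1
    vocab = {word for _, word in tagged} | {UNK}
    probs = {}
    for tag, word_counts in counts.items():
        # every vocabulary item receives k pseudo-counts
        total = sum(word_counts.values()) + k * len(vocab)
        probs[tag] = {w: (word_counts[w] + k) / total for w in vocab}
    return probs


mle = lexical_probs(TAGGED, k=0)       # overfits: unseen pairs get probability 0
smoothed = lexical_probs(TAGGED, k=1)  # unseen pairs keep a small probability
```

With relative-frequency estimates, any analysis that uses "bites" as a noun receives probability zero simply because that pairing was absent from the training data; the smoothed estimates reserve a small amount of probability for such unseen events.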

Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties as with manually designed grammars for programming languages. As mentioned earlier, some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not [[context-free language|context-free]], some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free grammars often rely on some variant of the [[CYK algorithm]], usually with some [[heuristic (computer science)|heuristic]] to prune away unlikely analyses to save time. ''(See [[chart parsing]].)'' However, some systems trade accuracy for speed using, e.g., linear-time versions of the [[Shift-reduce parsing|shift-reduce]] algorithm. A somewhat recent development has been [[parse reranking]], in which the parser proposes some large number of analyses, and a more complex system selects the best option.{{citation needed|date=January 2019}} In [[natural language understanding]] applications, [[semantic parsing|semantic parser]]s convert the text into a representation of its meaning.<ref name=":0">{{cite arXiv|last1=Jia|first1=Robin|last2=Liang|first2=Percy|date=2016-06-11|title=Data Recombination for Neural Semantic Parsing|eprint=1606.03622 |class=cs.CL}}</ref>
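
A CYK-style chart parser with pruning can be sketched as follows. The example is a minimal probabilistic (Viterbi) variant for a grammar in [[Chomsky normal form]]; the toy grammar, its probabilities, the <code>beam</code> parameter and all function names are hypothetical and chosen only for illustration. Keeping the <code>beam</code> most probable entries per chart cell is one simple pruning heuristic among many.

```python
# Hypothetical toy PCFG in Chomsky normal form.
BINARY = {  # (B, C) -> [(A, prob)] for rules A -> B C
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
}
LEXICAL = {  # word -> [(A, prob)] for rules A -> word
    "man": [("NP", 0.5)],
    "dog": [("NP", 0.5)],
    "bites": [("V", 1.0)],
}


def cky(words, beam=None):
    """Fill a chart where chart[i][j][A] = (best probability, backpointer)
    for nonterminal A spanning words[i:j]."""
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for a, p in LEXICAL.get(word, []):
            chart[i][i + 1][a] = (p, word)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = chart[i][j]
            for m in range(i + 1, j):  # every split point
                for b, (pb, _) in chart[i][m].items():
                    for c, (pc, _) in chart[m][j].items():
                        for a, pr in BINARY.get((b, c), []):
                            p = pr * pb * pc
                            if p > cell.get(a, (0.0, None))[0]:
                                cell[a] = (p, (m, b, c))
            if beam is not None and len(cell) > beam:
                # heuristic pruning: keep only the `beam` most probable entries
                keep = sorted(cell, key=lambda a: cell[a][0], reverse=True)[:beam]
                chart[i][j] = {a: cell[a] for a in keep}
    return chart


def build(chart, i, j, a):
    """Follow backpointers to recover the best tree for A over words[i:j]."""
    _, back = chart[i][j][a]
    if isinstance(back, str):  # lexical rule: backpointer is the word itself
        return (a, back)
    m, b, c = back
    return (a, build(chart, i, m, b), build(chart, m, j, c))


def parse(words, beam=None):
    """Return (probability, tree) for the best S analysis, or None."""
    chart = cky(words, beam)
    n = len(words)
    if "S" not in chart[0][n]:
        return None
    return chart[0][n]["S"][0], build(chart, 0, n, "S")


result = parse(["man", "bites", "dog"])
```

Under this toy grammar, <code>result</code> pairs the probability 0.25 with a single tree in which "bites dog" forms the verb phrase, while <code>parse(["man", "dog", "bites"])</code> returns <code>None</code> because no rule licenses that word order. The chart requires time cubic in the sentence length, which is why practical systems prune it.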