== Human languages ==
{{main category|Natural language parsing}}

=== Traditional methods ===
The traditional grammatical exercise of parsing, sometimes known as ''clause analysis'', involves breaking down a text into its component [[Part of speech|parts of speech]] with an explanation of the form, function, and syntactic relationship of each part.<ref>{{cite web | title=Grammar and Composition | url=http://grammar.about.com/od/pq/g/parsingterm.htm | access-date=2012-11-24 | archive-date=2016-12-01 | archive-url=https://web.archive.org/web/20161201190245/http://grammar.about.com/od/pq/g/parsingterm.htm | url-status=dead }}</ref> This is determined in large part from study of the language's [[conjugation (grammar)|conjugation]]s and [[declensions]], which can be quite intricate for heavily [[Inflection|inflected]] languages. Parsing a phrase such as "man bites dog" involves noting that the [[Grammatical number|singular]] noun "man" is the [[Subject (grammar)|subject]] of the sentence, the verb "bites" is the [[Grammatical person|third person singular]] of the [[present tense]] of the verb "to bite", and the singular noun "dog" is the [[Object (grammar)|object]] of the sentence. Techniques such as [[sentence diagram]]s are sometimes used to indicate the relations between elements in the sentence.

Parsing was formerly central to the teaching of grammar throughout the English-speaking world, and was widely regarded as basic to the use and understanding of written language.{{cn|date=February 2025}}

=== Computational methods ===
{{main|Syntactic parsing (computational linguistics)}}
{{more citations needed section|date=February 2013}}

In some [[machine translation]] and [[natural language processing]] systems, written texts in human languages are parsed by computer programs.<ref name="ManningManning1999">{{cite book|author1=Christopher D. Manning|author2=Hinrich Schütze|title=Foundations of Statistical Natural Language Processing|url=https://books.google.com/books?id=YiFDxbEX3SUC&q=parsing|year=1999|publisher=MIT Press|isbn=978-0-262-13360-9}}</ref> Human sentences are not easily parsed by programs, as there is substantial [[syntactic ambiguity|ambiguity]] in the structure of human language, which is used to convey meaning (or [[semantics]]) across a potentially unlimited range of possibilities, only some of which are germane to any particular case.<ref>{{Cite journal | doi=10.1207/s15516709cog2002_1|title = A Probabilistic Model of Lexical and Syntactic Access and Disambiguation| journal=Cognitive Science| volume=20| issue=2| pages=137–194|year = 1996|last1 = Jurafsky|first1 = Daniel| citeseerx=10.1.1.150.5711}}</ref> Thus the utterance "man bites dog" is unambiguous in English about who bites whom, whereas in another language the same content might be expressed as "man dog bites", relying on the larger context to distinguish between the two possibilities, if indeed that difference is of concern. It is difficult to prepare formal rules to describe informal behaviour even though it is clear that some rules are being followed.{{citation needed|date=February 2018}}

In order to parse natural language data, researchers must first agree on the [[grammar]] to be used. The choice of syntax is affected by both [[language|linguistic]] and computational concerns; for instance, some parsing systems use [[lexical functional grammar]], but in general, parsing for grammars of this type is known to be [[NP-complete]].
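As an illustrative sketch of grammar-driven parsing (an example added for exposition, assuming only the Python [[Natural Language Toolkit|NLTK]] library rather than any particular system cited here), a toy context-free grammar suffices to recover the structure of the sentence analyzed above:

<syntaxhighlight lang="python">
# Minimal sketch of grammar-driven parsing with NLTK (assumed installed).
# The grammar is a toy for illustration, not a linguistically agreed-upon one.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> N
VP -> V NP
N -> 'man' | 'dog'
V -> 'bites'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("man bites dog".split()):
    tree.pretty_print()  # (S (NP (N man)) (VP (V bites) (NP (N dog))))
</syntaxhighlight>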
[[Head-driven phrase structure grammar]] is another linguistic formalism which has been popular in the parsing community, but other research efforts have focused on less complex formalisms, such as the one used in the Penn [[Treebank]]. [[Shallow parsing]] aims to find only the boundaries of major constituents such as noun phrases. Another popular strategy for avoiding linguistic controversy is [[dependency grammar]] parsing.

Most modern parsers are at least partly statistical; that is, they rely on a [[text corpus|corpus]] of training data which has already been annotated (parsed by hand). This approach allows the system to gather information about the frequency with which various constructions occur in specific contexts. ''(See [[machine learning]].)'' Approaches which have been used include straightforward [[PCFG]]s (probabilistic context-free grammars),<ref>Klein, Dan, and Christopher D. Manning. "[https://www.aclweb.org/anthology/P03-1054 Accurate unlexicalized parsing]." Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1. Association for Computational Linguistics, 2003.</ref> [[maximum entropy classifier|maximum entropy]],<ref>Charniak, Eugene. "[https://aclanthology.info/pdf/A/A00/A00-2018.pdf A maximum-entropy-inspired parser] {{Webarchive|url=https://web.archive.org/web/20190401145141/https://aclanthology.info/pdf/A/A00/A00-2018.pdf |date=2019-04-01 }}." Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference. Association for Computational Linguistics, 2000.</ref> and [[neural net]]s.<ref>Chen, Danqi, and Christopher Manning. "[http://www.aclweb.org/anthology/D14-1082 A fast and accurate dependency parser using neural networks]." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.</ref> Most of the more successful systems use ''lexical'' statistics (that is, they consider the identities of the words involved, as well as their [[part of speech]]). However, such systems are vulnerable to [[overfitting]] and require some kind of [[smoothing]] to be effective.{{Citation needed|date=May 2008}}

Parsing algorithms for natural language cannot rely on the grammar having 'nice' properties, as is possible with manually designed grammars for programming languages. As mentioned earlier, some grammar formalisms are very difficult to parse computationally; in general, even if the desired structure is not [[context-free language|context-free]], some kind of context-free approximation to the grammar is used to perform a first pass. Algorithms which use context-free grammars often rely on some variant of the [[CYK algorithm]], usually with some [[heuristic (computer science)|heuristic]] to prune away unlikely analyses to save time. ''(See [[chart parsing]].)'' However, some systems trade accuracy for speed by using, e.g., linear-time versions of the [[Shift-reduce parsing|shift-reduce]] algorithm.
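To make the context-free first pass concrete, the self-contained sketch below (a bare-bones recognizer written for exposition; production systems add probabilities and pruning heuristics as described above) applies the CYK algorithm to a tiny grammar in Chomsky normal form:

<syntaxhighlight lang="python">
# Illustrative CYK recognizer for a toy grammar in Chomsky normal form.
# Binary rules map a pair of right-hand-side symbols to the left-hand-side
# symbols that produce them; lexical rules map words to their categories.
binary_rules = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
lexical_rules = {"man": {"NP"}, "dog": {"NP"}, "bites": {"V"}}

def cyk_recognize(words):
    n = len(words)
    # table[i][j] holds the nonterminals that derive words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(lexical_rules.get(word, ()))
    for span in range(2, n + 1):          # substring length
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):         # split point
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        table[i][j] |= binary_rules.get((left, right), set())
    return "S" in table[0][n - 1]

print(cyk_recognize("man bites dog".split()))  # True
</syntaxhighlight>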
A more recent development has been [[parse reranking]], in which the parser proposes a large number of candidate analyses and a more complex system selects the best one.{{citation needed|date=January 2019}} In [[natural language understanding]] applications, [[semantic parsing|semantic parser]]s convert the text into a representation of its meaning.<ref name=":0">{{cite arXiv|last1=Jia|first1=Robin|last2=Liang|first2=Percy|date=2016-06-11|title=Data Recombination for Neural Semantic Parsing|eprint=1606.03622 |class=cs.CL}}</ref>

=== Psycholinguistics ===
In [[psycholinguistics]], parsing involves not just the assignment of words to categories (the formation of ontological insights), but also the evaluation of the meaning of a sentence according to the rules of syntax, drawn by inferences made from each word in the sentence (known as [[connotation]]). This normally occurs as words are being heard or read.

Neurolinguistics generally understands parsing to be a function of working memory, meaning that parsing is used to keep several parts of one sentence in play in the mind at one time, all readily accessible to be analyzed as needed. Because human working memory is limited, so is the capacity for sentence parsing.<ref>Vos, S. H., Gunter, T. C., Schriefers, H., & Friederici, A. D. (2001). Syntactic parsing and working memory: The effects of syntactic complexity, reading span, and concurrent load. Language and Cognitive Processes, 16(1), 65–103. https://doi.org/10.1080/01690960042000085</ref> This is evidenced by several types of syntactically complex sentence that pose problems for the mental parsing of sentences.

The first, and perhaps best-known, type of sentence that challenges parsing ability is the garden-path sentence. These sentences are designed so that the most common interpretation of the sentence appears grammatically faulty, but upon further inspection they are grammatically sound. Garden-path sentences are difficult to parse because they contain a phrase or a word with more than one meaning, often with its most typical meaning being a different part of speech.<ref name="doi.org">Pritchett, B. L. (1988). Garden Path Phenomena and the Grammatical Basis of Language Processing. Language, 64(3), 539–576. https://doi.org/10.2307/414532</ref> For example, in the sentence "the horse raced past the barn fell", "raced" is initially interpreted as a past-tense verb, but in this sentence it functions as part of an adjective phrase.<ref>{{cite book|author=Thomas G. Bever|title=The cognitive basis for linguistic structures|date=1970|oclc=43300456}}</ref> Since parsing is used to identify parts of speech, these sentences challenge the parsing ability of the reader.

Another type of sentence that is difficult to parse is one containing an attachment ambiguity: a phrase that could potentially modify different parts of the sentence, and therefore presents a challenge in identifying syntactic relationships (e.g. "The boy saw the lady with the telescope", in which the ambiguous phrase "with the telescope" could modify either "saw" or "the lady").<ref name="doi.org"/>

A third type of sentence that challenges parsing ability is center embedding, in which phrases are placed in the center of other similarly formed phrases (e.g. "The rat the cat the man hit chased ran into the trap"). Sentences with two or, in the most extreme cases, three center embeddings are challenging for mental parsing, again because of the ambiguity of syntactic relationships.<ref>Karlsson, F. (2010). Working Memory Constraints on Multiple Center-Embedding. Proceedings of the Annual Meeting of the Cognitive Science Society, 32. Retrieved from https://escholarship.org/uc/item/4j00v1j2</ref>
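The attachment ambiguity above can also be demonstrated computationally. In the sketch below (an illustrative example assuming the Python [[Natural Language Toolkit|NLTK]] library; it is not drawn from the studies cited in this section), a chart parser returns both readings of the telescope sentence, one attaching the prepositional phrase to the verb phrase and one attaching it to the noun phrase:

<syntaxhighlight lang="python">
# Illustrative toy grammar in which a prepositional phrase (PP) can attach
# either to the verb phrase or to the object noun phrase.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'boy' | 'lady' | 'telescope'
V -> 'saw'
P -> 'with'
""")

sentence = "the boy saw the lady with the telescope".split()
for tree in nltk.ChartParser(grammar).parse(sentence):
    print(tree)  # prints two trees, one per attachment of the PP
</syntaxhighlight>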
Within neurolinguistics there are multiple theories that aim to describe how parsing takes place in the brain. One such model is the more traditional generative model of sentence processing, which theorizes that within the brain there is a distinct module designed for sentence parsing, preceded by access to lexical recognition and retrieval, and followed by syntactic processing that considers a single syntactic result of the parsing, returning to revise that syntactic interpretation only if a potential problem is detected.<ref>Ferreira, F., & Clifton, C. (1986). The independence of syntactic processing. Journal of Memory and Language, 25(3), 348–368. https://doi.org/10.1016/0749-596X(86)90006-9</ref> The opposing, more contemporary model theorizes that within the mind the processing of a sentence is not modular and does not happen in strict sequence. Rather, it posits that several different syntactic possibilities can be considered at the same time, because lexical access, syntactic processing, and determination of meaning occur in parallel in the brain. In this way these processes are integrated.<ref>Atlas, J. D. (1997). On the modularity of sentence processing: semantical generality and the language of thought. Language and Conceptualization, 213–214.</ref>

Although there is still much to learn about the neurology of parsing, studies have shown evidence that several areas of the brain might play a role. These include the left anterior temporal pole, the left inferior frontal gyrus, the left superior temporal gyrus, the left superior frontal gyrus, the right posterior cingulate cortex, and the left angular gyrus. Although it has not been conclusively proven, it has been suggested that these different structures might favor either phrase-structure parsing or dependency-structure parsing, meaning that different types of parsing could be processed in different ways which have yet to be understood.<ref>Lopopolo, A., van den Bosch, A., Petersson, K.-M., & Willems, R. M. (2021). Distinguishing Syntactic Operations in the Brain: Dependency and Phrase-Structure Parsing. Neurobiology of Language, 2(1), 152–175. https://doi.org/10.1162/nol_a_00029</ref>

=== Discourse analysis ===
[[Discourse analysis]] examines ways to analyze language use and semiotic events. Persuasive language may be called [[rhetoric]].