==Approaches: Symbolic, statistical, neural networks{{anchor|Statistical natural language processing (SNLP)}}==

The symbolic approach, i.e., the hand-coding of a set of rules for manipulating symbols, coupled with a dictionary lookup, was historically the first approach used both by AI in general and by NLP in particular,<ref name=winograd:shrdlu71>{{cite thesis |last=Winograd |first=Terry |year=1971 |title=Procedures as a Representation for Data in a Computer Program for Understanding Natural Language |url=http://hci.stanford.edu/winograd/shrdlu/ }}</ref><ref name=schank77>{{cite book |first1=Roger C. |last1=Schank |first2=Robert P. |last2=Abelson |year=1977 |title=Scripts, Plans, Goals, and Understanding: An Inquiry Into Human Knowledge Structures |location=Hillsdale |publisher=Erlbaum |isbn=0-470-99033-3 }}</ref> such as by writing grammars or devising heuristic rules for [[stemming]].

[[Machine learning]] approaches, which include both statistical and neural network methods, on the other hand, have many advantages over the symbolic approach:

* both statistical and neural network methods can focus on the most common cases extracted from a corpus of texts, whereas the rule-based approach needs to provide rules for rare and common cases alike.
* [[language model]]s, produced by either statistical or neural network methods, are more robust to both unfamiliar input (e.g. containing words or structures that have not been seen before) and erroneous input (e.g. with misspelled words or words accidentally omitted) than rule-based systems, which are also more costly to produce.
* the larger such a (probabilistic) language model is, the more accurate it becomes, in contrast to rule-based systems, which can gain accuracy only by increasing the number and complexity of their rules, leading to [[intractable problem|intractability]] problems.

Rule-based systems are commonly used:

* when the amount of training data is insufficient to successfully apply machine learning methods, e.g., for the machine translation of low-resource languages, as provided by the [[Apertium]] system,
* for preprocessing in NLP pipelines, e.g., [[Tokenization (lexical analysis)|tokenization]], or
* for postprocessing and transforming the output of NLP pipelines, e.g., for [[knowledge extraction]] from syntactic parses.

=== Statistical approach ===

In the late 1980s and mid-1990s, the statistical approach ended a period of [[AI winter]], which was caused by the inefficiencies of the rule-based approaches.<ref name="johnson:eacl:ilcl09">[http://www.aclweb.org/anthology/W09-0103 Mark Johnson. How the statistical revolution changes (computational) linguistics.] Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics.</ref><ref name="resnik:langlog11">[http://languagelog.ldc.upenn.edu/nll/?p=2946 Philip Resnik. Four revolutions.] Language Log, February 5, 2011.</ref>

The earliest [[decision tree]]s, producing systems of hard [[Conditional (computer programming)#If–then(–else)|if–then rules]], were still very similar to the old rule-based approaches. Only the introduction of hidden [[Markov model]]s, applied to part-of-speech tagging, announced the end of the old rule-based approach.

=== Neural networks ===
{{Further|Artificial neural network}}

A major drawback of statistical methods is that they require elaborate [[feature engineering]].
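For illustration, the sketch below shows the kind of hand-crafted features such engineering produces for a statistical part-of-speech tagger; the feature names and the toy sentence are invented for this example rather than taken from any particular system.

<syntaxhighlight lang="python">
# Illustrative sketch only: the kind of hand-crafted features a statistical
# (e.g. maximum-entropy) part-of-speech tagger might rely on. All feature
# names and the toy sentence are invented for this example.

def token_features(tokens, i):
    """Map the token at position i to a dictionary of hand-crafted features."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "suffix3": word[-3:],                  # crude morphological cue
        "is_capitalized": word[0].isupper(),
        "is_digit": word.isdigit(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "BOS",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "EOS",
    }

sentence = ["The", "cat", "sat", "on", "the", "mat"]
print(token_features(sentence, 1))
# {'word.lower': 'cat', 'suffix3': 'cat', 'is_capitalized': False, ...}
</syntaxhighlight>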
Since 2015,<ref>{{Cite web |last=Socher |first=Richard |title=Deep Learning For NLP-ACL 2012 Tutorial |url=https://www.socher.org/index.php/Main/DeepLearningForNLP-ACL2012Tutorial |access-date=2020-08-17 |website=www.socher.org}} This was an early deep learning tutorial at ACL 2012 and was met with both interest and (at the time) skepticism from most participants. Until then, neural learning was basically rejected because of its lack of statistical interpretability. By 2015, deep learning had evolved into the major framework of NLP. [Link is broken, try http://web.stanford.edu/class/cs224n/]</ref> the statistical approach has been replaced by the [[Artificial neural network|neural networks]] approach, using [[semantic networks]]<ref>{{cite book |last1=Segev |first1=Elad |title=Semantic Network Analysis in Social Sciences |date=2022 |publisher=Routledge |location=London |isbn=9780367636524 |url=https://www.routledge.com/Semantic-Network-Analysis-in-Social-Sciences/Segev/p/book/9780367636524 |access-date=5 December 2021 |archive-date=5 December 2021 |archive-url=https://web.archive.org/web/20211205140726/https://www.routledge.com/Semantic-Network-Analysis-in-Social-Sciences/Segev/p/book/9780367636524 |url-status=live }}</ref> and [[word embedding]]s to capture semantic properties of words. Intermediate tasks (e.g., part-of-speech tagging and dependency parsing) are no longer needed. [[Neural machine translation]], based on then-newly invented [[Seq2seq|sequence-to-sequence]] transformations, made obsolete the intermediate steps, such as word alignment, that were previously necessary for [[statistical machine translation]].
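The following minimal sketch illustrates the idea behind word embeddings: words with related meanings receive nearby vectors, so semantic similarity can be measured geometrically, here with cosine similarity. The three-dimensional vectors are toy values invented for the example; real embeddings have hundreds of dimensions learned from large corpora.

<syntaxhighlight lang="python">
# Toy illustration of word embeddings: semantically related words get
# nearby vectors. The 3-dimensional values below are invented for the
# example; real embeddings are learned from large corpora.
import math

embeddings = {
    "king":  [0.80, 0.65, 0.10],
    "queen": [0.78, 0.70, 0.12],
    "apple": [0.05, 0.10, 0.90],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1.0
print(cosine(embeddings["king"], embeddings["apple"]))  # much smaller
</syntaxhighlight>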