== Common NLP tasks ==

The following is a list of some of the most commonly researched tasks in natural language processing. Some of these tasks have direct real-world applications, while others more commonly serve as subtasks that are used to aid in solving larger tasks.

Though natural language processing tasks are closely intertwined, they can be subdivided into categories for convenience. A coarse division is given below.

=== Text and speech processing ===
; [[Optical character recognition]] (OCR)
: Given an image representing printed text, determine the corresponding text.
; [[Speech recognition]]
: Given a sound clip of a person or people speaking, determine the textual representation of the speech. This is the opposite of [[text to speech]] and is one of the extremely difficult problems colloquially termed "[[AI-complete]]" (see above). In [[natural speech]] there are hardly any pauses between successive words, and thus [[speech segmentation]] is a necessary subtask of speech recognition (see below). In most spoken languages, the sounds representing successive letters blend into each other in a process termed [[coarticulation]], so the conversion of the [[analog signal]] to discrete characters can be a very difficult process. Also, given that words in the same language are spoken by people with different accents, speech recognition software must be able to recognize a wide variety of acoustic inputs as equivalent in terms of their textual representation.
; [[Speech segmentation]]
: Given a sound clip of a person or people speaking, separate it into words. A subtask of [[speech recognition]] and typically grouped with it.
; [[Text-to-speech]]
: Given a text, produce a spoken representation of it. Text-to-speech can be used to aid the visually impaired.<ref>{{Citation|last1=Yi|first1=Chucai|title=Assistive Text Reading from Complex Background for Blind Persons|date=2012|work=Camera-Based Document Analysis and Recognition|pages=15–28|publisher=Springer Berlin Heidelberg|language=en|citeseerx=10.1.1.668.869|doi=10.1007/978-3-642-29364-1_2|isbn=9783642293634|last2=Tian|first2=Yingli|author2-link=Yingli Tian|series=Lecture Notes in Computer Science|volume=7139}}</ref>
; [[Word segmentation]] ([[Tokenization (lexical analysis)|Tokenization]])
: Tokenization is a process used in text analysis that divides text into individual words or word fragments. It produces two key components: a word index and tokenized text. The word index is a list that maps unique words to specific numerical identifiers, and the tokenized text replaces each word with its corresponding numerical token. These numerical tokens are then used in various deep learning methods (see the sketch after this list).<ref name=":0" />
: For a language like [[English language|English]], this is fairly trivial, since words are usually separated by spaces. However, some written languages like [[Chinese language|Chinese]], [[Japanese language|Japanese]] and [[Thai language|Thai]] do not mark word boundaries in such a fashion, and in those languages text segmentation is a significant task requiring knowledge of the [[vocabulary]] and [[Morphology (linguistics)|morphology]] of words in the language. Sometimes this process is also used in cases like [[bag of words]] (BOW) creation in data mining.{{Citation needed|date=May 2024}}
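For illustration, a minimal sketch in Python of the word-index construction described above; whitespace splitting stands in for a real tokenizer, and the function name is invented for the example:

<syntaxhighlight lang="python">
from collections import defaultdict

def build_index_and_tokenize(texts):
    """Map each unique word to an integer id and encode each text as ids.

    Illustrative only: real tokenizers also handle punctuation, casing,
    subwords, and languages whose words are not space-delimited.
    """
    index = defaultdict(lambda: len(index))  # unseen words get the next free id
    tokenized = [[index[word] for word in text.split()] for text in texts]
    return dict(index), tokenized

index, tokens = build_index_and_tokenize(["the cat sat", "the dog sat"])
print(index)   # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3}
print(tokens)  # [[0, 1, 2], [0, 3, 2]]
</syntaxhighlight>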
=== Morphological analysis ===
; [[Lemmatisation|Lemmatization]]
: The task of removing inflectional endings only and returning the base dictionary form of a word, which is also known as a lemma. Lemmatization is another technique for reducing words to their normalized form; unlike stemming, the transformation uses a dictionary to map words to their canonical form.<ref>{{Cite web|date=2020-12-06|title=What is Natural Language Processing? Intro to NLP in Machine Learning|url=https://www.gyansetu.in/what-is-natural-language-processing/|access-date=2021-01-09|website=GyanSetu!|language=en-US}}</ref>
; [[Morphology (linguistics)|Morphological segmentation]]
: Separate words into individual [[morpheme]]s and identify the class of the morphemes. The difficulty of this task depends greatly on the complexity of the [[Morphology (linguistics)|morphology]] (''i.e.'', the structure of words) of the language being considered. [[English language|English]] has fairly simple morphology, especially [[inflectional morphology]], and thus it is often possible to ignore this task entirely and simply model all possible forms of a word (e.g., "open, opens, opened, opening") as separate words. In languages such as [[Turkish language|Turkish]] or [[Meitei language|Meitei]], a highly [[Agglutination|agglutinated]] Indian language, however, such an approach is not possible, as each dictionary entry has thousands of possible word forms.<ref>{{cite journal |last1=Kishorjit |first1=N. |last2=Vidya |first2=Raj RK. |last3=Nirmal |first3=Y. |last4=Sivaji |first4=B. |year=2012 |url=http://aclweb.org/anthology//W/W12/W12-5008.pdf |title=Manipuri Morpheme Identification |journal=Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing (SANLP) |pages=95–108 |location=COLING 2012, Mumbai, December 2012}}</ref>
; [[Part-of-speech tagging]]
: Given a sentence, determine the [[part of speech]] (POS) for each word. Many words, especially common ones, can serve as multiple parts of speech. For example, "book" can be a [[noun]] ("the book on the table") or a [[verb]] ("to book a flight"); "set" can be a noun, verb or [[adjective]]; and "out" can be any of at least five different parts of speech.
; [[Stemming]]
: The process of reducing inflected (or sometimes derived) words to a base form (e.g., "close" will be the root for "closed", "closing", "close", "closer", etc.). Stemming yields results similar to lemmatization, but does so on the basis of rules rather than a dictionary (see the sketch after this list).
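For illustration, a minimal sketch contrasting rule-based stemming with dictionary-based lemmatization, assuming the NLTK library and its WordNet data are available:

<syntaxhighlight lang="python">
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # the lemmatizer needs the WordNet dictionary

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["closing", "closed", "closes"]:
    # Stemming applies suffix-stripping rules with no dictionary lookup;
    # lemmatization maps to a WordNet headword (here treated as a verb).
    print(word, "->", stemmer.stem(word), "/", lemmatizer.lemmatize(word, pos="v"))
</syntaxhighlight>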
=== Syntactic analysis ===
{{Formal languages}}
; [[Grammar induction]]<ref>{{cite journal|last1=Klein|first1=Dan|last2=Manning|first2=Christopher D.|year=2002|title=Natural language grammar induction using a constituent-context model|url=http://papers.nips.cc/paper/1945-natural-language-grammar-induction-using-a-constituent-context-model.pdf|journal=Advances in Neural Information Processing Systems}}</ref>
: Generate a [[formal grammar]] that describes a language's syntax.
; [[Sentence breaking]] (also known as "[[sentence boundary disambiguation]]")
: Given a chunk of text, find the sentence boundaries. Sentence boundaries are often marked by [[Full stop|periods]] or other [[punctuation mark]]s, but these same characters can serve other purposes (e.g., marking [[abbreviation]]s).
; [[Parsing]]
: Determine the [[parse tree]] (grammatical analysis) of a given sentence. The [[grammar]] for [[natural language]]s is [[ambiguous]] and typical sentences have multiple possible analyses: perhaps surprisingly, for a typical sentence there may be thousands of potential parses (most of which will seem completely nonsensical to a human). There are two primary types of parsing: ''dependency parsing'' and ''constituency parsing''. Dependency parsing focuses on the relationships between words in a sentence (marking things like primary objects and predicates), whereas constituency parsing focuses on building out the parse tree using a [[probabilistic context-free grammar]] (PCFG) (see also ''[[stochastic grammar]]'').

=== Lexical semantics (of individual words in context) ===
; [[Lexical semantics]]
: What is the computational meaning of individual words in context?
; [[Distributional semantics]]
: How can we learn semantic representations from data?
; [[Named entity recognition]] (NER)
: Given a stream of text, determine which items in the text map to proper names, such as people or places, and what the type of each such name is (e.g. person, location, organization). Although [[capitalization]] can aid in recognizing named entities in languages such as English, this information cannot aid in determining the type of [[named entity]], and in any case is often inaccurate or insufficient. For example, the first letter of a sentence is also capitalized, and named entities often span several words, only some of which are capitalized. Furthermore, many other languages in non-Western scripts (e.g. [[Chinese language|Chinese]] or [[Arabic language|Arabic]]) do not have any capitalization at all, and even languages with capitalization may not consistently use it to distinguish names. For example, [[German language|German]] capitalizes all [[noun]]s, regardless of whether they are names, and [[French language|French]] and [[Spanish language|Spanish]] do not capitalize names that serve as [[adjective]]s. Another name for this task is token classification.<ref>{{Cite journal |last1=Kariampuzha |first1=William |last2=Alyea |first2=Gioconda |last3=Qu |first3=Sue |last4=Sanjak |first4=Jaleal |last5=Mathé |first5=Ewy |last6=Sid |first6=Eric |last7=Chatelaine |first7=Haley |last8=Yadaw |first8=Arjun |last9=Xu |first9=Yanji |last10=Zhu |first10=Qian |date=2023 |title=Precision information extraction for rare disease epidemiology at scale |journal=Journal of Translational Medicine |language=en |volume=21 |issue=1 |page=157 |doi=10.1186/s12967-023-04011-y |pmid=36855134 |pmc=9972634 |doi-access=free}}</ref>
; [[Sentiment analysis]] (see also [[Multimodal sentiment analysis]])
: Sentiment analysis is a computational method used to identify and classify the emotional intent behind text, typically determining whether the expressed sentiment is positive, negative, or neutral. Models for sentiment classification typically use inputs such as [[Word n-gram language model|word n-grams]], [[Term frequency-inverse document frequency|Term Frequency-Inverse Document Frequency]] (TF-IDF) features, or hand-crafted features, or employ [[deep learning]] models designed to recognize both long-term and short-term dependencies in text sequences. The applications of sentiment analysis are diverse, extending to tasks such as categorizing customer reviews on various online platforms (see the sketch after this list).<ref name=":0">{{Cite web |date=2023-01-11 |title=Natural Language Processing (NLP) - A Complete Guide |url=https://www.deeplearning.ai/resources/natural-language-processing/ |access-date=2024-05-05 |website=www.deeplearning.ai |language=en}}</ref>
; [[Terminology extraction]]
: The goal of terminology extraction is to automatically extract relevant terms from a given corpus.
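For illustration, a minimal sketch of the TF-IDF-based sentiment classification mentioned above, using the scikit-learn library; the tiny inline dataset is invented for the example:

<syntaxhighlight lang="python">
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented purely for illustration.
reviews = ["great product, works well", "terrible, broke after a day",
           "absolutely love it", "waste of money, very disappointed"]
labels = ["positive", "negative", "positive", "negative"]

# TF-IDF turns each review into a weighted bag-of-words vector;
# logistic regression then learns a linear sentiment boundary over those vectors.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(reviews, labels)

print(model.predict(["love this, great value"]))  # expected: ['positive']
</syntaxhighlight>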
; [[Word-sense disambiguation]] (WSD)
: Many words have more than one [[Meaning (linguistics)|meaning]]; we have to select the meaning which makes the most sense in context. For this problem, we are typically given a list of words and associated word senses, e.g. from a dictionary or an online resource such as [[WordNet]] (a sketch using the Lesk algorithm appears further below).
; [[Entity linking]]
: Many words—typically proper names—refer to [[Named entity|named entities]]; here we have to select the entity (a famous individual, a location, a company, etc.) which is referred to in context.

=== Relational semantics (semantics of individual sentences) ===
; [[Relationship extraction]]
: Given a chunk of text, identify the relationships among named entities (e.g. who is married to whom).
; [[Semantic parsing]]
: Given a piece of text (typically a sentence), produce a formal representation of its semantics, either as a graph (e.g., in [[Abstract Meaning Representation|AMR parsing]]) or in accordance with a logical formalism (e.g., in [[Discourse representation theory|DRT parsing]]). This challenge typically includes aspects of several more elementary NLP tasks from semantics (e.g., semantic role labelling, word-sense disambiguation) and can be extended to include full-fledged discourse analysis (e.g., discourse parsing, coreference resolution; see [[#Natural language understanding|Natural language understanding]] below).
; [[Semantic role labeling|Semantic role labelling]] (see also implicit semantic role labelling below)
: Given a single sentence, identify and disambiguate semantic predicates (e.g., verbal [[Frame semantics (linguistics)|frames]]), then identify and classify the frame elements ([[semantic roles]]).

=== Discourse (semantics beyond individual sentences) ===
; [[Coreference|Coreference resolution]]
: Given a sentence or larger chunk of text, determine which words ("mentions") refer to the same objects ("entities"). [[Anaphora resolution]] is a specific example of this task, and is specifically concerned with matching up [[pronoun]]s with the nouns or names to which they refer. The more general task of coreference resolution also includes identifying so-called "bridging relationships" involving [[referring expression]]s. For example, in a sentence such as "He entered John's house through the front door", "the front door" is a referring expression and the bridging relationship to be identified is the fact that the door being referred to is the front door of John's house (rather than of some other structure that might also be referred to).
; [[Discourse analysis]]
: This rubric includes several related tasks. One task is discourse parsing, i.e., identifying the [[discourse]] structure of a connected text, i.e. the nature of the discourse relationships between sentences (e.g. elaboration, explanation, contrast). Another possible task is recognizing and classifying the [[speech act]]s in a chunk of text (e.g. yes–no question, content question, statement, assertion, etc.).
; {{visible anchor|Implicit semantic role labelling}}
: Given a single sentence, identify and disambiguate semantic predicates (e.g., verbal [[Frame semantics (linguistics)|frames]]) and their explicit semantic roles in the current sentence (see [[#Semantic role labelling|Semantic role labelling]] above). Then, identify semantic roles that are not explicitly realized in the current sentence, classify them into arguments that are explicitly realized elsewhere in the text and those that are not specified, and resolve the former against the local text. A closely related task is zero anaphora resolution, i.e., the extension of coreference resolution to [[pro-drop language]]s.
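Returning to word-sense disambiguation above: for illustration, a minimal sketch using the simplified [[Lesk algorithm]] shipped with NLTK, which picks the WordNet sense whose dictionary gloss best overlaps the context; the example sentence is invented:

<syntaxhighlight lang="python">
import nltk
from nltk.wsd import lesk

nltk.download("wordnet", quiet=True)  # Lesk compares WordNet gloss overlaps

context = "I went to the bank to deposit my paycheck".split()
sense = lesk(context, "bank", pos="n")  # returns the best-matching WordNet synset

print(sense)               # a Synset object for one noun sense of "bank"
print(sense.definition())  # its dictionary gloss
</syntaxhighlight>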
; [[Textual entailment|Recognizing textual entailment]]
: Given two text fragments, determine if one being true entails the other, entails the other's negation, or allows the other to be either true or false.<ref name="rte:11">PASCAL Recognizing Textual Entailment Challenge (RTE-7) https://tac.nist.gov//2011/RTE/</ref>
; [[Topic segmentation]] and recognition
: Given a chunk of text, separate it into segments, each of which is devoted to a topic, and identify the topic of each segment.
; [[Argument mining]]
: The goal of argument mining is the automatic extraction and identification of argumentative structures from [[natural language]] text with the aid of computer programs.<ref>{{Cite journal|last1=Lippi|first1=Marco|last2=Torroni|first2=Paolo|date=2016-04-20|title=Argumentation Mining: State of the Art and Emerging Trends|url=https://dl.acm.org/doi/10.1145/2850417|journal=ACM Transactions on Internet Technology|language=en|volume=16|issue=2|pages=1–25|doi=10.1145/2850417|hdl=11585/523460|s2cid=9561587|issn=1533-5399|hdl-access=free}}</ref> Such argumentative structures include the premise, conclusions, the [[argument scheme]] and the relationship between the main and subsidiary argument, or the main and counter-argument within discourse.<ref>{{Cite web|title=Argument Mining – IJCAI2016 Tutorial|url=https://www.i3s.unice.fr/~villata/tutorialIJCAI2016.html|access-date=2021-03-09|website=www.i3s.unice.fr}}</ref><ref>{{Cite web|title=NLP Approaches to Computational Argumentation – ACL 2016, Berlin|url=http://acl2016tutorial.arg.tech/|access-date=2021-03-09|language=en-GB}}</ref>

=== Higher-level NLP applications ===
; [[Automatic summarization]] (text summarization)
: Produce a readable summary of a chunk of text. Often used to provide summaries of texts of a known type, such as research papers or articles in the financial section of a newspaper.
; {{visible anchor|Grammatical error correction}}
: Grammatical error detection and correction spans a wide range of problems on all levels of linguistic analysis (phonology/orthography, morphology, syntax, semantics, pragmatics). Grammatical error correction is impactful since it affects hundreds of millions of people who use or acquire English as a second language. It has thus been the subject of a number of shared tasks since 2011.<ref>{{Cite web|last=Administration|title=Centre for Language Technology (CLT)|url=https://www.mq.edu.au/research/research-centres-groups-and-facilities/innovative-technologies/centres/centre-for-language-technology-clt|access-date=2021-01-11|website=Macquarie University|language=en-au}}</ref><ref>{{Cite web|title=Shared Task: Grammatical Error Correction|url=https://www.comp.nus.edu.sg/~nlp/conll13st.html|access-date=2021-01-11|website=www.comp.nus.edu.sg}}</ref><ref>{{Cite web|title=Shared Task: Grammatical Error Correction|url=https://www.comp.nus.edu.sg/~nlp/conll14st.html|access-date=2021-01-11|website=www.comp.nus.edu.sg}}</ref> As far as orthography, morphology, syntax and certain aspects of semantics are concerned, and due to the development of powerful neural language models such as [[GPT-2]], this can now (2019) be considered a largely solved problem and is being marketed in various commercial applications.
; [[Logic translation]]
: Translate a text from a natural language into formal logic.
; [[Machine translation]] (MT)
: Automatically translate text from one human language to another. This is one of the most difficult problems, and is a member of a class of problems colloquially termed "[[AI-complete]]", i.e. requiring all of the different types of knowledge that humans possess (grammar, semantics, facts about the real world, etc.) to solve properly (a sketch of a neural MT setup appears after this list).
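For illustration, a minimal sketch of neural machine translation using the Hugging Face Transformers library with a pretrained English-to-German model (model weights are downloaded on first use; the example sentence is invented):

<syntaxhighlight lang="python">
from transformers import pipeline

# Load a pretrained English-to-German translation model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

result = translator("Machine translation is one of the hardest problems in NLP.")
print(result[0]["translation_text"])  # the German rendering of the sentence
</syntaxhighlight>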
; [[Natural-language understanding]] (NLU)
: Convert chunks of text into more formal representations such as [[first-order logic]] structures that are easier for [[computer]] programs to manipulate (a brief sketch appears at the end of this list). Natural language understanding involves identifying the intended meaning among the multiple possible meanings that can be derived from a natural language expression, usually in the form of organized notations of natural language concepts. The introduction and creation of a language metamodel and ontology are efficient, though empirical, solutions. An explicit formalization of natural language semantics, without confusion with implicit assumptions such as the [[closed-world assumption]] (CWA) vs. the [[open-world assumption]], or subjective Yes/No vs. objective True/False, is expected to form the basis of a formalization of semantics.<ref>{{cite journal|last1=Duan|first1=Yucong|last2=Cruz|first2=Christophe|year=2011|title=Formalizing Semantic of Natural Language through Conceptualization from Existence|url=http://www.ijimt.org/abstract/100-E00187.htm|journal=International Journal of Innovation, Management and Technology|volume=2|issue=1|pages=37–42|archive-url=https://web.archive.org/web/20111009135952/http://www.ijimt.org/abstract/100-E00187.htm|archive-date=2011-10-09}}</ref>
; [[Natural language generation|Natural-language generation]] (NLG)
: Convert information from computer databases or semantic intents into readable human language.
; Book generation
: The creation of full-fledged books is not an NLP task proper, but an extension of natural language generation and other NLP tasks. The first machine-generated book was created by a rule-based system in 1984 (Racter, ''The policeman's beard is half-constructed'').<ref>{{Cite web|title=U B U W E B :: Racter|url=http://www.ubu.com/historical/racter/index.html|access-date=2020-08-17|website=www.ubu.com}}</ref> The first published work by a neural network, ''[[1 the Road]]'' (2018), was marketed as a novel and contains sixty million words. Both of these systems are basically elaborate but nonsensical (semantics-free) [[language model]]s. The first machine-generated science book was published in 2019 (Beta Writer, ''Lithium-Ion Batteries'', Springer, Cham).<ref>{{Cite book|last=Writer|first=Beta|date=2019|title=Lithium-Ion Batteries|language=en-gb|doi=10.1007/978-3-030-16800-1|isbn=978-3-030-16799-8|s2cid=155818532}}</ref> Unlike ''Racter'' and ''1 the Road'', this is grounded on factual knowledge and based on text summarization.
; [[Document AI]]
: A Document AI platform sits on top of NLP technology, enabling users with no prior experience of artificial intelligence, machine learning or NLP to quickly train a computer to extract the specific data they need from different document types. NLP-powered Document AI enables non-technical teams, for example lawyers, business analysts and accountants, to quickly access information hidden in documents.<ref>{{Cite web|title=Document Understanding AI on Google Cloud (Cloud Next '19) – YouTube|url=https://www.youtube.com/watch?v=7dtl650D0y0|archive-url=https://ghostarchive.org/varchive/youtube/20211030/7dtl650D0y0|archive-date=2021-10-30|access-date=2021-01-11|website=www.youtube.com|date=11 April 2019}}{{cbignore}}</ref>
; [[Dialogue system|Dialogue management]]
: Computer systems intended to converse with a human.
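For illustration of the first-order-logic representations mentioned under natural-language understanding, a minimal sketch using NLTK's logic package; the logical form is hand-written and the predicate names are invented for the example:

<syntaxhighlight lang="python">
from nltk.sem.logic import Expression

read_expr = Expression.fromstring

# A hand-written first-order-logic rendering of "Every dog chased a cat".
formula = read_expr(r"all x.(dog(x) -> exists y.(cat(y) & chased(x, y)))")

print(formula)         # the parsed logical form
print(formula.free())  # free variables (empty set: the formula is fully quantified)
</syntaxhighlight>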
; [[Question answering]]
: Given a human-language question, determine its answer. Typical questions have a specific right answer (such as "What is the capital of Canada?"), but sometimes open-ended questions are also considered (such as "What is the meaning of life?"). A sketch of an extractive question-answering setup appears after this list.
; [[Text-to-image generation]]
: Given a description of an image, generate an image that matches the description.<ref>{{Cite web |last=Robertson |first=Adi |date=2022-04-06 |title=OpenAI's DALL-E AI image generator can now edit pictures, too |url=https://www.theverge.com/2022/4/6/23012123/openai-clip-dalle-2-ai-text-to-image-generator-testing |access-date=2022-06-07 |website=The Verge |language=en}}</ref>
; Text-to-scene generation
: Given a description of a scene, generate a [[3D model]] of the scene.<ref>{{Cite web |title=The Stanford Natural Language Processing Group |url=https://nlp.stanford.edu/projects/text2scene.shtml |access-date=2022-06-07 |website=nlp.stanford.edu}}</ref><ref>{{Cite book |last1=Coyne |first1=Bob |last2=Sproat |first2=Richard |title=Proceedings of the 28th annual conference on Computer graphics and interactive techniques |chapter=WordsEye |date=2001-08-01 |chapter-url=https://doi.org/10.1145/383259.383316 |series=SIGGRAPH '01 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=487–496 |doi=10.1145/383259.383316 |isbn=978-1-58113-374-5 |s2cid=3842372}}</ref>
; [[Text-to-video model|Text-to-video]]
: Given a description of a video, generate a video that matches the description.<ref>{{Cite web |date=2022-11-02 |title=Google announces AI advances in text-to-video, language translation, more |url=https://venturebeat.com/ai/google-announces-ai-advances-in-text-to-video-language-translation-more/ |access-date=2022-11-09 |website=VentureBeat |language=en-US}}</ref><ref>{{Cite web |last=Vincent |first=James |date=2022-09-29 |title=Meta's new text-to-video AI generator is like DALL-E for video |url=https://www.theverge.com/2022/9/29/23378210/meta-text-to-video-ai-generation-make-a-video-model-dall-e |access-date=2022-11-09 |website=The Verge |language=en-US}}</ref>
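For illustration, a minimal sketch of extractive question answering using the Hugging Face Transformers library with a pretrained reading-comprehension model (the context passage is invented for the example):

<syntaxhighlight lang="python">
from transformers import pipeline

# A model fine-tuned on SQuAD extracts the answer span from a context passage.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(question="What is the capital of Canada?",
            context="Ottawa is the capital city of Canada.")
print(result["answer"])  # expected: "Ottawa"
</syntaxhighlight>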