=== Morphological analysis ===
; [[Lemmatisation|Lemmatization]]: The task of removing only inflectional endings and returning the base dictionary form of a word, known as a lemma. Like stemming, lemmatization reduces words to a normalized form, but it uses a dictionary to map each word to its lemma.<ref>{{Cite web|date=2020-12-06|title=What is Natural Language Processing? Intro to NLP in Machine Learning|url=https://www.gyansetu.in/what-is-natural-language-processing/|access-date=2021-01-09|website=GyanSetu!|language=en-US}}</ref>
; [[Morphology (linguistics)|Morphological segmentation]]: Separate words into individual [[morpheme]]s and identify the class of the morphemes. The difficulty of this task depends greatly on the complexity of the [[Morphology (linguistics)|morphology]] (''i.e.'', the structure of words) of the language being considered. [[English language|English]] has fairly simple morphology, especially [[inflectional morphology]], and thus it is often possible to ignore this task entirely and simply model all possible forms of a word (e.g., "open, opens, opened, opening") as separate words. In languages such as [[Turkish language|Turkish]] or [[Meitei language|Meitei]] (a highly [[Agglutination|agglutinative]] language of India), however, such an approach is not feasible, as each dictionary entry has thousands of possible word forms.<ref>{{cite journal |last1=Kishorjit |first1=N. |last2=Vidya |first2=Raj RK. |last3=Nirmal |first3=Y. |last4=Sivaji |first4=B. |year=2012 |url=http://aclweb.org/anthology//W/W12/W12-5008.pdf |title=Manipuri Morpheme Identification |journal=Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing (SANLP) |pages=95–108 |location=COLING 2012, Mumbai, December 2012 }}</ref>
; [[Part-of-speech tagging]]: Given a sentence, determine the [[part of speech]] (POS) for each word. Many words, especially common ones, can serve as multiple parts of speech. For example, "book" can be a [[noun]] ("the book on the table") or [[verb]] ("to book a flight"); "set" can be a noun, verb or [[adjective]]; and "out" can be any of at least five different parts of speech.

; [[Stemming]]
:The process of reducing inflected (or sometimes derived) words to a base form (e.g., "close" is the root of "closed", "closing", "close", "closer", etc.). Stemming yields results similar to those of lemmatization, but it relies on rules rather than a dictionary.
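
The contrast between rule-based stemming and dictionary-based lemmatization can be illustrated with a minimal Python sketch. The suffix list, the tiny lexicon, and the function names <code>stem</code> and <code>lemmatize</code> are assumptions invented for this illustration; they do not correspond to any published stemming algorithm or library API.

```python
# Toy illustration only: the suffix rules and the lexicon below are
# hypothetical and far smaller than anything used in practice.

# Suffixes are tried in order; the first match is stripped.
SUFFIXES = ("ing", "ed", "er", "es", "s", "e")

def stem(word):
    """Rule-based: strip the first matching suffix, keeping >= 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Dictionary mapping inflected forms to their lemma (hypothetical entries).
LEMMAS = {
    "closed": "close",
    "closing": "close",
    "closes": "close",
    "went": "go",
}

def lemmatize(word):
    """Dictionary-based: look the word up, falling back to the word itself."""
    return LEMMAS.get(word, word)

for w in ("closed", "closing", "close", "closer", "went"):
    print(w, stem(w), lemmatize(w))
```

In this sketch the rules map every form of "close" to the same stem, although that stem need not be a dictionary word, whereas the lookup table returns a dictionary form and can also handle an irregular form such as "went", which no suffix rule reaches.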