=== Morphological analysis ===
; [[Lemmatisation|Lemmatization]]: The task of removing only inflectional endings and returning the base dictionary form of a word, also known as a lemma. Like stemming, lemmatization reduces words to a normalized form, but it does so by using a dictionary to map each word to its actual base form.<ref>{{Cite web|date=2020-12-06|title=What is Natural Language Processing? Intro to NLP in Machine Learning|url=https://www.gyansetu.in/what-is-natural-language-processing/|access-date=2021-01-09|website=GyanSetu!|language=en-US}}</ref>
; [[Morphology (linguistics)|Morphological segmentation]]: Separate words into individual [[morpheme]]s and identify the class of each morpheme. The difficulty of this task depends greatly on the complexity of the [[Morphology (linguistics)|morphology]] (''i.e.'', the structure of words) of the language being considered. [[English language|English]] has fairly simple morphology, especially [[inflectional morphology]], so it is often possible to ignore this task entirely and simply model all possible forms of a word (e.g., "open, opens, opened, opening") as separate words. In highly [[Agglutination|agglutinated]] languages such as [[Turkish language|Turkish]] or [[Meitei language|Meitei]] (a language of India), however, such an approach is not possible, as each dictionary entry has thousands of possible word forms.<ref>{{cite journal |last1=Kishorjit |first1=N. |last2=Vidya |first2=Raj RK. |last3=Nirmal |first3=Y. |last4=Sivaji |first4=B. |year=2012 |url=http://aclweb.org/anthology//W/W12/W12-5008.pdf |title=Manipuri Morpheme Identification |journal=Proceedings of the 3rd Workshop on South and Southeast Asian Natural Language Processing (SANLP) |pages=95–108 |location=COLING 2012, Mumbai, December 2012 }}</ref>
; [[Part-of-speech tagging]]: Given a sentence, determine the [[part of speech]] (POS) of each word. Many words, especially common ones, can serve as multiple parts of speech. For example, "book" can be a [[noun]] ("the book on the table") or a [[verb]] ("to book a flight"); "set" can be a noun, verb or [[adjective]]; and "out" can be any of at least five different parts of speech.
; [[Stemming]]: The process of reducing inflected (or sometimes derived) words to a base form (e.g., "close" is the root for "closed", "closing", "close", "closer", etc.). Stemming yields results similar to those of lemmatization, but relies on rules rather than a dictionary.
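The contrast between rule-based stemming and dictionary-based lemmatization can be illustrated with a toy sketch. The suffix list and the lemma dictionary below are small hypothetical examples chosen for illustration; real stemmers (such as the Porter stemmer) use far more elaborate rule sets, and real lemmatizers rely on large lexicons and usually on part-of-speech information.

```python
# Toy rule-based stemmer: strips the first matching suffix from a small,
# hypothetical suffix list (NOT the Porter algorithm).
SUFFIXES = ("ing", "ed", "er", "s")


def stem(word: str) -> str:
    """Strip one known suffix, keeping at least 3 characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Toy lemmatizer: looks each word up in a tiny, hypothetical dictionary
# mapping inflected forms to their lemma; unknown words are returned unchanged.
LEMMA_DICT = {
    "closed": "close",
    "closing": "close",
    "closer": "close",
    "better": "good",
    "ran": "run",
    "running": "run",
}


def lemmatize(word: str) -> str:
    """Return the dictionary lemma of word, or word itself if it is unknown."""
    return LEMMA_DICT.get(word, word)


if __name__ == "__main__":
    for w in ["closed", "closing", "closer", "better", "running"]:
        print(w, "stem:", stem(w), "lemma:", lemmatize(w))
```

With these toy rules, "closed", "closing" and "closer" all reduce to the stem "clos", which is not itself a dictionary word, whereas the lemmatizer returns "close". The lemmatizer also handles irregular forms such as "better" → "good", which no simple suffix rule can recover, but only for words its dictionary contains.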
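Morphological segmentation can be sketched, under strong simplifying assumptions, as splitting a word into pieces drawn from a known morpheme lexicon and labelling each piece with its class. The lexicon below is a hypothetical toy for a few English words; a system for an agglutinative language such as Turkish or Meitei would need a much larger lexicon plus rules about which morphemes may follow which.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical toy lexicon: surface form -> morpheme class.
LEXICON = {
    "un": "prefix",
    "happy": "root",
    "happi": "root",  # spelling variant of "happy" before a suffix
    "open": "root",
    "ness": "suffix",
    "ed": "suffix",
    "ing": "suffix",
    "s": "suffix",
}


def segment(word: str) -> list[tuple[str, str]] | None:
    """Split word into (morpheme, class) pairs using LEXICON.

    Tries longer pieces first and backtracks if the rest of the word cannot
    be segmented; returns None when no complete segmentation exists.
    """

    @lru_cache(maxsize=None)
    def helper(start: int):
        if start == len(word):
            return ()
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in LEXICON:
                rest = helper(end)
                if rest is not None:
                    return ((piece, LEXICON[piece]),) + rest
        return None

    result = helper(0)
    return list(result) if result is not None else None


if __name__ == "__main__":
    for w in ["unhappiness", "opened", "opens", "xyz"]:
        print(w, segment(w))
```

Memoizing on the start position keeps the search from re-examining the same suffix of the word repeatedly, which matters more as words (and lexicons) grow longer.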