Text mining
== Text analysis processes ==
Subtasks (components of a larger text-analytics effort) typically include:
* [[Dimensionality reduction]] is a pre-processing technique that reduces the size of the text data, for example by mapping inflected words to their root form.{{citation needed|date=October 2022}}
* [[Information retrieval]] or identification of a [[text corpus|corpus]] is a preparatory step: collecting or identifying a set of textual materials, on the Web or held in a [[file system]], [[database]], or content [[corpus manager]], for analysis.
* Although some text analytics systems rely exclusively on advanced statistical methods, many others apply more extensive [[natural language processing]], such as [[part of speech tagging]], syntactic [[parsing]], and other types of linguistic analysis.<ref>{{Cite thesis|title=Exploração de informações contextuais para enriquecimento semântico em representações de textos|url=http://www.teses.usp.br/teses/disponiveis/55/55134/tde-03012019-103253/|publisher=Universidade de São Paulo|date=2018-11-14|place=São Carlos|degree=Mestrado em Ciências de Computação e Matemática Computacional|doi=10.11606/d.55.2019.tde-03012019-103253|language=pt|first=João|last=Antunes|doi-access=free}}</ref>
* [[Named entity recognition]] is the use of gazetteers or statistical techniques to identify named text features: people, organizations, place names, stock ticker symbols, certain abbreviations, and so on.
* Disambiguation, the use of [[context (language use)|contextual]] clues, may be required to decide whether, for instance, "Ford" refers to a former U.S. president, a vehicle manufacturer, a movie star, a river crossing, or some other entity.<ref>{{Cite journal|last1=Moro|first1=Andrea|last2=Raganato|first2=Alessandro|last3=Navigli|first3=Roberto|date=December 2014|title=Entity Linking meets Word Sense Disambiguation: a Unified Approach|journal=Transactions of the Association for Computational Linguistics|volume=2|pages=231–244|doi=10.1162/tacl_a_00179|issn=2307-387X|doi-access=free}}</ref>
* Recognition of pattern-identified entities: features such as telephone numbers, e-mail addresses, and quantities (with units) can be discerned via regular expressions or other [[Pattern matching|pattern matches]].
* [[Document clustering]]: identification of sets of similar text documents.<ref>{{Cite journal|last1=Chang|first1=Wui Lee|last2=Tay|first2=Kai Meng|last3=Lim|first3=Chee Peng|date=2017-02-06|title=A New Evolving Tree-Based Model with Local Re-learning for Document Clustering and Visualization|journal=Neural Processing Letters|volume=46|issue=2|pages=379–409|doi=10.1007/s11063-017-9597-3|s2cid=9100902|issn=1370-4621}}</ref>
* [[Coreference]] resolution: identification of [[noun phrase]]s and other terms that refer to the same object.
* Extraction of relationships, facts and events: identification of associations among entities and other information in texts.
* [[Sentiment analysis]]: discerning subjective material and extracting information about attitudes: sentiment, opinion, mood, and emotion. This is done at the entity, concept, or topic level and aims to distinguish opinion holders and objects.<ref>{{cite journal |last1=Benchimol |first1=Jonathan |last2=Kazinnik |first2=Sophia |last3=Saadon |first3=Yossi |date=2022 |title=Text mining methodologies with R: An application to central bank texts |url=https://paperswithcode.com/paper/text-mining-methodologies-with-r-an |journal=Machine Learning with Applications |volume=8 |pages=100286 |doi=10.1016/j.mlwa.2022.100286|s2cid=243798160 |doi-access=free }}</ref>
* Quantitative text analysis: a set of techniques stemming from the social sciences in which either a human judge or a computer extracts semantic or grammatical relationships between words in order to determine the meaning or stylistic patterns of a text, usually a casual personal one, for purposes such as [[psychological profiling]].<ref>{{cite book|doi=10.1037/11383-011 |title=Handbook of multimethod measurement in psychology |year=2006 |last1=Mehl |first1=Matthias R. |isbn=978-1-59147-318-3 |page=141|chapter=Quantitative Text Analysis }}</ref>
* Pre-processing usually involves tasks such as tokenization, filtering, and stemming.
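The pre-processing tasks named in the list (tokenization, filtering, and stemming) can be illustrated with a minimal Python sketch. The stop-word list, the suffix list, and the function names are assumptions made for this example only. The suffix stripper is a deliberately naive stand-in for an established stemming algorithm, not an implementation of one.

```python
import re

# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}
# Suffixes are tried in this order; this is NOT a standard stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stop_words(tokens):
    """Filtering step: drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Run tokenization, filtering, and stemming in sequence."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]

tokens = preprocess("The miners are mining texts and filtered the tokens.")
print(tokens)  # ['miner', 'min', 'text', 'filter', 'token']
```

The output shows a typical weakness of crude suffix stripping: "mining" is reduced to "min" rather than "mine".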
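Recognition of pattern-identified entities can be sketched with Python's standard `re` module. The three patterns below are simplified assumptions for illustration: they cover one North American style phone format, a loose e-mail shape, and a handful of metric units, and would need broadening for real data.

```python
import re

# Illustrative patterns only (assumptions); real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
# Captures a number and a unit as two groups, e.g. ("12.5", "kg").
QUANTITY_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s?(kg|km|cm|mm|m|g)\b")

def extract_pattern_entities(text):
    """Return every match of each pattern, grouped by entity type."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
        "quantities": QUANTITY_RE.findall(text),
    }

sample = "Call 555-123-4567 or write to info@example.org about the 12.5 kg parcel."
entities = extract_pattern_entities(sample)
print(entities)
```

For the sample sentence this yields one phone number, one e-mail address, and the quantity pair `('12.5', 'kg')`.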
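The gazetteer approach to named entity recognition mentioned in the list amounts to looking up known names in the text. The sketch below assumes a hypothetical three-entry gazetteer and made-up type labels. It prefers longer names when alternatives start at the same position, and it does not attempt the contextual disambiguation that a bare mention such as "Ford" would require.

```python
import re

# Hypothetical toy gazetteer (assumption): surface form -> entity type.
GAZETTEER = {
    "Gerald Ford": "PERSON",
    "Ford Motor Company": "ORGANIZATION",
    "Detroit": "PLACE",
}

def build_pattern(gazetteer):
    """Build one alternation that lists longer names first."""
    names = sorted(gazetteer, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")

PATTERN = build_pattern(GAZETTEER)

def tag_entities(text):
    """Return (surface form, entity type) pairs in order of appearance."""
    return [(m.group(0), GAZETTEER[m.group(0)]) for m in PATTERN.finditer(text)]

found = tag_entities("Gerald Ford never worked for Ford Motor Company in Detroit.")
print(found)
```

Statistical recognizers replace the fixed lookup table with a model learned from annotated text, which lets them tag names that appear in no gazetteer.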