{{Short description|Process of analysing text to extract information from it}}
'''Text mining''', '''text data mining''' ('''TDM''') or '''text analytics''' is the process of deriving high-quality [[information]] from [[plain text|text]]. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources."<ref>{{Cite web | url=http://people.ischool.berkeley.edu/~hearst/text-mining.html |title = Marti Hearst: What is Text Mining?}}</ref> Written resources may include [[website]]s, [[book]]s, [[email]]s, [[review]]s, and articles.<ref>{{cite journal | last1 = Galiani | first1 = Sebastian | last2 = Gálvez | first2 = Ramiro H. | last3 = Nachman | first3 = Ian | title = Specialization trends in economics research: A large-scale study using natural language processing and citation analysis | journal = Economic Inquiry | volume = 63 | issue = 1 | pages = 289–329 | year = 2025 | doi = 10.1111/ecin.13261 | url = https://onlinelibrary.wiley.com/doi/abs/10.1111/ecin.13261 }}</ref> High-quality information is typically obtained by identifying patterns and trends through means such as [[pattern recognition|statistical pattern learning]]. According to Hotho et al. (2005), there are three perspectives on text mining: [[information extraction]], [[data mining]], and [[knowledge discovery in databases]] (KDD).<ref>Hotho, A., Nürnberger, A. and Paaß, G. (2005). "A brief survey of text mining". ''LDV Forum'', Vol. 20(1), pp. 19–62.</ref>

Text mining usually involves structuring the input text (usually by [[parsing]] it, adding some derived linguistic features and removing others, and then inserting the result into a [[database]]), deriving patterns within the [[structured data]], and finally evaluating and interpreting the output.
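The three stages described above (structuring the text, deriving patterns, and interpreting the output) can be illustrated with a minimal sketch. It is only one possible reading of that pipeline: the toy corpus, the stop-word list, the table layout, and the function names <code>tokenize</code>, <code>structure</code>, and <code>document_frequencies</code> are all assumptions made for this example and do not come from the cited sources.

```python
import re
import sqlite3

# Hypothetical toy corpus; the texts are invented for illustration.
DOCUMENTS = {
    1: "Text mining derives information from text.",
    2: "Data mining finds patterns in structured data.",
    3: "Text analytics turns text into data.",
}

# A tiny hand-picked stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "from", "in", "into", "the"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def structure(documents):
    """Stage 1: parse each document, drop stop words, store tokens in a database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tokens (doc_id INTEGER, position INTEGER, token TEXT)"
    )
    for doc_id, text in documents.items():
        kept = [tok for tok in tokenize(text) if tok not in STOP_WORDS]
        conn.executemany(
            "INSERT INTO tokens VALUES (?, ?, ?)",
            [(doc_id, pos, tok) for pos, tok in enumerate(kept)],
        )
    return conn


def document_frequencies(conn):
    """Stage 2: derive a simple pattern, the number of documents containing each token."""
    rows = conn.execute(
        "SELECT token, COUNT(DISTINCT doc_id) FROM tokens GROUP BY token"
    )
    return dict(rows)


# Stage 3: interpret the output, here by listing tokens shared across documents.
conn = structure(DOCUMENTS)
frequencies = document_frequencies(conn)
shared = sorted(tok for tok, count in frequencies.items() if count > 1)
print(shared)
```

Real systems typically replace the regular-expression tokenizer with a full parser and derive far richer patterns than document frequency; the sketch only shows how the stages connect.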
'High quality' in text mining usually refers to some combination of [[relevance (information retrieval)|relevance]], [[novelty (patent)|novelty]], and interest. Typical text mining tasks include [[text categorization]], [[text clustering]], concept/entity extraction, production of granular taxonomies, [[sentiment analysis]], [[document summarization]], and [[Entity–relationship model|entity relation modeling]] (''i.e.'', learning relations between [[named entity recognition|named entities]]). Text analysis involves [[information retrieval]], [[lexical analysis]] to study word frequency distributions, [[pattern recognition]], [[tag (metadata)|tagging]]/[[annotation]], [[information extraction]], [[data mining]] techniques including link and association analysis, [[information visualization|visualization]], and [[predictive analytics]]. The overarching goal is essentially to turn text into data for analysis by applying [[natural language processing]] (NLP), various [[algorithm]]s, and analytical methods. An important phase of this process is the interpretation of the gathered information.

A typical application is to scan a set of documents written in a [[natural language]] and either model the [[document]] set for [[predictive classification]] purposes or populate a database or search index with the extracted information. The document is the basic element of text mining. A document is defined here as a unit of textual data, which normally exists in many types of collections.<ref>Feldman, R. and Sanger, J. (2007). ''The Text Mining Handbook''. New York: Cambridge University Press.</ref>
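As a rough illustration of turning text into data and modelling a document set for classification, the sketch below represents each document as a bag-of-words count vector and assigns a new document the label of its most similar labelled example, using cosine similarity. This is one simple technique chosen for the example, not a method prescribed by the sources cited here; the labelled examples and the function names <code>term_frequencies</code>, <code>cosine_similarity</code>, and <code>categorize</code> are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Turn a document into a bag-of-words vector (token -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[tok] * b[tok] for tok in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical labelled examples, invented for illustration.
LABELLED = [
    ("sports", "the team won the match after a late goal"),
    ("finance", "the bank raised interest rates and the market fell"),
]


def categorize(text):
    """Return the label of the labelled example most similar to the text."""
    vector = term_frequencies(text)
    label, _ = max(
        LABELLED,
        key=lambda item: cosine_similarity(vector, term_frequencies(item[1])),
    )
    return label


print(categorize("a late goal decided the match"))
print(categorize("interest rates fell"))
```

With only one example per label this is a nearest-neighbour lookup and nothing more; practical classifiers are trained on many documents and usually weight terms (for instance by inverse document frequency) instead of using raw counts.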