=== Digital humanities and computational sociology ===
The automatic analysis of vast textual corpora has made it possible for scholars to analyze millions of documents in multiple languages with very limited manual intervention. Key enabling technologies have been parsing, [[machine translation]], topic [[categorization]], and machine learning.

[[File:Tripletsnew2012.png|thumb|right|Narrative network of the 2012 US elections<ref name="ReferenceA">Automated analysis of the US presidential elections using Big Data and network analysis; S Sudhahar, GA Veltri, N Cristianini; Big Data & Society 2 (1), 1-28, 2015</ref>]]
The automatic parsing of textual corpora has enabled the extraction of actors and their relational networks on a vast scale, turning textual data into network data. The resulting networks, which can contain thousands of nodes, are then analyzed using tools from network theory to identify the key actors, the key communities or parties, and general properties of the overall network, such as its robustness or structural stability, or the centrality of certain nodes.<ref>Network analysis of narrative content in large corpora; S Sudhahar, G De Fazio, R Franzosi, N Cristianini; Natural Language Engineering, 1-32, 2013</ref> This automates the approach introduced by quantitative narrative analysis,<ref>Quantitative Narrative Analysis; Roberto Franzosi; Emory University © 2010</ref> whereby [[subject-verb-object]] triplets are identified with pairs of actors linked by an action, or with actor-object pairs.<ref name="ReferenceA" />

[[Content analysis]] has long been a traditional part of the social sciences and media studies. The automation of content analysis has allowed a "[[big data]]" revolution to take place in that field, with studies of social media and newspaper content that include millions of news items.
[[Gender bias]], [[readability]], content similarity, reader preferences, and even mood have been analyzed with text mining methods over millions of documents.<ref>{{Cite journal|last1=Lansdall-Welfare|first1=Thomas|last2=Sudhahar|first2=Saatviga|last3=Thompson|first3=James|last4=Lewis|first4=Justin|last5=Team|first5=FindMyPast Newspaper|last6=Cristianini|first6=Nello|date=2017-01-09|title=Content analysis of 150 years of British periodicals|journal=Proceedings of the National Academy of Sciences|volume=114|issue=4|pages=E457–E465|doi=10.1073/pnas.1606380114|issn=0027-8424|pmid=28069962|pmc=5278459|bibcode=2017PNAS..114E.457L |doi-access=free}}</ref><ref>I. Flaounas, M. Turchi, O. Ali, N. Fyson, T. De Bie, N. Mosdell, J. Lewis, N. Cristianini, The Structure of EU Mediasphere, PLoS ONE, Vol. 5(12), pp. e14243, 2010.</ref><ref>Nowcasting Events from the Social Web with Statistical Learning; V Lampos, N Cristianini; ACM Transactions on Intelligent Systems and Technology (TIST) 3 (4), 72</ref><ref>NOAM: news outlets analysis and monitoring system; I Flaounas, O Ali, M Turchi, T Snowsill, F Nicart, T De Bie, N Cristianini; Proc. of the 2011 ACM SIGMOD international conference on Management of data</ref><ref>Automatic discovery of patterns in media content; N Cristianini; Combinatorial Pattern Matching, 2-13, 2011</ref> The analysis of readability, gender bias, and topic bias was demonstrated by Flaounas et al.,<ref>I. Flaounas, O. Ali, T. Lansdall-Welfare, T. De Bie, N. Mosdell, J. Lewis, N. Cristianini, Research Methods in the Age of Digital Journalism, Digital Journalism, Routledge, 2012</ref> showing that different topics have different gender biases and levels of readability; it has also been shown that mood patterns across a large population can be detected by analyzing Twitter content.<ref>Circadian Mood Variations in Twitter Content; Fabon Dzogang, Stafford Lightman, Nello Cristianini; Brain and Neuroscience Advances, 1, 2398212817744501.</ref><ref>Effects of the Recession on Public Mood in the UK; T Lansdall-Welfare, V Lampos, N Cristianini; Mining Social Network Dynamics (MSND) session on Social Media Applications</ref>
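The step from subject-verb-object triplets to an actor network, and the search for key actors in it, can be illustrated with a minimal sketch. It is not the pipeline used in the cited studies: it assumes the triplets have already been extracted by a parser, and the actor names and the functions <code>build_network</code>, <code>degree_centrality</code> and <code>key_actors</code> are hypothetical. Triplets are aggregated into a weighted, directed graph (counting how often each subject-object pair occurs and which verbs link it), and actors are ranked by normalized degree centrality, one of the simplest network-theory measures.

```python
from collections import Counter, defaultdict

# A subject-verb-object triplet, e.g. ("Alice", "criticizes", "Bob").
Triplet = tuple[str, str, str]


def build_network(triplets: list[Triplet]):
    """Turn SVO triplets into a weighted, directed actor network.

    Returns (weights, actions): weights[(subj, obj)] counts how often the pair
    occurs in a triplet, and actions[(subj, obj)] collects the linking verbs.
    """
    weights: Counter = Counter()
    actions: defaultdict = defaultdict(set)
    for subj, verb, obj in triplets:
        weights[(subj, obj)] += 1
        actions[(subj, obj)].add(verb)
    return weights, actions


def degree_centrality(weights: Counter) -> dict[str, float]:
    """Normalized degree centrality, ignoring edge direction and weight:
    number of distinct neighbours divided by (number of nodes - 1)."""
    nodes = {node for edge in weights for node in edge}
    neighbours: defaultdict = defaultdict(set)
    for subj, obj in weights:
        if subj != obj:  # self-loops do not count as neighbours
            neighbours[subj].add(obj)
            neighbours[obj].add(subj)
    n = len(nodes)
    return {node: len(neighbours[node]) / (n - 1) if n > 1 else 0.0 for node in nodes}


def key_actors(triplets: list[Triplet], top: int = 3) -> list[tuple[str, float]]:
    """Rank actors by degree centrality, breaking ties alphabetically."""
    weights, _ = build_network(triplets)
    scores = degree_centrality(weights)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]


if __name__ == "__main__":
    # Hypothetical triplets; a real system would extract them from parsed text.
    triplets = [
        ("Alice", "criticizes", "Bob"),
        ("Bob", "attacks", "Alice"),
        ("Bob", "praises", "Carol"),
        ("Alice", "visits", "Ohio"),  # an actor-object pair
        ("Dave", "defends", "Alice"),
        ("Alice", "criticizes", "Bob"),
    ]
    weights, actions = build_network(triplets)
    print(weights[("Alice", "Bob")], sorted(actions[("Alice", "Bob")]))
    print(key_actors(triplets))
```

Run as a script, this prints <code>2 ['criticizes']</code> and then <code>[('Alice', 0.75), ('Bob', 0.5), ('Carol', 0.25)]</code>: "Alice" is linked to three of the four other nodes and so ranks first. Degree centrality is used here only because it fits in a few lines; an analysis of networks with thousands of nodes would more likely rely on a graph library and on richer measures such as betweenness centrality or community detection.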