Text mining
=== Scientific literature mining and academic applications ===
Text mining is important to publishers who hold large [[database]]s of information that need [[index (database)|indexing]] for retrieval. This is especially true in scientific disciplines, in which highly specific information is often contained within the written text. Initiatives have therefore been taken, such as [[Nature (journal)|Nature's]] proposal for an Open Text Mining Interface (OTMI) and the [[National Institutes of Health]]'s common Journal Publishing [[Document Type Definition]] (DTD), that would provide semantic cues to machines to answer specific queries contained within the text without removing publisher barriers to public access.

Academic institutions have also become involved in text mining initiatives:
* The [[National Centre for Text Mining]] (NaCTeM) is the first publicly funded text mining centre in the world. NaCTeM is operated by the [[University of Manchester]]<ref>{{cite web|url=http://www.manchester.ac.uk |title=The University of Manchester |publisher=Manchester.ac.uk |access-date=2015-02-23}}</ref> in close collaboration with the Tsujii Lab,<ref>{{cite web |url=http://www-tsujii.is.s.u-tokyo.ac.jp/index.html |title=Tsujii Laboratory |publisher=Tsujii.is.s.u-tokyo.ac.jp |access-date=2015-02-23 |archive-date=2012-03-07 |archive-url=https://web.archive.org/web/20120307231425/http://www-tsujii.is.s.u-tokyo.ac.jp/index.html |url-status=dead }}</ref> [[University of Tokyo]].<ref>{{cite web|url=http://www.u-tokyo.ac.jp/index_e.html |title=The University of Tokyo |publisher=UTokyo |access-date=2015-02-23}}</ref> NaCTeM provides customised tools and research facilities, and offers advice to the academic community. It is funded by the [[Joint Information Systems Committee]] (JISC) and two of the UK [[research council (United Kingdom)|research councils]] ([[EPSRC]] and [[BBSRC]]). With an initial focus on text mining in the [[biology|biological]] and [[biomedical]] sciences, its research has since expanded into the [[social sciences]].
* In the United States, the [[UC Berkeley School of Information|School of Information]] at the [[University of California, Berkeley]] is developing a program called BioText to assist [[biology]] researchers in text mining and analysis.
* The [[Text Analysis Portal for Research]] (TAPoR), currently housed at the [[University of Alberta]], is a scholarly project to catalogue text analysis applications and create a gateway for researchers new to the practice.

==== Methods for scientific literature mining ====
Computational methods have been developed to assist with information retrieval from scientific literature. Published approaches include methods for searching,<ref>{{Cite book|last1=Shen|first1=Jiaming|last2=Xiao|first2=Jinfeng|last3=He|first3=Xinwei|last4=Shang|first4=Jingbo|last5=Sinha|first5=Saurabh|last6=Han|first6=Jiawei|date=2018-06-27|title=Entity Set Search of Scientific Literature: An Unsupervised Ranking Approach|publisher=ACM|pages=565–574|doi=10.1145/3209978.3210055|isbn=978-1-4503-5657-2|s2cid=13748283}}</ref> determining novelty,<ref>{{Cite journal|last1=Walter|first1=Lothar|last2=Radauer|first2=Alfred|last3=Moehrle|first3=Martin G.|date=2017-02-06|title=The beauty of brimstone butterfly: novelty of patents identified by near environment analysis based on text mining|journal=Scientometrics|volume=111|issue=1|pages=103–115|doi=10.1007/s11192-017-2267-4|s2cid=11174676|issn=0138-9130}}</ref> and disambiguating [[homonym]]s<ref>{{Cite journal|last1=Roll|first1=Uri|last2=Correia|first2=Ricardo A.|last3=Berger-Tal|first3=Oded|date=2018-03-10|title=Using machine learning to disentangle homonyms in large text corpora|journal=Conservation Biology|volume=32|issue=3|pages=716–724|doi=10.1111/cobi.13044|pmid=29086438|bibcode=2018ConBi..32..716R |s2cid=3783779|issn=0888-8892}}</ref> among technical reports.