==Automatic document classification (ADC)==
Automatic document classification tasks can be divided into three kinds: '''supervised document classification''', where some external mechanism (such as human feedback) provides information on the correct classification for documents; '''unsupervised document classification''' (also known as [[document clustering]]), where the classification must be done entirely without reference to external information; and '''semi-supervised document classification''',<ref> Rossi, R. G., Lopes, A. d. A., and Rezende, S. O. (2016). [https://www.sciencedirect.com/science/article/pii/S0306457315000990 Optimization and label propagation in bipartite heterogeneous networks to improve transductive classification of texts]. Information Processing & Management, 52(2):217–257. </ref> where only some of the documents are labeled by the external mechanism. Several software products are available under various license models.<ref>{{Cite web |url=https://pdfs.semanticscholar.org/bea4/a204239556a29228decc9e029c326e4900b7.pdf |title=An Interactive Automatic Document Classification Prototype |access-date=2017-11-14 |archive-url=https://web.archive.org/web/20171115082749/https://pdfs.semanticscholar.org/bea4/a204239556a29228decc9e029c326e4900b7.pdf |archive-date=2017-11-15 |url-status=dead }}</ref><ref>[https://seer.lcc.ufmg.br/index.php/jidm/article/download/43/41An Interactive Automatic Document Classification Prototype] {{webarchive |url=https://web.archive.org/web/20150424122349/https://seer.lcc.ufmg.br/index.php/jidm/article/download/43/41An |date=April 24, 2015 }}</ref><ref>[https://archive.today/20141208063727/http://www.artsyltech.com/da_classification.htmlAutomatic Document Classification - Artsyl]</ref><ref>[http://www.abbyy.com/ocr_sdk_windows/what_is_new/classification/ ABBYY FineReader Engine 11 for Windows]</ref><ref>[http://www.antidot.net/classifier/ Classifier - Antidot]</ref><ref>{{Cite web|title=3 Document Classification Methods for Tough Projects|url=https://www.bisok.com/grooper-data-capture-method-features/document-classification/|access-date=2021-08-04|website=www.bisok.com|language=en-US}}</ref>

=== Techniques ===
Automatic document classification techniques include:
* [[Artificial neural network]]
* [[Concept Mining]]
* [[Decision tree learning|Decision trees]] such as [[ID3 algorithm|ID3]] or [[C4.5 algorithm|C4.5]]
* [[Expectation maximization]] (EM)
* [[Instantaneously trained neural networks]]
* [[Latent semantic indexing]]
* [[Multiple-instance learning]]
* [[Naive Bayes classifier]]
* [[Natural language processing]] approaches
* [[Rough set]]-based classifier
* [[Soft set]]-based classifier
* [[Support vector machines]] (SVM)
* [[k-nearest neighbor algorithm|K-nearest neighbour algorithms]]
* [[tf–idf]]
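
As an illustration of the supervised case, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing, written using only the Python standard library. The function names (<code>tokenize</code>, <code>train_naive_bayes</code>, <code>classify</code>), the whitespace tokenizer, and the four-document training set are assumptions made for the example and are not taken from any of the products or papers cited above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-posterior under the model."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_doc_counts.items():
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (word, class) pairs from
            # driving the probability to zero.
            numerator = word_counts[label][word] + 1
            denominator = total_words + len(vocabulary)
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set: the "external mechanism" here is the
# hand-assigned label attached to each document.
training_data = [
    ("invoice total amount due payment", "invoice"),
    ("payment due invoice number", "invoice"),
    ("dear sir thank you for your letter", "letter"),
    ("thank you sincerely yours", "letter"),
]
model = train_naive_bayes(training_data)
predicted = classify("amount due on invoice", model)
print(predicted)
```

In the unsupervised case no labels would be available, so a clustering method would take the place of the training step; in the semi-supervised case only some of the entries in <code>training_data</code> would carry a label.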