==== Anomaly detection ====
{{Main|Anomaly detection}}
In [[data mining]], anomaly detection, also known as outlier detection, is the identification of rare items, events or observations which raise suspicion by differing significantly from the majority of the data.<ref name="Zimek-2017">{{Citation|last1=Zimek|first1=Arthur|title=Outlier Detection|date=2017|encyclopedia=Encyclopedia of Database Systems|pages=1–5|publisher=Springer New York|language=en|doi=10.1007/978-1-4899-7993-3_80719-1|isbn=9781489979933|last2=Schubert|first2=Erich}}</ref> Typically, the anomalous items represent an issue such as [[bank fraud]], a structural defect, medical problems or errors in a text. Anomalies are also referred to as [[outlier]]s, novelties, noise, deviations and exceptions.<ref>{{cite journal | last1 = Hodge | first1 = V. J. | last2 = Austin | first2 = J. | doi = 10.1007/s10462-004-4304-y | title = A Survey of Outlier Detection Methodologies | journal = Artificial Intelligence Review | volume = 22 | issue = 2 | pages = 85–126 | year = 2004 | url = http://eprints.whiterose.ac.uk/767/1/hodgevj4.pdf | citeseerx = 10.1.1.318.4023 | s2cid = 59941878 | access-date = 25 November 2018 | archive-date = 22 June 2015 | archive-url = https://web.archive.org/web/20150622042146/http://eprints.whiterose.ac.uk/767/1/hodgevj4.pdf | url-status = live }}</ref> In particular, in the context of abuse and network intrusion detection, the interesting objects are often not rare objects, but unexpected bursts of activity. This pattern does not adhere to the common statistical definition of an outlier as a rare object. Many outlier detection methods (in particular, unsupervised algorithms) will fail on such data unless the data are aggregated appropriately.
Instead, a cluster analysis algorithm may be able to detect the micro-clusters formed by these patterns.<ref>{{cite journal |first1=Paul |last1=Dokas |first2=Levent |last2=Ertoz |first3=Vipin |last3=Kumar |first4=Aleksandar |last4=Lazarevic |first5=Jaideep |last5=Srivastava |first6=Pang-Ning |last6=Tan |title=Data mining for network intrusion detection |year=2002 |journal=Proceedings NSF Workshop on Next Generation Data Mining |url=https://www-users.cse.umn.edu/~lazar027/MINDS/papers/nsf_ngdm_2002.pdf |access-date=26 March 2023 |archive-date=23 September 2015 |archive-url=https://web.archive.org/web/20150923211542/http://www.csee.umbc.edu/~kolari1/Mining/ngdm/dokas.pdf |url-status=live }}</ref>

Three broad categories of anomaly detection techniques exist.<ref name="ChandolaSurvey">{{cite journal |last1=Chandola |first1=V. |last2=Banerjee |first2=A. |last3=Kumar |first3=V. |s2cid=207172599 |year=2009 |title=Anomaly detection: A survey|journal=[[ACM Computing Surveys]]|volume=41|issue=3|pages=1–58|doi=10.1145/1541880.1541882}}</ref> Unsupervised anomaly detection techniques detect anomalies in an unlabelled test data set under the assumption that the majority of the instances in the data set are normal, by looking for the instances that fit least well with the rest of the data set. Supervised anomaly detection techniques require a data set that has been labelled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of outlier detection). Semi-supervised anomaly detection techniques construct a model representing normal behaviour from a given normal training data set and then test how likely it is that a test instance was generated by that model.
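A minimal sketch, not taken from the cited sources, contrasting the unsupervised and semi-supervised settings on one-dimensional data. The choice of a k-th-nearest-neighbour distance as the unsupervised outlier score, a single Gaussian as the model of normal behaviour, the function names and the thresholding rule are all illustrative assumptions; practical systems typically use multivariate data and more robust models.

```python
import math
from statistics import fmean, pstdev


def knn_outlier_scores(data, k=2):
    """Unsupervised (illustrative): score every point of an unlabelled set by the
    distance to its k-th nearest neighbour in that same set.
    Larger scores mean the point fits the rest of the data less well."""
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    scores = []
    for i, x in enumerate(data):
        dists = sorted(abs(x - y) for j, y in enumerate(data) if j != i)
        scores.append(dists[k - 1])
    return scores


def fit_gaussian(normal_train):
    """Semi-supervised (illustrative): model normal behaviour as a Gaussian
    fitted to training data assumed to contain only normal instances."""
    mu, sigma = fmean(normal_train), pstdev(normal_train)
    if sigma == 0:
        raise ValueError("training data must not be constant")
    return mu, sigma


def gaussian_log_likelihood(x, mu, sigma):
    """Log-density of x under the fitted normal model; lower means less likely."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


if __name__ == "__main__":
    # Unsupervised: no labels, the most isolated point gets the highest score.
    unlabelled = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0]
    scores = knn_outlier_scores(unlabelled, k=2)
    print(max(range(len(scores)), key=scores.__getitem__))  # prints 5

    # Semi-supervised: fit on normal data only, then test new instances.
    normal_train = [10.1, 9.8, 10.3, 10.0, 9.9]
    mu, sigma = fit_gaussian(normal_train)
    # Hypothetical threshold: the log-likelihood of the least likely training point.
    threshold = min(gaussian_log_likelihood(x, mu, sigma) for x in normal_train)
    print([gaussian_log_likelihood(x, mu, sigma) < threshold for x in (10.0, 25.0)])  # prints [False, True]
```

The unsupervised score needs no labels but assumes anomalies are rare within the scored set, while the semi-supervised model only needs examples of normal behaviour; a supervised variant would instead train a classifier on both labelled classes, typically with measures to counter the class imbalance.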