Association rule learning
== Statistically sound associations ==
One limitation of the standard approach to discovering associations is that searching a massive number of candidate associations for collections of items that appear to be associated carries a large risk of finding many spurious associations. These are collections of items that co-occur with unexpected frequency in the data, but do so only by chance.

For example, suppose we are considering a collection of 10,000 items and looking for rules containing two items in the left-hand side and one item in the right-hand side. There are approximately 1,000,000,000,000 such rules. If we apply a statistical test for independence with a significance level of 0.05, there is only a 5% chance of accepting any given rule when there is no association. Even if we assume there are no associations at all, we should nonetheless expect to find about 50,000,000,000 rules (0.05 × 1,000,000,000,000), all of them spurious.

Statistically sound association discovery<ref>{{cite journal |doi=10.1007/s10994-007-5006-x |title=Discovering Significant Patterns |journal=Machine Learning |volume=68 |pages=1–33 |year=2007 |last1=Webb |first1=Geoffrey I. |doi-access=free }}</ref><ref>{{cite journal |doi=10.1145/1297332.1297338 |title=Assessing data mining results via swap randomization |journal=ACM Transactions on Knowledge Discovery from Data |volume=1 |issue=3 |pages=14–es |year=2007 |last1=Gionis |first1=Aristides |last2=Mannila |first2=Heikki |last3=Mielikäinen |first3=Taneli |last4=Tsaparas |first4=Panayiotis |citeseerx=10.1.1.141.2607 |s2cid=52305658 }}</ref> controls this risk, in most cases reducing the risk of finding ''any'' spurious associations to a user-specified significance level.
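
The arithmetic behind the example can be checked with a short sketch. The function names below are hypothetical and chosen only for illustration. The Bonferroni adjustment shown is one standard way of bounding the chance of accepting ''any'' spurious rule; it is used here as an assumption for illustration and is not presented as the specific procedure of the cited works.

```python
from math import comb


def count_rules(n_items: int, lhs_size: int = 2, rhs_size: int = 1) -> int:
    """Count candidate rules whose left-hand side is an unordered set of
    lhs_size items and whose right-hand side is a disjoint unordered set
    of rhs_size items."""
    return comb(n_items, lhs_size) * comb(n_items - lhs_size, rhs_size)


def expected_spurious(n_rules: int, alpha: float) -> float:
    """Expected number of rules accepted by chance when no real
    association exists and each rule is tested at level alpha."""
    return n_rules * alpha


def bonferroni_threshold(n_rules: int, alpha: float) -> float:
    """Per-rule significance level that keeps the chance of accepting
    any spurious rule at or below alpha (Bonferroni adjustment;
    an illustrative choice, not taken from the source)."""
    return alpha / n_rules


if __name__ == "__main__":
    n_items = 10_000
    alpha = 0.05

    exact = count_rules(n_items)   # left-hand side counted as an unordered pair
    rounded = 10**12               # the approximate figure used in the text

    print("exact rule count (unordered left-hand side):", exact)
    print("expected spurious rules, exact count:", expected_spurious(exact, alpha))
    print("expected spurious rules, rounded count:", expected_spurious(rounded, alpha))
    print("Bonferroni per-rule threshold, rounded count:",
          bonferroni_threshold(rounded, alpha))
```

The round figure of 1,000,000,000,000 corresponds to counting ordered triples of distinct items (10,000 × 9,999 × 9,998). Treating the left-hand side as an unordered pair, as <code>count_rules</code> does, gives about half that number, so the expected count of spurious rules is correspondingly about half as large. Either way the conclusion is unchanged: testing each rule at 0.05 without adjustment admits tens of billions of chance associations.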