Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Misuse of statistics
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
===Data dredging=== {{Main|Data dredging}} [[Data dredging]] is an abuse of [[data mining]]. In data dredging, large compilations of data are examined in order to find a correlation, without any pre-defined choice of a [[hypothesis]] to be tested. Since the required [[confidence interval]] to establish a relationship between two parameters is usually chosen to be 95% (meaning that there is a 95% chance that the relationship observed is not due to random chance), there is thus a 5% chance of finding a correlation between any two sets of completely random variables. Given that data dredging efforts typically examine large datasets with many variables, and hence even larger numbers of pairs of variables, spurious but apparently statistically significant results are almost certain to be found by any such study. Note that data dredging is a valid way of ''finding'' a possible hypothesis but that hypothesis ''must'' then be tested with data not used in the original dredging. The misuse comes in when that hypothesis is stated as fact without further validation. "You cannot legitimately test a hypothesis on the same data that first suggested that hypothesis. The remedy is clear. Once you have a hypothesis, design a study to search specifically for the effect you now think is there. If the result of this test is statistically significant, you have real evidence at last."{{sfn|Moore|McCabe|2003|p=466}}
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)