Misuse of statistics
===Misreporting or misunderstanding of estimated error===
{{further|Opinion poll|Statistical survey}}
If a research team wants to know how 300 million people feel about a certain topic, it would be impractical to ask all of them. However, if the team picks a random sample of about 1,000 people, they can be fairly certain that the results given by this group are representative of what the larger group would have said if they had all been asked.

This confidence can be quantified by the [[central limit theorem]] and other mathematical results. Confidence is expressed as a probability of the true result (for the larger group) being within a certain range of the estimate (the figure for the smaller group). This is the "plus or minus" figure often quoted for statistical surveys. The confidence level itself is usually not mentioned; when it is omitted, it is generally assumed to be a standard value such as 95%.

The two numbers are related. If a survey has an estimated error of ±5% at 95% confidence, it also has an estimated error of about ±6.6% at 99% confidence. In general, ±<math>x</math>% at 95% confidence corresponds to approximately ±<math>1.32x</math>% at 99% confidence when the sampling distribution of the estimate is approximately normal, because the ratio of the two critical values is <math>2.58/1.96 \approx 1.32</math>.

The smaller the estimated error, the larger the required sample at a given confidence level. For example, at [[68–95–99.7 rule|95.4%]] confidence, and assuming the worst case of a proportion near 50%:
* ±1% would require 10,000 people.
* ±2% would require 2,500 people.
* ±3% would require 1,111 people.
* ±4% would require 625 people.
* ±5% would require 400 people.
* ±10% would require 100 people.
* ±20% would require 25 people.
* ±25% would require 16 people.
* ±50% would require 4 people.

When the confidence figure is omitted, people may assume that there is 100% certainty that the true result is within the estimated error. This is not mathematically correct.

Many people may not realize that the randomness of the sample is very important. In practice, many opinion polls are conducted by phone, which distorts the sample in several ways, including excluding people who do not have phones, favoring the inclusion of people who have more than one phone, and favoring the inclusion of people who are willing to participate in a phone survey over those who refuse. Non-random sampling makes the estimated error unreliable.

On the other hand, people may consider statistics inherently unreliable because not everybody is called, or because they themselves are never polled. People may think that it is impossible to learn the opinions of tens of millions of people by polling just a few thousand. This is also inaccurate.{{efn|Some data on the accuracy of polls is available. Regarding one important poll by the U.S. government, "Relatively speaking, both [[sampling error]] and non-sampling [bias] error are tiny."{{sfn| Freedman | Pisani | Purves | 1998 |loc= chapter 22: Measuring Employment and Unemployment, p. 405}} The difference between the votes predicted by one private poll and the actual tally for American presidential elections is available for comparison at [http://www.presidency.ucsb.edu/data/preferences.php "Election Year Presidential Preferences: Gallup Poll Accuracy Record: 1936–2012"]. The predictions were typically calculated on the basis of fewer than 5,000 opinions by likely voters.{{sfn| Freedman | Pisani | Purves | 1998 |pp= 389–390}}}}

A poll with perfectly unbiased sampling and truthful answers has a mathematically determined [[margin of error]], which, at a given confidence level, depends mainly on the number of people polled. However, often only one margin of error is reported for a survey. When results are reported for population subgroups, a larger margin of error applies, but this may not be made clear. For example, a survey of 1,000 people may contain 100 people from a certain ethnic or economic group. The results focusing on that group will be much less reliable than results for the full population. If the margin of error for the full sample was 4%, say, then the margin of error for such a subgroup could be around 13%, because the margin of error scales with <math>1/\sqrt{n}</math> and <math>4\% \times \sqrt{1000/100} \approx 12.6\%</math>.

There are also many other measurement problems in population surveys. The problems mentioned above apply to all statistical experiments, not just population surveys.
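The arithmetic behind the figures in this section can be reproduced with a short Python sketch. It assumes simple random sampling, an approximately normal sampling distribution, and a worst-case proportion of 50%. The function names (<code>z_score</code>, <code>rescale_margin</code>, <code>required_sample_size</code>, <code>subgroup_margin</code>) are hypothetical and illustrative, not taken from any statistics library. The sketch rescales the ±5% margin from 95% to 99% confidence, regenerates the sample sizes listed for about 95.4% confidence (a critical value of 2), and applies the <math>1/\sqrt{n}</math> scaling to the subgroup example.

```python
import math
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def rescale_margin(margin: float, from_conf: float, to_conf: float) -> float:
    """Convert a margin of error quoted at one confidence level to another.

    Assumes the estimate is approximately normally distributed, so the
    margin is proportional to the critical value z.
    """
    return margin * z_score(to_conf) / z_score(from_conf)


def required_sample_size(margin: float, z: float = 2.0, p: float = 0.5) -> int:
    """Sample size for a margin given as a fraction (e.g. 0.03 for ±3%).

    Uses n = z^2 * p * (1 - p) / margin^2. The defaults z = 2 (about 95.4%
    confidence) and p = 0.5 (the worst case) are assumptions chosen to match
    the list in the text; rounding to the nearest integer also matches it,
    although a real survey design would round up.
    """
    return round(z ** 2 * p * (1 - p) / margin ** 2)


def subgroup_margin(full_margin: float, n_full: int, n_sub: int) -> float:
    """Margin for a subgroup, using the fact that margins scale as 1/sqrt(n)."""
    return full_margin * math.sqrt(n_full / n_sub)


if __name__ == "__main__":
    # ±5% at 95% confidence, re-expressed at 99% confidence.
    print(f"{rescale_margin(5.0, 0.95, 0.99):.1f}")
    # Sample sizes for the margins listed in the text, at about 95.4% confidence.
    for pct in (1, 2, 3, 4, 5, 10, 20, 25, 50):
        print(f"±{pct}%: {required_sample_size(pct / 100)}")
    # A subgroup of 100 within a survey of 1,000 whose overall margin is 4%.
    print(f"{subgroup_margin(4.0, 1000, 100):.1f}")
```

Running the script prints 6.6, the sample sizes from the list, and 12.6 for the subgroup margin. The sketch covers only the idealized case: it ignores finite-population corrections and non-sampling errors such as the phone-survey biases described above.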