Editing
Sampling bias
(section)
==Types==
* Selection from a '''specific real area'''. For example, a survey of high school students to measure teenage use of illegal drugs will be a biased sample because it does not include home-schooled students or dropouts. A sample is also biased if certain members are underrepresented or overrepresented relative to others in the population. For example, a "man on the street" interview that selects people who walk by a certain location will overrepresent healthy individuals, who are more likely to be out of the home than individuals with a chronic illness. Biased sampling is at its most extreme when certain members of the population are totally excluded from the sample (that is, they have zero probability of being selected), as with the home-schooled students and dropouts in the first example.
* '''[[Self-selection bias|Self-selection]]''' bias (see also [[Non-response bias]]), which is possible whenever the group of people being studied has any form of control over whether to participate (as current standards of [[Research ethics|human-subject research ethics]] require for many real-time and some longitudinal forms of study). Participants' decision to participate may be correlated with traits that affect the study, making the participants a non-representative sample. For example, people who have strong opinions or substantial knowledge may be more willing to spend time answering a survey than those who do not. Another example is [[online and phone-in polls]], which are biased samples because the respondents are self-selected. Individuals who are highly motivated to respond, typically those with strong opinions, are overrepresented, while individuals who are indifferent or apathetic are less likely to respond. This often polarizes the responses, giving extreme perspectives disproportionate weight in the summary. As a result, these types of polls are regarded as unscientific.
* '''Exclusion''' bias results from exclusion of particular groups from the sample, e.g. exclusion of subjects who have recently [[human migration|migrated]] into the study area (this may occur when newcomers are not available in a register used to identify the source population). Excluding subjects who move out of the study area during follow-up is roughly equivalent to dropout or nonresponse, a [[selection bias]] in that it affects the internal validity of the study.
* '''[[Healthy user bias]]''', when the study population is likely healthier than the general population. For example, someone in poor health is unlikely to have a job as a manual laborer, so if a study is conducted on manual laborers, the health of the general population will likely be overestimated.
* '''[[Berkson's fallacy]]''', when the study population is selected from a hospital and so is less healthy than the general population. This can result in a spurious negative correlation between diseases: a hospital patient without diabetes is ''more'' likely to have another given disease such as [[cholecystitis]], since they must have had some reason to enter the hospital in the first place.
* '''[[Overmatching]]''', matching for an apparent [[Confounding|confounder]] that actually is a result of the exposure{{clarify|reason=Exposure to what? Exposure to the study or to the studied variable?|date=August 2014}}. The control group becomes more similar to the cases in regard to exposure than does the general population.
* '''[[Survivorship bias]]''', in which only "surviving" subjects are selected, ignoring those that fell out of view. For example, using the record of current companies as an indicator of business climate or economy ignores the businesses that failed and no longer exist.
* '''[[Malmquist bias]]''', an effect in observational astronomy which leads to the preferential detection of intrinsically bright objects.
{{anchor|Spotlight fallacy}}
* '''Spotlight fallacy''', the uncritical assumption that all members or cases of a certain class or type are like those that receive the most attention or coverage in the media.

===Symptom-based sampling===
The study of medical conditions begins with anecdotal reports. By their nature, such reports only include those referred for diagnosis and treatment. A child who can't function in school is more likely to be diagnosed with [[dyslexia]] than a child who struggles but passes. A child examined for one condition is more likely to be tested for and diagnosed with other conditions, skewing [[comorbidity]] statistics. As certain diagnoses become associated with behavior problems or [[intellectual disability]], parents try to prevent their children from being stigmatized with those diagnoses, introducing further bias. Studies carefully selected from whole populations are showing that many conditions are much more common and usually much milder than formerly believed.

===Truncate selection in pedigree studies===
[[File:Ascertainment bias.png|600px|center|thumbnail|Simple pedigree example of sampling bias]]
Geneticists are limited in how they can obtain data from human populations. As an example, consider a human characteristic. We are interested in deciding if the characteristic is inherited as a [[autosomal recessive|simple Mendelian]] trait. Following the laws of [[Mendelian inheritance]], if the parents in a family do not have the characteristic, but carry the allele for it, they are carriers (e.g. a non-expressive [[heterozygote]]). In this case their children will each have a 25% chance of showing the characteristic. The problem arises because we can't tell which families have both parents as carriers (heterozygous) unless they have a child who exhibits the characteristic.
The description follows the textbook by Sutton.<ref name="Sutton1988">{{cite book| vauthors = Sutton HE |title=An Introduction to Human Genetics|url=https://books.google.com/books?id=WY5qAAAAMAAJ|edition=4th|year=1988|publisher=Harcourt Brace Jovanovich|isbn=978-0-15-540099-3}}</ref> The figure shows the pedigrees of all the possible families with two children when the parents are carriers (Aa).
* '''Nontruncate selection'''. In a perfect world we would be able to discover all families carrying the gene, including those whose members are simply carriers. In this situation the analysis would be free from ascertainment bias and the pedigrees would be under "nontruncate selection". In practice, most studies identify and include families based upon their having affected individuals.
* '''Truncate selection'''. When afflicted ''individuals'' have an equal chance of being included in a study, this is called truncate selection, signifying the inadvertent exclusion (truncation) of families in which the parents are carriers but no child is affected. Because selection is performed on the individual level, families with two or more affected children have a higher probability of being included in the study.
* '''Complete truncate selection''' is a special case where each ''family'' with an affected child has an equal chance of being selected for the study.

The probabilities of each of the families being selected are given in the figure, with the sample frequency of affected children also given. In this simple case, the researcher will look for a frequency of {{frac|4|7}} (complete truncate selection) or {{frac|5|8}} (truncate selection) for the characteristic, rather than the {{frac|1|4}} expected without ascertainment bias.

===The caveman effect===
One example of selection bias is the so-called "caveman effect". Much of our understanding of [[Prehistory|prehistoric]] peoples comes from caves, such as [[cave painting]]s made nearly 40,000 years ago. If there had been contemporary paintings on trees, animal skins or hillsides, they would have been washed away long ago.
Similarly, evidence of fire pits, [[midden]]s, [[ceremonial burial|burial sites]], etc. is most likely to remain intact to the modern era in caves. Prehistoric people are associated with caves because that is where the data still exists, not necessarily because most of them lived in caves for most of their lives.<ref>{{cite journal |vauthors = Berk RA |title=An Introduction to Sample Selection Bias in Sociological Data |journal=American Sociological Review |date=June 1983 |volume=48 |issue=3 |pages=386–398 |doi=10.2307/2095230|jstor=2095230 }}</ref>
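
The sample frequencies quoted for the two-child pedigree example under truncate selection can be checked with a short calculation. The following is a minimal sketch, not taken from the cited textbook: the function names and the weighting scheme are illustrative assumptions. It assumes that each child of two carrier parents is affected independently with probability 1/4, and it models truncate selection by making a family's chance of inclusion proportional to its number of affected children.

```python
from fractions import Fraction
from math import comb


def family_distribution(n_children=2, p_affected=Fraction(1, 4)):
    """Probability of k affected children in a carrier x carrier family."""
    return {
        k: comb(n_children, k) * p_affected**k * (1 - p_affected) ** (n_children - k)
        for k in range(n_children + 1)
    }


def affected_frequency(weight, n_children=2, p_affected=Fraction(1, 4)):
    """Expected share of affected children in a sample of families.

    `weight(k)` is the relative chance that a family with k affected
    children enters the sample (a hypothetical helper for this sketch).
    """
    dist = family_distribution(n_children, p_affected)
    affected = sum(weight(k) * p * k for k, p in dist.items())
    children = sum(weight(k) * p * n_children for k, p in dist.items())
    return affected / children


# Nontruncate selection: every carrier family is found.
nontruncate = affected_frequency(lambda k: 1)

# Complete truncate selection: every family with at least one
# affected child is equally likely to be found.
complete_truncate = affected_frequency(lambda k: 1 if k >= 1 else 0)

# Truncate selection (assumed model): inclusion is proportional to
# the number of affected children in the family.
truncate = affected_frequency(lambda k: k)

print(nontruncate, complete_truncate, truncate)  # 1/4 4/7 5/8
```

Exact fractions are used instead of floating-point numbers so that the results can be compared directly with the values in the text. Changing <code>n_children</code> shows how the bias depends on family size under the same assumptions.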