==Problems due to sampling bias==

Sampling bias is problematic because a [[statistic]] computed from the sample may be systematically erroneous: it can lead to systematic over- or underestimation of the corresponding [[parameter]] in the population. Sampling bias occurs in practice because it is practically impossible to ensure perfect randomness in sampling. If the degree of misrepresentation is small, the sample can be treated as a reasonable approximation to a random sample. Also, if the sample does not differ markedly from the population in the quantity being measured, a biased sample can still yield a reasonable estimate.
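
As a rough illustration of systematic over-estimation, the following Python sketch compares a simple random sample with a sample in which larger values are more likely to be selected. The population (the integers 1 to 1000), the sample size, the seed, and the selection rule are all assumptions made for the example and are not taken from any real survey.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical population: the integers 1..1000.
population = list(range(1, 1001))
population_mean = mean(population)

# Simple random sample: every unit is equally likely to be selected.
srs = rng.sample(population, 200)

# Biased sample: selection probability is proportional to the value itself,
# so large values are over-represented (drawn with replacement).
biased = rng.choices(population, weights=population, k=200)

srs_mean = mean(srs)
biased_mean = mean(biased)

print(f"population mean:           {population_mean}")
print(f"simple random sample mean: {srs_mean}")
print(f"biased sample mean:        {biased_mean}")
```

Under these assumptions the biased sample mean lands well above the population mean, and drawing a larger biased sample does not remove the error. That persistence is what distinguishes bias from ordinary sampling variability.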

The word [[bias]] has a strong negative connotation. Indeed, biases sometimes come from deliberate intent to mislead or other [[scientific fraud]]. In statistical usage, however, bias is merely a mathematical property, whether it arises deliberately, unconsciously, or from imperfections in the instruments used for observation. While some individuals might deliberately use a biased sample to produce misleading results, more often a biased sample simply reflects the difficulty of obtaining a truly representative sample, or ignorance of a bias in the process of measurement or analysis.

An example of how a bias can go unrecognized is the widespread use of a ratio (a.k.a. [[fold change]]) as a measure of difference in biology. Because a large ratio is easier to achieve with two small numbers separated by a given difference than with two large numbers separated by an even larger difference, large and significant differences may be missed when comparing relatively large numeric measurements. Some have called this a 'demarcation bias' because, in their view, the use of a ratio (division) instead of a difference (subtraction) moves the results of the analysis from science into pseudoscience (see [[Demarcation Problem]]).
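
The arithmetic behind the fold-change example can be sketched in a few lines of Python. The two pairs of measurements and the two-fold cutoff are hypothetical values chosen only to show the effect; they do not come from any particular study.

```python
def fold_change(baseline, treated):
    """Ratio of the treated measurement to the baseline measurement."""
    return treated / baseline


def difference(baseline, treated):
    """Absolute difference between the two measurements."""
    return treated - baseline


# Hypothetical measurements as (baseline, treated) pairs.
small_pair = (1.0, 3.0)        # difference of 2
large_pair = (1000.0, 1500.0)  # difference of 500

CUTOFF = 2.0  # assumed fold-change threshold for calling a change "large"

for name, (baseline, treated) in [("small", small_pair), ("large", large_pair)]:
    fc = fold_change(baseline, treated)
    diff = difference(baseline, treated)
    flagged = fc >= CUTOFF
    print(f"{name}: difference={diff}, fold change={fc}, flagged={flagged}")
```

With these numbers the small pair passes the cutoff with a fold change of 3 even though its difference is only 2, while the large pair, with a difference of 500, has a fold change of 1.5 and is not flagged.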

Some samples use a biased statistical design that nevertheless allows the estimation of parameters. The U.S. [[National Center for Health Statistics]], for example, deliberately oversamples from minority populations in many of its nationwide surveys in order to gain sufficient precision for estimates within these groups.<ref>{{cite web | url = https://www.cdc.gov/nchs/about/otheract/minority/minority.htm | publisher = National Center for Health Statistics | date = 2007 | title = Minority Health }}</ref> These surveys require the use of sample weights (discussed later in this article) to produce proper estimates across all ethnic groups. Provided that certain conditions are met (chiefly that the weights are calculated and used correctly), these samples permit accurate estimation of population parameters.
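
A minimal sketch of how sample weights can correct for deliberate oversampling is given below. The group sizes, sample sizes, and measured values are invented for the illustration and do not describe any National Center for Health Statistics survey. Each weight is assumed to be the number of population members that one sampled unit represents (group population size divided by group sample size).

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of value * weight divided by the sum of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical population of 10,000 people in two groups.
population_sizes = {"majority": 9000, "minority": 1000}

# Hypothetical sample: 100 people from each group, so the minority group
# (10% of the population) makes up 50% of the sample.
sample = {
    "majority": [10.0] * 100,
    "minority": [20.0] * 100,
}

values, weights = [], []
for group, observations in sample.items():
    # Each sampled unit stands in for this many population members.
    weight = population_sizes[group] / len(observations)
    for value in observations:
        values.append(value)
        weights.append(weight)

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)

print(f"unweighted mean: {unweighted}")
print(f"weighted mean:   {weighted}")
```

In this constructed case the unweighted mean (15) overstates the population mean because the minority group is over-represented, whereas the weighted mean (11) matches the population value of 0.9 × 10 + 0.1 × 20.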