Sampling (statistics)
===Stratified sampling===
{{main|Stratified sampling}}
[[File:Stratified sampling.PNG|thumb|300px|A visual representation of selecting a random sample using the stratified sampling technique]]
When the population comprises a number of distinct categories, the frame can be organized by these categories into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected.<ref name="Robert M. Groves, et al"/> The ratio of the size of this random selection (or sample) to the size of the population is called a [[sampling fraction]].<ref name=sampling-minimax/>

There are several potential benefits to stratified sampling.<ref name=sampling-minimax/> First, dividing the population into distinct, independent strata can enable researchers to draw inferences about specific subgroups that may be lost in a more generalized random sample. Second, a stratified sampling method can lead to more efficient statistical estimates (provided that strata are selected for their relevance to the criterion in question, rather than for the availability of the samples). Even if a stratified sampling approach does not lead to increased statistical efficiency, it will not be less efficient than simple random sampling, provided that the sample drawn from each stratum is proportional to that stratum's size in the population. Third, data are sometimes more readily available for individual, pre-existing strata within a population than for the overall population; in such cases, a stratified sampling approach may be more convenient than aggregating data across groups (though this may be at odds with the previously noted importance of using criterion-relevant strata).
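The comparison with simple random sampling made above can be sketched with a short variance decomposition. The notation is introduced here for illustration only and is not taken from the cited sources: <math>W_h = N_h/N</math> is the population share of stratum <math>h</math>, <math>\mu_h</math> and <math>\sigma_h^2</math> are its mean and variance, <math>\mu</math> and <math>\sigma^2</math> are the population mean and variance, <math>n</math> is the total sample size, and <math>n_h</math> is the sample size in stratum <math>h</math>. Finite-population corrections are assumed to be negligible.

```latex
\begin{aligned}
\sigma^2 &= \sum_h W_h \sigma_h^2 + \sum_h W_h (\mu_h - \mu)^2, \\
\operatorname{Var}(\bar{y}_{\mathrm{SRS}}) &= \frac{\sigma^2}{n}, \\
\operatorname{Var}(\bar{y}_{\mathrm{st}}) &= \sum_h W_h^2 \, \frac{\sigma_h^2}{n_h}
  = \frac{1}{n} \sum_h W_h \sigma_h^2 \quad \text{when } n_h = n W_h, \\
\operatorname{Var}(\bar{y}_{\mathrm{SRS}}) - \operatorname{Var}(\bar{y}_{\mathrm{st}})
  &= \frac{1}{n} \sum_h W_h (\mu_h - \mu)^2 \;\ge\; 0 .
\end{aligned}
```

Under these assumptions, the gain from proportional allocation equals the between-strata share of the variance: it is zero when all stratum means coincide and grows as the strata differ more from one another.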
Finally, since each stratum is treated as an independent population, different sampling approaches can be applied to different strata, potentially enabling researchers to use the approach best suited (or most cost-effective) for each identified subgroup within the population.

There are, however, some potential drawbacks to using stratified sampling. First, identifying strata and implementing such an approach can increase the cost and complexity of sample selection, and can make population estimates more complex. Second, when examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design and potentially reducing the utility of the strata. Finally, in some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can require a larger sample than other methods would (although in most cases, the required sample size would be no larger than that required for simple random sampling).

; A stratified sampling approach is most effective when three conditions are met:
# Variability within strata is minimized.
# Variability between strata is maximized.
# The variables upon which the population is stratified are strongly correlated with the desired dependent variable.

; Advantages over other sampling methods
# Focuses on important subpopulations and ignores irrelevant ones.
# Allows use of different sampling techniques for different subpopulations.
# Improves the accuracy/efficiency of estimation.
# Permits greater balancing of statistical power of tests of differences between strata by sampling equal numbers from strata varying widely in size.

; Disadvantages
# Requires selection of relevant stratification variables, which can be difficult.
# Is not useful when there are no homogeneous subgroups.
# Can be expensive to implement.
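As a minimal sketch of the procedure described in this section, the following Python example samples each stratum independently with the same sampling fraction and weights each stratum's sample mean by its population share. The function names, the "urban"/"rural" strata and all the numbers are hypothetical and chosen only for illustration. The population is deliberately built with no variability within strata, the extreme case of the first condition listed above.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_sample(units, stratum_of, fraction, rng):
    """Sample the same fraction from every stratum (proportional allocation)."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Each stratum is sampled independently, without replacement.
    sample = {
        name: rng.sample(members, max(1, round(fraction * len(members))))
        for name, members in strata.items()
    }
    return dict(strata), sample


def stratified_mean(strata, sample, value_of):
    """Weight each stratum's sample mean by the stratum's population share."""
    total = sum(len(members) for members in strata.values())
    return sum(
        len(strata[name]) / total * mean(value_of(u) for u in sample[name])
        for name in strata
    )


# Hypothetical population: 800 "urban" units valued 10, 200 "rural" units valued 20.
population = [("urban", 10)] * 800 + [("rural", 20)] * 200
rng = random.Random(0)  # fixed seed so the draw is reproducible
strata, sample = stratified_sample(population, lambda u: u[0], 0.1, rng)
estimate = stratified_mean(strata, sample, lambda u: u[1])
sizes = {name: len(drawn) for name, drawn in sample.items()}
```

Because the values do not vary within a stratum in this toy population, every stratified sample reproduces the population mean of 12 exactly, whereas the mean of a simple random sample of 100 units would change from draw to draw with the number of rural units it happened to include.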
; Poststratification
Stratification is sometimes introduced after the sampling phase in a process called "poststratification".<ref name="Robert M. Groves, et al"/> This approach is typically used when no appropriate stratifying variable is known in advance, or when the experimenter lacks the information needed to create a stratifying variable during the sampling phase. Although the method is susceptible to the pitfalls of post hoc approaches, it can provide several benefits in the right situation. Implementation usually follows a simple random sample. In addition to allowing for stratification on an ancillary variable, poststratification can be used to implement weighting, which can improve the precision of a sample's estimates.<ref name="Robert M. Groves, et al"/>

; Oversampling
Choice-based sampling, or oversampling, is one of the stratified sampling strategies. In choice-based sampling,<ref>{{cite journal|last1=Scott|first1=A.J.|last2=Wild|first2=C.J.|year=1986|title=Fitting logistic models under case-control or choice-based sampling|journal=[[Journal of the Royal Statistical Society, Series B]]|volume=48|issue=2|pages=170–182|doi=10.1111/j.2517-6161.1986.tb01400.x |jstor=2345712}}</ref> the data are stratified on the target and a sample is taken from each stratum so that rarer target classes are better represented in the sample. The model is then built on this [[Sampling bias|biased sample]]. The effects of the input variables on the target are often estimated with more precision from the choice-based sample than from a random sample, even when the overall sample size is smaller. The results usually must be adjusted to correct for the oversampling.
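The need to correct for oversampling can be illustrated with a small Python sketch. It uses plain inverse-sampling-fraction weights (stratum size divided by the number sampled from that stratum), which is one common adjustment and not necessarily the method of the cited paper. The function names, the record layout and the class sizes are hypothetical.

```python
import random


def choice_based_sample(records, target_of, per_class, rng):
    """Draw up to per_class records from each target class, with weights."""
    by_class = {}
    for record in records:
        by_class.setdefault(target_of(record), []).append(record)
    sample = []
    for members in by_class.values():
        n_h = min(per_class, len(members))
        # Each sampled record stands in for len(members) / n_h population records.
        weight = len(members) / n_h
        sample.extend((record, weight) for record in rng.sample(members, n_h))
    return sample


def weighted_rate(sample, target_of, positive):
    """Weighted share of the positive class, correcting for oversampling."""
    total = sum(weight for _, weight in sample)
    hits = sum(weight for record, weight in sample if target_of(record) == positive)
    return hits / total


# Hypothetical population: 50 rare positives among 1000 records (5%).
records = [{"id": i, "y": 1 if i < 50 else 0} for i in range(1000)]
rng = random.Random(0)  # fixed seed so the draw is reproducible
sample = choice_based_sample(records, lambda r: r["y"], 40, rng)

naive_rate = sum(r["y"] for r, _ in sample) / len(sample)
corrected_rate = weighted_rate(sample, lambda r: r["y"], 1)
```

In this sketch the rare class makes up half of the sample, so the unweighted rate is 0.5, while the weighted rate recovers the population share of 0.05.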