Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Sampling (statistics)
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
==Population definition== Successful statistical practice is based on focused problem definition. In sampling, this includes defining the "[[Statistical population|population]]" from which our sample is drawn. A population can be defined as including all people or items with the characteristics one wishes to understand. Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population. Sometimes what defines a population is obvious. For example, a manufacturer needs to decide whether a batch of material from [[batch production|production]] is of high enough quality to be released to the customer or should be scrapped or reworked due to poor quality. In this case, the batch is the population. Although the population of interest often consists of physical objects, sometimes it is necessary to sample over time, space, or some combination of these dimensions. For instance, an investigation of supermarket staffing could examine checkout line length at various times, or a study on endangered penguins might aim to understand their usage of various hunting grounds over time. For the time dimension, the focus may be on periods or discrete occasions. In other cases, the examined 'population' may be even less tangible. For example, [[Joseph Jagger]] studied the behaviour of [[roulette]] wheels at a casino in [[Monte Carlo]], and used this to identify a biased wheel. In this case, the 'population' Jagger wanted to investigate was the overall behaviour of the wheel (i.e. the [[probability distribution]] of its results over infinitely many trials), while his 'sample' was formed from observed results from that wheel. Similar considerations arise when taking repeated measurements of properties of materials such as the [[electrical conductivity]] of [[copper]]. This situation often arises when seeking knowledge about the [[cause system]] of which the ''observed'' population is an outcome. In such cases, sampling theory may treat the observed population as a sample from a larger 'superpopulation'. For example, a researcher might study the success rate of a new 'quit smoking' program on a test group of 100 patients, in order to predict the effects of the program if it were made available nationwide. Here the superpopulation is "everybody in the country, given access to this treatment" β a group that does not yet exist since the program is not yet available to all. The population from which the sample is drawn may not be the same as the population from which information is desired. Often there is a large but not complete overlap between these two groups due to frame issues etc. (see below). Sometimes they may be entirely separate β for instance, one might study rats in order to get a better understanding of human health, or one might study records from people born in 2008 in order to make predictions about people born in 2009. Time spent in making the sampled population and population of concern precise is often well spent because it raises many issues, ambiguities, and questions that would otherwise have been overlooked at this stage.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)