Imputation (statistics)
{{Short description|Process of replacing missing data with substituted values}}
{{Other uses of|imputation|Imputation (disambiguation)}}
In [[statistics]], '''imputation''' is the process of replacing [[missing data]] with substituted values. When substituting for a data point, it is known as "'''unit imputation'''"; when substituting for a component of a data point, it is known as "'''item imputation'''". Missing data causes three main problems: it can introduce a substantial amount of [[bias (statistics)|bias]], make the handling and analysis of the data more arduous, and reduce [[Efficiency (statistics)|efficiency]].<ref>{{Cite journal|last1=Barnard|first1=J.|last2=Meng|first2=X. L.|date=1999-03-01|title=Applications of multiple imputation in medical studies: from AIDS to NHANES|journal=Statistical Methods in Medical Research|volume=8|issue=1|pages=17–36|issn=0962-2802|pmid=10347858|doi=10.1177/096228029900800103|s2cid=11453137}}</ref>

Because missing data can create problems for analyzing data, imputation is seen as a way to avoid the pitfalls of [[listwise deletion]] of cases that have missing values. By default, most [[List of statistical packages|statistical packages]] discard any case that has one or more missing values, which may introduce bias or affect the representativeness of the results. Imputation preserves all cases by replacing missing data with an estimated value based on other available information. Once all missing values have been imputed, the data set can be analyzed using standard techniques for complete data.<ref>{{Cite book|last1=Gelman|first1=Andrew|last2=Hill|first2=Jennifer|author-link2=Jennifer Hill|title=Data Analysis Using Regression and Multilevel/Hierarchical Models|publisher=Cambridge University Press|year=2006|at=Ch. 25}}</ref>

Many approaches to handling missing data have been proposed, but most of them introduce bias.
Well-known approaches to dealing with missing data include: [[#Hot deck|hot deck]] and [[#Cold deck|cold deck]] imputation; [[#Listwise (complete case) deletion|listwise and pairwise deletion]]; [[#Mean substitution|mean imputation]]; [[#Non-negative matrix factorization|non-negative matrix factorization]]; [[#Regression|regression imputation]]; [[#Hot-deck|last observation carried forward]]; [[#Regression|stochastic imputation]]; and [[#Multiple imputation|multiple imputation]].
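The contrast between listwise deletion and imputation can be illustrated with a minimal Python sketch. It assumes a small, hypothetical data set stored as a list of dictionaries (one dictionary per case), with <code>None</code> marking a missing item, and it uses mean imputation, one of the simple methods listed above, as the imputation rule. The names <code>ROWS</code>, <code>listwise_delete</code> and <code>mean_impute</code> are illustrative only and do not come from any statistical package.

```python
import statistics

# Hypothetical toy data set: each dict is one case (unit); None marks a missing item.
ROWS = [
    {"age": 25, "income": 30.0},
    {"age": 35, "income": 45.0},
    {"age": 45, "income": None},
    {"age": 55, "income": None},
    {"age": 30, "income": 38.0},
    {"age": None, "income": 50.0},
]


def listwise_delete(rows):
    """Keep only complete cases: drop any row with at least one missing item."""
    return [row for row in rows if all(value is not None for value in row.values())]


def mean_impute(rows):
    """Replace each missing item with the mean of the observed values in its column."""
    columns = list(rows[0])
    column_means = {
        col: statistics.fmean(row[col] for row in rows if row[col] is not None)
        for col in columns
    }
    # Every case is kept; only the missing items are filled in.
    return [
        {col: column_means[col] if row[col] is None else row[col] for col in columns}
        for row in rows
    ]


if __name__ == "__main__":
    complete = listwise_delete(ROWS)
    imputed = mean_impute(ROWS)
    print(len(ROWS), len(complete), len(imputed))  # 6 3 6
    # Mean age of the retained complete cases vs. mean of all observed ages.
    print(statistics.fmean(r["age"] for r in complete))  # 30.0
    print(statistics.fmean(r["age"] for r in ROWS if r["age"] is not None))  # 38.0
```

In this toy example, listwise deletion keeps only three of the six cases. Because two of the discarded cases have the highest ages, the mean age of the retained cases (30.0) is lower than the mean of all observed ages (38.0), a small instance of the bias described above. Mean imputation keeps all six cases; it is used here only because it is the simplest rule to show, not as a recommended method.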