
{{Short description|Process of replacing missing data with substituted values}}
{{Other uses of|imputation|Imputation (disambiguation)}}

In [[statistics]], '''imputation''' is the process of replacing [[missing data]] with substituted values. Substituting for an entire data point is known as "'''unit imputation'''"; substituting for a component of a data point is known as "'''item imputation'''". Missing data cause three main problems: they can introduce a substantial amount of [[bias (statistics)|bias]], make the handling and analysis of the data more arduous, and reduce [[Efficiency (statistics)|efficiency]].<ref>{{Cite journal|last1=Barnard|first1=J.|last2=Meng|first2=X. L.|date=1999-03-01|title=Applications of multiple imputation in medical studies: from AIDS to NHANES|journal=Statistical Methods in Medical Research|volume=8|issue=1|pages=17–36|issn=0962-2802|pmid=10347858|doi=10.1177/096228029900800103|s2cid=11453137}}</ref> Imputation is therefore seen as a way to avoid the pitfalls of [[listwise deletion]] of cases that have missing values. When one or more values are missing for a case, most [[List of statistical packages|statistical packages]] by default discard that case entirely, which may introduce bias or affect the representativeness of the results. Imputation preserves all cases by replacing missing data with an estimated value based on other available information. Once all missing values have been imputed, the data set can be analysed using standard techniques for complete data.<ref>Gelman, Andrew, and [[Jennifer Hill]]. Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, 2006. Ch.25</ref> Many approaches to handling missing data have been proposed, but most of them introduce bias.
A few of the well-known approaches to dealing with missing data include: [[#Hot deck|hot deck]] and [[#Cold deck|cold deck]] imputation; [[#Listwise (complete case) deletion|listwise and pairwise deletion]]; [[#Mean substitution|mean imputation]]; [[#Non-negative matrix factorization|non-negative matrix factorization]]; [[#Regression|regression imputation]]; [[#Hot-deck|last observation carried forward]]; [[#Regression|stochastic imputation]]; and [[#Multiple imputation|multiple imputation]].
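
The following Python sketch, using only the standard library, contrasts listwise deletion with two of the single-imputation methods listed above (mean imputation and last observation carried forward) on a small hypothetical data set. The data and the function names <code>listwise_delete</code>, <code>mean_impute</code> and <code>locf</code> are illustrative assumptions, not the API of any statistical package; missing items are represented as <code>None</code>.

<syntaxhighlight lang="python">
from statistics import mean

# Hypothetical toy data: each row is one case, each column one variable;
# None marks a missing item (so rows 2 and 3 each need item imputation).
rows = [
    [1.0, 10.0],
    [2.0, None],
    [None, 30.0],
    [3.0, 50.0],
]


def listwise_delete(data):
    """Listwise (complete case) deletion: drop every case with any missing item."""
    return [row for row in data if None not in row]


def mean_impute(data):
    """Mean imputation: replace each missing item with its column's observed mean.

    Assumes every column has at least one observed value; otherwise
    statistics.mean raises StatisticsError.
    """
    n_cols = len(data[0])
    col_means = [
        mean(row[j] for row in data if row[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in data
    ]


def locf(series):
    """Last observation carried forward for one ordered series of measurements.

    A gap before the first observed value has nothing to carry forward,
    so it is left as None.
    """
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled
</syntaxhighlight>

On this data, <code>listwise_delete(rows)</code> keeps only two of the four cases, whereas <code>mean_impute(rows)</code> keeps all four by filling in the column means (2.0 and 30.0), and <code>locf([5.0, None, None, 7.0, None])</code> yields <code>[5.0, 5.0, 5.0, 7.0, 7.0]</code>. Because every mean-imputed item sits exactly at the column mean, this method also shrinks the variance of the imputed variable, one illustration of how single-imputation methods can introduce bias.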