==Statistics==
In [[statistics]], prediction is a part of [[statistical inference]]. One particular approach to such inference is known as [[predictive inference]], but prediction can be undertaken within any of the several approaches to statistical inference. Indeed, one possible description of statistics is that it provides a means of transferring knowledge about a sample of a population to the whole population, and to other related populations, which is not necessarily the same as prediction over time. When information is transferred across time, often to specific points in time, the process is known as [[forecasting]].<ref>{{cite book |last=Cox |first=D. R. |year=2006 |title=Principles of Statistical Inference |publisher=Cambridge University Press |isbn=978-0-521-68567-2 }}</ref>{{Failed verification|date=November 2017|reason=Reference does not mention "forecasting" at all.}}

Forecasting usually requires [[time series]] methods, while prediction is often performed on [[cross-sectional data]]. Statistical techniques used for prediction include [[Regression analysis#Prediction|regression]] and its various sub-categories such as [[linear regression]], [[generalized linear model]]s ([[logistic regression]], [[Poisson regression]], [[probit regression]]), etc. In the case of forecasting, [[autoregressive moving average model]]s and [[vector autoregression]] models can be used. When these or related generalized regression or [[machine learning]] methods are deployed commercially, the field is known as [[predictive analytics]].<ref>{{cite book |last=Siegel |first=Eric |year=2013 |title=Predictive Analysis: The Power to Predict Who Will Click, Buy, Lie, or Die |publisher=John Wiley & Sons |location=Hoboken, NJ |isbn=978-1-118-35685-2 }}</ref>

In many applications, such as time series analysis, it is possible to estimate the models that generate the observations.
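The distinction between forecasting and cross-sectional prediction can be illustrated with a minimal sketch of the simplest autoregressive model, an AR(1), fitted by least squares. All data and coefficients here are synthetic and illustrative, not drawn from any real application:

```python
import numpy as np

# Simulate an AR(1) process: y[t] = phi * y[t-1] + noise.
rng = np.random.default_rng(0)
n = 300
phi_true = 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal(0.0, 1.0)

# Estimate the autoregressive coefficient by least squares on lagged values.
phi_hat = np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])

# Forecast: the one-step-ahead prediction of the next, unobserved value.
forecast = phi_hat * y[-1]
```

Here the model is estimated from the series' own past, and the "prediction" is a statement about a future time point, which is what distinguishes forecasting from prediction on cross-sectional data.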
If models can be expressed as [[transfer function]]s or in terms of state-space parameters, then smoothed, filtered and predicted data estimates can be calculated.{{Citation needed|date=December 2019|reason=removed citation to predatory publisher content}} If the underlying generating models are linear, then a minimum-variance [[Kalman filter]] and a minimum-variance smoother may be used to recover data of interest from noisy measurements. These techniques rely on one-step-ahead predictors (which minimise the variance of the [[prediction error]]). When the generating models are nonlinear, stepwise linearizations may be applied within [[Extended Kalman Filter]] and smoother recursions. However, in nonlinear cases, optimum minimum-variance performance guarantees no longer apply.<ref>{{cite journal |last1=Julier |first1=S. J. |last2=Uhlmann |first2=J. K. |year=2004 |title=Unscented filtering and nonlinear estimation |journal=Proceedings of the IEEE |volume=92 |issue=3 |pages=401–422 |doi=10.1109/jproc.2003.823141 |s2cid=9614092 |citeseerx=10.1.1.136.6539 }}</ref>

To use regression analysis for prediction, data are collected on the variable that is to be predicted, called the [[dependent variable]] or response variable, and on one or more variables whose values are [[hypothesis|hypothesized]] to influence it, called [[independent variable]]s or explanatory variables. A [[Function (mathematics)#Real function|functional form]], often linear, is hypothesized for the postulated causal relationship, and the [[parameter]]s of the function are [[estimation|estimated]] from the data; that is, they are chosen so as to optimize in some way the [[goodness of fit|fit]] of the function, thus parameterized, to the data. That is the estimation step.
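Returning to the state-space setting above, the Kalman recursion (one-step-ahead prediction followed by a measurement update) can be sketched for a scalar linear model. The model, coefficients and noise variances below are a hypothetical toy example, chosen only to make the recursion concrete:

```python
import numpy as np

# Toy scalar state-space model (all numbers illustrative):
#   x[k+1] = a * x[k] + w[k],  w ~ N(0, q)   (state transition)
#   y[k]   = x[k] + v[k],      v ~ N(0, r)   (noisy measurement)
a, q, r = 0.9, 0.1, 1.0

rng = np.random.default_rng(1)
n = 200
x = np.zeros(n)  # true hidden state
y = np.zeros(n)  # noisy measurements
for k in range(1, n):
    x[k] = a * x[k - 1] + rng.normal(0.0, np.sqrt(q))
    y[k] = x[k] + rng.normal(0.0, np.sqrt(r))

# Kalman filter: alternate one-step-ahead prediction and measurement update.
x_hat, p = 0.0, 1.0          # state estimate and its error variance
estimates = []
for k in range(n):
    # Predict the state and its variance one step ahead.
    x_pred = a * x_hat
    p_pred = a * a * p + q
    # Correct with the new measurement, weighted by the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_hat = x_pred + gain * (y[k] - x_pred)
    p = (1.0 - gain) * p_pred
    estimates.append(x_hat)
estimates = np.asarray(estimates)
```

Because the model is linear with Gaussian noise, the filtered estimates track the hidden state with lower mean-squared error than the raw measurements, which is the minimum-variance property described above.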
For the prediction step, explanatory variable values that are deemed relevant to future (or current but not yet observed) values of the dependent variable are input to the parameterized function to generate predictions for the dependent variable.<ref>{{cite book |last=Fox |first=John |year=2016 |title=Applied Regression Analysis and Generalized Linear Models |publisher=Sage |location=London |edition=Third |isbn=978-1-4522-0566-3 }}</ref> An unbiased estimate of a model's predictive performance can be obtained on [[Hold-out cross-validation|hold-out test sets]]. The predictions can be compared visually to the ground truth in a [[parity plot]].
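The estimation step, the prediction step, and hold-out evaluation can be sketched together for a linear regression. The data, coefficients and split are synthetic and illustrative:

```python
import numpy as np

# Hypothetical cross-sectional data: response y driven by one explanatory x.
rng = np.random.default_rng(2)
x = rng.uniform(-5.0, 5.0, size=100)
y = 3.0 * x - 2.0 + rng.normal(0.0, 1.0, size=100)  # true slope 3, intercept -2

# Split into a training set and a hold-out test set.
idx = rng.permutation(100)
train, test = idx[:80], idx[80:]

# Estimation step: fit the linear functional form on training data only.
X_train = np.column_stack([x[train], np.ones(train.size)])
coef, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)

# Prediction step: apply the parameterized function to held-out explanatory values.
y_pred = coef[0] * x[test] + coef[1]

# Hold-out evaluation: error on data the model never saw during estimation.
test_mse = np.mean((y_pred - y[test]) ** 2)
```

Because the test observations played no role in estimating the parameters, `test_mse` estimates out-of-sample performance rather than the (optimistic) fit on the training data.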