{{Short description|Set of statistical processes for estimating the relationships among variables}}
[[File:Normdist regression.png|thumb|right|200px|Regression line for 50 random points in a [[Gaussian distribution]] around the line y=1.5x+2.]]
{{Regression bar}}
{{Machine learning|Problems}}
In [[statistical model]]ing, '''regression analysis''' is a set of statistical processes for [[Estimation theory|estimating]] the relationships between a [[dependent variable]] (often called the ''outcome'' or ''response'' variable, or a ''label'' in machine learning parlance) and one or more error-free [[independent variable]]s (often called ''regressors'', ''predictors'', ''covariates'', ''explanatory variables'' or ''features''). The most common form of regression analysis is [[linear regression]], in which one finds the line (or a more complex [[linear combination]]) that most closely fits the data according to a specific mathematical criterion. For example, the method of [[ordinary least squares]] computes the unique line (or [[hyperplane]]) that minimizes the sum of squared differences between the true data and that line (or hyperplane). For specific mathematical reasons (see [[linear regression]]), this allows the researcher to estimate the [[conditional expectation]] (or population [[average value]]) of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative [[location parameters]] (e.g., [[quantile regression]] or [[Necessary Condition Analysis]]<ref>[http://www.erim.eur.nl/centres/necessary-condition-analysis/ Necessary Condition Analysis]</ref>) or estimate the conditional expectation across a broader collection of non-linear models (e.g., [[nonparametric regression]]).

Regression analysis is primarily used for two conceptually distinct purposes. First, regression analysis is widely used for [[prediction]] and [[forecasting]], where its use has substantial overlap with the field of [[machine learning]]. Second, in some situations regression analysis can be used to infer [[causality|causal relationships]] between the independent and dependent variables. Importantly, regressions by themselves only reveal relationships between a dependent variable and a collection of independent variables in a fixed dataset. To use regressions for prediction or to infer causal relationships, respectively, a researcher must carefully justify why existing relationships have predictive power for a new context or why a relationship between two variables has a causal interpretation. The latter is especially important when researchers hope to estimate causal relationships using [[observational study|observational data]].<ref name="Freedman2009">{{cite book|author=David A. Freedman|title=Statistical Models: Theory and Practice|url=https://books.google.com/books?id=fW_9BV5Wpf8C&q=%22regression+analysis%22|date=27 April 2009|publisher=Cambridge University Press|isbn=978-1-139-47731-4}}</ref><ref>R. Dennis Cook; Sanford Weisberg, [https://www.jstor.org/stable/270724 "Criticism and Influence Analysis in Regression"], ''Sociological Methodology'', Vol. 13 (1982), pp. 313–361.</ref>
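As a minimal illustrative sketch of the ordinary-least-squares idea described above (the simulated data, variable names, and use of [[NumPy]] are assumptions chosen to mirror the figure caption, not a prescribed method), the line minimizing the sum of squared differences can be computed directly:

<syntaxhighlight lang="python">
import numpy as np

# Simulate 50 points scattered around the line y = 1.5x + 2,
# echoing the example in the figure caption (assumed noise level).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 1.5 * x + 2 + rng.normal(scale=1.0, size=50)

# Ordinary least squares: choose intercept and slope minimizing
# the sum of squared differences between y and the fitted line.
X = np.column_stack([np.ones_like(x), x])      # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # solves min ||X @ beta - y||^2

print("estimated intercept and slope:", beta)  # should be close to (2, 1.5)
</syntaxhighlight>

Here <code>np.linalg.lstsq</code> solves the minimization numerically; under the usual full-rank assumption, the same estimates follow from the closed-form normal equations <math>\hat\beta = (X^\mathsf{T}X)^{-1}X^\mathsf{T}y</math>.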