==Underlying assumptions==
{{more citations needed section|date=December 2020}}
By itself, a regression is simply a calculation using the data. In order to interpret the output of regression as a meaningful statistical quantity that measures real-world relationships, researchers often rely on a number of classical [[statistical assumption|assumptions]]. These assumptions often include:
*The sample is representative of the population at large.
*The independent variables are measured without error.
*Deviations from the model have an expected value of zero, conditional on covariates: <math>E(e_i | X_i) = 0</math>
*The variance of the residuals <math>e_i</math> is constant across observations ([[homoscedasticity]]).
*The residuals <math>e_i</math> are [[uncorrelated]] with one another. Mathematically, the [[Covariance matrix|variance–covariance matrix]] of the errors is [[Diagonal matrix|diagonal]].
A handful of conditions are sufficient for the least-squares estimator to possess desirable properties: in particular, the [[Gauss–Markov theorem|Gauss–Markov]] assumptions imply that the parameter estimates will be [[bias of an estimator|unbiased]], [[consistent estimator|consistent]], and [[efficient (statistics)|efficient]] in the class of linear unbiased estimators. Because these classical assumptions are unlikely to hold exactly, practitioners have developed a variety of methods to maintain some or all of these desirable properties in real-world settings. For example, modeling [[errors-in-variables model|errors-in-variables]] can lead to reasonable estimates when the independent variables are measured with error. [[Heteroscedasticity-consistent standard errors]] allow the variance of <math>e_i</math> to change across values of <math>X_i</math>.
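The idea behind heteroscedasticity-consistent standard errors can be sketched numerically. The following is an illustrative NumPy example (the data-generating parameters and sample size are arbitrary choices, not from the article): it fits ordinary least squares on simulated data whose error variance grows with <math>X_i</math>, then computes both the classical variance estimate and White's HC0 "sandwich" estimate <math>(X'X)^{-1} X' \operatorname{diag}(e_i^2) X (X'X)^{-1}</math>.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])  # intercept + one regressor

# Heteroscedastic errors: the residual spread grows with |x|,
# violating the constant-variance (homoscedasticity) assumption.
e = rng.normal(size=n) * (1.0 + np.abs(x))
y = 2.0 + 3.0 * x + e  # true coefficients chosen for illustration

# OLS point estimates: beta = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical covariance estimate, valid only under homoscedasticity
sigma2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# White (HC0) sandwich estimator: replaces sigma2 * I with
# diag(e_i^2), so it stays valid when the variance changes with X
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc0 = XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(cov_hc0))
```

Note that both estimators use the same OLS point estimates; only the standard errors differ. Under heteroscedasticity the robust standard errors are the ones with correct coverage.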
Correlated errors that exist within subsets of the data or follow specific patterns can be handled using ''clustered standard errors, geographically weighted regression'', or [[Newey–West estimator|Newey–West]] standard errors, among other techniques. When rows of data correspond to locations in space, the choice of how to model <math>e_i</math> within geographic units can have important consequences.<ref>{{cite book|title=Geographically weighted regression: the analysis of spatially varying relationships|last1=Fotheringham|first1=A. Stewart|last2=Brunsdon|first2=Chris|last3=Charlton|first3=Martin|publisher=John Wiley|year=2002|isbn=978-0-471-49616-8|edition=Reprint|location=Chichester, England}}</ref><ref>{{cite journal|last=Fotheringham|first=AS|author2=Wong, DWS|date=1 January 1991|title=The modifiable areal unit problem in multivariate statistical analysis|journal=Environment and Planning A|volume=23|issue=7|pages=1025–1044|doi=10.1068/a231025|bibcode=1991EnPlA..23.1025F |s2cid=153979055}}</ref> The subfield of [[econometrics]] is largely focused on developing techniques that allow researchers to draw reasonable conclusions in real-world settings, where classical assumptions do not hold exactly.
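Cluster-robust standard errors can be sketched in the same sandwich form. In this illustrative NumPy example (cluster count, sizes, and coefficients are arbitrary assumptions for the demo), errors share a common component within each cluster, and the "meat" of the sandwich sums the outer products of within-cluster score sums <math>X_g' e_g</math> rather than treating observations as independent:

```python
import numpy as np

rng = np.random.default_rng(1)
G, per = 40, 25            # 40 clusters of 25 observations each
n = G * per
cluster = np.repeat(np.arange(G), per)

x = rng.normal(size=n)
u = rng.normal(size=G)      # cluster-level error component
e = u[cluster] + rng.normal(size=n)  # errors correlated within clusters
y = 1.0 + 2.0 * x + e
X = np.column_stack([np.ones(n), x])

# Ordinary least squares
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Cluster-robust "meat": sum over clusters of the outer product
# of the within-cluster score sums X_g' e_g
k = X.shape[1]
meat = np.zeros((k, k))
for g in range(G):
    idx = cluster == g
    s = X[idx].T @ resid[idx]
    meat += np.outer(s, s)

cov_cluster = XtX_inv @ meat @ XtX_inv
se_cluster = np.sqrt(np.diag(cov_cluster))
```

Because the summation is over clusters rather than observations, arbitrary correlation of the errors ''within'' each cluster is allowed; only independence ''across'' clusters is assumed.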
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)