Omitted-variable bias
== In linear regression ==

=== Intuition ===
Suppose the true cause-and-effect relationship is given by:

:<math>y=a+bx+cz+u</math>

with parameters ''a'', ''b'', ''c'', dependent variable ''y'', independent variables ''x'' and ''z'', and error term ''u''. We wish to know the effect of ''x'' itself upon ''y'' (that is, we wish to obtain an estimate of ''b'').

Two conditions must hold true for omitted-variable bias to exist in [[linear regression]]:
* the omitted variable must be a determinant of the dependent variable (i.e., its true regression coefficient must not be zero); and
* the omitted variable must be correlated with an independent variable specified in the regression (i.e., cov(''z'',''x'') must not equal zero).

Suppose we omit ''z'' from the regression, and suppose the relation between ''x'' and ''z'' is given by

:<math>z=d+fx+e</math>

with parameters ''d'', ''f'' and error term ''e''. Substituting the second equation into the first gives

:<math>y=(a+cd)+(b+cf)x+(u+ce).</math>

If a regression of ''y'' is conducted upon ''x'' only, this last equation is what is estimated, and the regression coefficient on ''x'' is actually an estimate of (''b'' + ''cf''), giving not simply an estimate of the desired direct effect of ''x'' upon ''y'' (which is ''b''), but rather of its sum with the indirect effect (the effect ''f'' of ''x'' on ''z'' times the effect ''c'' of ''z'' on ''y''). Thus by omitting the variable ''z'' from the regression, we have estimated the [[total derivative]] of ''y'' with respect to ''x'' rather than its [[partial derivative]] with respect to ''x''. These differ if both ''c'' and ''f'' are non-zero.

The direction and extent of the bias are both contained in ''cf'', since the effect sought is ''b'' but the regression estimates ''b'' + ''cf''.
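
The substitution argument can be checked numerically. The following is a minimal simulation sketch, not part of the original derivation: the parameter values, the sample size, the random seed and the standard normal distributions for ''x'', ''e'' and ''u'' are all assumptions chosen for illustration. It fits the regression of ''y'' on ''x'' alone and the regression of ''y'' on both ''x'' and ''z'', and compares the two slopes on ''x'' with ''b'' + ''cf'' and ''b'' respectively.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption, for reproducibility)
n = 100_000                     # sample size (assumption)

# Hypothetical parameter values, chosen only for illustration.
a, b, c = 1.0, 2.0, 3.0   # y = a + b*x + c*z + u
d, f = 0.5, 0.4           # z = d + f*x + e

# Standard normal draws are an assumption; the argument does not require them.
x = rng.normal(size=n)
e = rng.normal(size=n)
u = rng.normal(size=n)
z = d + f * x + e
y = a + b * x + c * z + u

# Regression of y on a constant and x only (z omitted).
X_short = np.column_stack([np.ones(n), x])
coef_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

# Regression of y on a constant, x and z (z included).
X_long = np.column_stack([np.ones(n), x, z])
coef_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)

slope_short = coef_short[1]  # should be close to b + c*f
slope_long = coef_long[1]    # should be close to b

print("slope with z omitted: ", slope_short, " target b + c*f =", b + c * f)
print("slope with z included:", slope_long, " target b =", b)
```

With these assumed values the slope from the regression that omits ''z'' should settle near ''b'' + ''cf'' rather than near ''b'', while including ''z'' should recover a value near ''b''; the exact figures depend on the random draw.
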
The extent of the bias is the absolute value of ''cf'', and the direction of bias is upward (toward a more positive or less negative value) if ''cf'' > 0 (if the direction of correlation between ''y'' and ''z'' is the same as that between ''x'' and ''z''), and it is downward otherwise.

=== Detailed analysis ===
As an example, consider a [[linear model]] of the form

:<math>y_i = x_i \beta + z_i \delta + u_i,\qquad i = 1,\dots,n</math>

where
* ''x''<sub>''i''</sub> is a 1 × ''p'' row vector of values of ''p'' [[independent variable]]s observed at time ''i'' or for the ''i''<sup>th</sup> study participant;
* ''β'' is a ''p'' × 1 column vector of unobservable parameters (the response coefficients of the dependent variable to each of the ''p'' independent variables in ''x''<sub>''i''</sub>) to be estimated;
* ''z''<sub>''i''</sub> is a scalar and is the value of another independent variable that is observed at time ''i'' or for the ''i''<sup>th</sup> study participant;
* ''δ'' is a scalar and is an unobservable parameter (the response coefficient of the dependent variable to ''z''<sub>''i''</sub>) to be estimated;
* ''u''<sub>''i''</sub> is the unobservable [[errors and residuals in statistics|error term]] occurring at time ''i'' or for the ''i''<sup>th</sup> study participant; it is an unobserved realization of a [[random variable]] having [[expected value]] 0 (conditionally on ''x''<sub>''i''</sub> and ''z''<sub>''i''</sub>);
* ''y''<sub>''i''</sub> is the observation of the [[dependent variable]] at time ''i'' or for the ''i''<sup>th</sup> study participant.

We collect the observations of all variables subscripted ''i'' = 1, ..., ''n'', and stack them one below another, to obtain the [[matrix (mathematics)|matrix]] ''X'' and the [[vector (mathematics)|vectors]] ''Y'', ''Z'', and ''U'':

:<math> X = \left[ \begin{array}{c} x_1 \\ \vdots \\ x_n \end{array} \right] \in \mathbb{R}^{n\times p},</math>

and

:<math> Y = \left[ \begin{array}{c} y_1 \\ \vdots \\ y_n \end{array} \right],\quad Z = \left[ \begin{array}{c} z_1 \\ \vdots \\ z_n \end{array} \right],\quad U = \left[ \begin{array}{c} u_1 \\ \vdots \\ u_n \end{array} \right] \in \mathbb{R}^{n\times 1}.</math>

If the independent variable ''z'' is omitted from the regression, then the estimated values of the response parameters of the other independent variables will be given by the usual [[least squares]] calculation,

:<math>\widehat{\beta} = (X'X)^{-1}X'Y\,</math>

(where the "prime" notation means the [[transpose]] of a matrix and the −1 superscript is [[matrix inversion]]).

Substituting for ''Y'' based on the assumed linear model,

:<math>
\begin{align}
\widehat{\beta} & = (X'X)^{-1}X'(X\beta+Z\delta+U) \\
& =(X'X)^{-1}X'X\beta + (X'X)^{-1}X'Z\delta + (X'X)^{-1}X'U \\
& =\beta + (X'X)^{-1}X'Z\delta + (X'X)^{-1}X'U.
\end{align}
</math>

On taking expectations, the contribution of the final term is zero; this follows from the assumption that ''U'' is uncorrelated with the regressors ''X''. On simplifying the remaining terms:

:<math>
\begin{align}
E[ \widehat{\beta} \mid X ] & = \beta + (X'X)^{-1}E[ X'Z \mid X ]\delta \\
& = \beta + \text{bias}.
\end{align}
</math>

The second term after the equal sign is the omitted-variable bias in this case, which is non-zero if the omitted variable ''z'' is correlated with any of the included variables in the matrix ''X'' (that is, if ''X′Z'' does not equal a vector of zeroes). Note that the bias is equal to the weighted portion of ''z''<sub>''i''</sub> which is "explained" by ''x''<sub>''i''</sub>.
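
The decomposition of the least-squares estimate into ''β'', a term in ''Zδ'' and a term in ''U'' is an exact algebraic identity in any sample, so it can be verified directly. The sketch below is illustrative only: the dimensions, the values of <code>beta</code> and <code>delta</code>, the seed, and the way ''z'' is made to depend on the first column of ''X'' are assumptions, not taken from the text above.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed (assumption)
n, p = 500, 3                   # sample size and number of included regressors (assumptions)

beta = np.array([1.0, -2.0, 0.5])  # hypothetical true coefficients on X
delta = 1.5                        # hypothetical true coefficient on z

X = rng.normal(size=(n, p))
# Assumed data-generating choice: z depends on the first column of X.
Z = 0.8 * X[:, 0] + rng.normal(size=n)
U = rng.normal(size=n)
Y = X @ beta + Z * delta + U

XtX = X.T @ X

# Least-squares estimate with z omitted: (X'X)^{-1} X'Y.
beta_hat = np.linalg.solve(XtX, X.T @ Y)

# The two extra terms of the decomposition.
bias_term = np.linalg.solve(XtX, X.T @ Z) * delta  # (X'X)^{-1} X'Z delta
noise_term = np.linalg.solve(XtX, X.T @ U)         # (X'X)^{-1} X'U

# The identity beta_hat = beta + bias_term + noise_term holds up to rounding.
print(np.allclose(beta_hat, beta + bias_term + noise_term))
print("bias term:", bias_term)
```

Under these assumptions the first entry of <code>bias_term</code> is the dominant one, because only the first column of ''X'' is related to ''z''; the other entries differ from zero only through sampling variation in ''X′Z''.
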