{{Short description|Linear dependency situation in a regression model}} {{redirect-distinguish|Collinearity (statistics)|Collinearity (geometry)}} {{morefootnotes|date=January 2024}} {{Use dmy dates|date=March 2020}} {{unbalanced|date=October 2024}} In [[statistics]], '''multicollinearity''' or '''collinearity''' is a situation where the [[Independent variable|predictors]] in a [[Regression analysis|regression model]] are [[Linear independence|linearly dependent]]. '''Perfect multicollinearity''' refers to a situation where the [[Independent variable|predictive variables]] have an ''exact'' linear relationship. When there is perfect collinearity, the [[design matrix]] <math>X</math> has less than full [[Rank (linear algebra)|rank]], and therefore the [[moment matrix]] <math>X^{\mathsf{T}}X</math> cannot be [[Matrix inversion|inverted]]. In this situation, the [[Regression coefficient|parameter estimates]] of the regression are not well-defined, as the system of equations has [[Underdetermined system|infinitely many solutions]]. '''Imperfect multicollinearity''' refers to a situation where the [[Independent variable|predictive variables]] have a ''nearly'' exact linear relationship. Contrary to popular belief, neither the [[Gauss–Markov theorem]] nor the more common [[Maximum likelihood estimation|maximum likelihood]] justification for [[ordinary least squares]] relies on any kind of correlation structure between dependent predictors<ref name=":3">{{cite book |last=Gujarati |first=Damodar |url=https://archive.org/details/basiceconometric05edguja |title=Basic Econometrics |publisher=McGraw−Hill |year=2009 |isbn=9780073375779 |edition=4th |pages=[https://archive.org/details/basiceconometric05edguja/page/363 363] |chapter=Multicollinearity: what happens if the regressors are correlated? |author-link=Damodar N. 
Gujarati |url-access=registration}}</ref><ref name=":6">{{Cite journal |last1=Kalnins |first1=Arturs |last2=Praitis Hill |first2=Kendall |date=2023-12-13 |title=The VIF Score. What is it Good For? Absolutely Nothing |url=http://journals.sagepub.com/doi/10.1177/10944281231216381 |journal=Organizational Research Methods |volume=28 |pages=58–75 |language=en |doi=10.1177/10944281231216381 |issn=1094-4281|url-access=subscription }}</ref><ref name=":5">{{Cite journal |last=Leamer |first=Edward E. |date=1973 |title=Multicollinearity: A Bayesian Interpretation |url=https://www.jstor.org/stable/1927962 |journal=The Review of Economics and Statistics |volume=55 |issue=3 |pages=371–380 |doi=10.2307/1927962 |jstor=1927962 |issn=0034-6535|url-access=subscription }}</ref> (although perfect collinearity can cause problems with some software). There is no justification for the practice of removing collinear variables as part of regression analysis,<ref name=":3" /><ref name=":0">{{Cite web |last=Giles |first=Dave |date=2011-09-15 |title=Econometrics Beat: Dave Giles' Blog: Micronumerosity |url=https://davegiles.blogspot.com/2011/09/micronumerosity.html |access-date=2023-09-03 |website=Econometrics Beat}}</ref><ref>{{Cite book |last=Goldberger |first=A.S. |title=Econometric Theory |publisher=Wiley |year=1964 |location=New York}}</ref><ref name=":1">{{Cite book |last=Goldberger |first=A.S. |title=A Course in Econometrics |publisher=Harvard University Press |location=Cambridge MA |chapter=Chapter 23.3}}</ref><ref name=":2">{{Cite journal |last=Blanchard |first=Olivier Jean |date=October 1987 |title=Comment |url=http://www.tandfonline.com/doi/abs/10.1080/07350015.1987.10509611 |journal=Journal of Business & Economic Statistics |language=en |volume=5 |issue=4 |pages=449–451 |doi=10.1080/07350015.1987.10509611 |issn=0735-0015|url-access=subscription }}</ref> and doing so may constitute [[scientific misconduct]]. 
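The rank deficiency described above can be checked numerically. A minimal NumPy sketch (illustrative, not part of the article): the third column of the design matrix below is an exact linear combination of the first two, so <math>X</math> has less than full rank and <math>X^{\mathsf{T}}X</math> is singular.

```python
import numpy as np

# Design matrix with an exact linear dependency: the third column
# equals the sum of the first two, so X has less than full column rank.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 1.0, 3.0],
    [3.0, 4.0, 7.0],
    [4.0, 2.0, 6.0],
])

rank = np.linalg.matrix_rank(X)   # 2, not 3: perfect multicollinearity
moment = X.T @ X                  # the moment matrix X'X
det = np.linalg.det(moment)       # (numerically) zero: X'X cannot be inverted

print(rank, det)
```

Because <math>X^{\mathsf{T}}X</math> is singular, the normal equations have infinitely many solutions, which is why the parameter estimates are not well-defined under perfect collinearity.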
Including collinear variables does not reduce the predictive power or [[Reliability (statistics)|reliability]] of the model as a whole,<ref name=":1" /> and does not reduce the accuracy of coefficient estimates.<ref name=":3" /> High collinearity indicates that it is exceptionally important to include all collinear variables, as excluding any will cause worse coefficient estimates, strong [[confounding]], and downward-biased estimates of [[standard error]]s.<ref name=":6" /> The [[variance inflation factor]] can be used to quantify the degree of collinearity among the predictor variables.
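The variance inflation factor for predictor <math>j</math> is <math>\mathrm{VIF}_j = 1/(1 - R_j^2)</math>, where <math>R_j^2</math> comes from regressing predictor <math>j</math> on all the other predictors. A minimal NumPy sketch (the `vif` helper is illustrative, not from any particular library):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from an OLS
    regression of column j on all the other columns (plus an intercept).
    """
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])   # add intercept
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated predictors: x3 is nearly collinear with x1; x2 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

print(vif(X))  # x1 and x3 get large VIFs; x2 stays near 1
```

A VIF near 1 indicates a predictor nearly orthogonal to the rest; a large VIF indicates that the predictor is close to a linear combination of the others.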