Dummy variable (statistics)
{{short description|Numeric stand-ins in regression analysis}}
{{about|the usage in statistics|the usage in computing and math|Bound variable}}
[[File:Graph_showing_Wage_=_α0_+_δ0female_+_α1education_+_U,_δ0_0.jpg | thumb | right | A graph showing the gender wage gap]]
In [[regression analysis]], a '''dummy variable''' (also known as an '''indicator variable''' or just a '''dummy''') is one that takes a [[Binary data|binary value]] (0 or 1) to indicate the absence or presence of some categorical effect that may be expected to shift the outcome.<ref>Draper, N.R.; Smith, H. (1998) ''Applied Regression Analysis'', Wiley. ISBN 0-471-17082-8 (Chapter 14)</ref> For example, if we were studying the relationship between [[Sex|biological sex]] and [[income]], we could use a dummy variable to represent the sex of each individual in the study. The variable could take on a value of 1 for [[Male|males]] and 0 for [[Female|females]] (or vice versa). In [[machine learning]] this is known as [[One-hot#Machine learning and statistics|one-hot encoding]].

Dummy variables are also commonly used in regression analysis to represent categorical variables that have more than two levels, such as education level or occupation. In this case, multiple dummy variables are created, one for each level of the variable, and only one of them takes on a value of 1 for each observation. Dummy variables are useful because they allow us to include categorical variables in our analysis, which would otherwise be difficult to include due to their non-numeric nature. They can also help us to control for confounding factors and improve the validity of our results.

As with any addition of variables to a model, the addition of dummy variables will increase the within-sample model fit ([[coefficient of determination]]), but at the cost of fewer [[Degrees of freedom (statistics)|degrees of freedom]] and a loss of generality of the model (out-of-sample model fit).
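
The encoding of a categorical variable with more than two levels into dummy variables can be illustrated with a minimal sketch. The function name <code>make_dummies</code>, its <code>drop_first</code> parameter, and the sample data are hypothetical and chosen only for illustration; statistical packages provide their own equivalents.

```python
def make_dummies(values, drop_first=False):
    """Map a list of category labels to 0/1 indicator columns.

    Returns (levels, rows): the sorted levels that received a column,
    and one row of 0/1 integers per observation.
    """
    levels = sorted(set(values))
    if drop_first:
        # The dropped level becomes the base category (all dummies are 0).
        levels = levels[1:]
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


# Hypothetical sample: education level of five individuals.
education = ["primary", "tertiary", "secondary", "primary", "tertiary"]
levels, rows = make_dummies(education)

print(levels)
for label, row in zip(education, rows):
    print(label, row)
```

With one column per level, each row contains exactly one 1. Passing <code>drop_first=True</code> omits the first level, so that an observation in that level is represented by a row of zeros.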
Too many dummy variables result in a model that does not provide any general conclusions.

Dummy variables are useful in various cases. For example, in [[econometrics|econometric]] [[time series analysis]], dummy variables may be used to indicate the occurrence of wars or major [[Strike action|strikes]]. A dummy variable can thus be thought of as a [[Boolean data type|Boolean]], i.e., a [[truth value]] represented as the numerical value 0 or 1 (as is sometimes done in [[computer programming]]).

Dummy variables may be extended to more complex cases. For example, seasonal effects may be captured by creating dummy variables for each of the seasons: D1=1 if the observation is for summer, and equals zero otherwise; D2=1 if and only if autumn, otherwise equals zero; D3=1 if and only if winter, otherwise equals zero; and D4=1 if and only if spring, otherwise equals zero.

In the [[panel data]] [[fixed effects estimator]], dummies are created for each of the units in [[cross-sectional data]] (e.g. firms or countries) or periods in a [[pooled time-series]]. However, in such regressions either the [[constant term]] has to be removed, or one of the dummies has to be removed, making it the base category against which the others are assessed. The reason is as follows. If dummy variables for all categories were included, their sum would equal 1 for all observations, which is identical to, and hence perfectly correlated with, the vector-of-ones variable whose coefficient is the constant term. If the vector-of-ones variable were also present, this would result in perfect [[multicollinearity]],<ref>{{cite journal|first=Daniel B.|last=Suits|year=1957|title=Use of Dummy Variables in Regression Equations|jstor=2281705|journal=Journal of the American Statistical Association|volume=52|issue=280|pages=548–551}}</ref> so that the matrix inversion in the estimation algorithm would be impossible. This is referred to as the '''dummy variable trap'''.
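
The dummy variable trap can be checked numerically. The following is a minimal sketch using the four seasonal dummies D1 to D4 described above; the sample observations and the helper names <code>season_dummies</code> and <code>rank</code> are assumptions made for illustration. It computes the column rank of the regressor matrix by exact Gaussian elimination: when the rank is smaller than the number of columns, the columns are linearly dependent and the matrix inversion used in least-squares estimation is not possible.

```python
from fractions import Fraction

# Order follows the text: D1 summer, D2 autumn, D3 winter, D4 spring.
SEASONS = ["summer", "autumn", "winter", "spring"]


def season_dummies(observations):
    """Return one [D1, D2, D3, D4] row of 0/1 values per observation."""
    return [[1 if s == season else 0 for season in SEASONS] for s in observations]


def rank(matrix):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


# Hypothetical sample of six observations.
obs = ["summer", "autumn", "winter", "spring", "summer", "winter"]
dummies = season_dummies(obs)

# Constant term plus all four dummies: the dummies sum to the constant column.
full = [[1] + d for d in dummies]
# Constant term plus D1..D3 only: spring (D4) serves as the base category.
reduced = [[1] + d[:3] for d in dummies]

print(rank(full), "of", len(full[0]), "columns")        # 4 of 5 columns
print(rank(reduced), "of", len(reduced[0]), "columns")  # 4 of 4 columns
```

The full design has five columns but rank four, which is the perfect multicollinearity described above. Dropping one dummy (or, alternatively, the constant term) restores full column rank.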
==See also==
* {{Annotated link|Binary regression}}
* {{Annotated link|Chow test}}
* {{Annotated link|Statistical hypothesis testing|Hypothesis testing}}
* {{Annotated link|Indicator function}}
* {{Annotated link|Linear discriminant analysis|Linear discriminant function}}
* {{Annotated link|Multicollinearity}}
* {{Annotated link|One-hot}}

==References==
{{notelist}}
{{Reflist}}

==Further reading==
*{{cite book |first1=Dimitrios |last1=Asteriou |first2=S. G. |last2=Hall |author-link2=Stephen G. Hall |title=Applied Econometrics |location=London |publisher=Palgrave Macmillan |edition=3rd |year=2015 |isbn=978-1-137-41546-2 |chapter=Dummy Variables |pages=209–230 }}
*{{cite book |last=Kooyman |first=Marius A. |year=1976 |title=Dummy Variables in Econometrics |location=Tilburg |publisher=Tilburg University Press |isbn=90-237-2919-6 }}

==External links==
{{Wikiversity|Dummy variable (statistics)}}
*{{cite web |first=Marloes |last=Maathuis|author-link=Marloes Maathuis |title=Chapter 7: Dummy variable regression |work=Stat 423: Applied Regression and Analysis of Variance |date=2007 |url=http://stat.ethz.ch/~maathuis/teaching/stat423/handouts/Chapter7.pdf |archive-date=December 16, 2011 |archive-url=https://web.archive.org/web/20111216051820/https://stat.ethz.ch/~maathuis/teaching/stat423/handouts/Chapter7.pdf }}
*{{cite web |first=John |last=Fox |date=2010 |title=Dummy-Variable Regression |url=https://socialsciences.mcmaster.ca/jfox/Courses/SPIDA/dummy-regression-notes.pdf }}
*{{cite web |first=Samuel L. |last=Baker |title=Dummy Variables |date=2006 |url=http://hspm.sph.sc.edu/courses/J716/pdf/716-6%20Dummy%20Variables%20and%20Time%20Series.pdf |archive-date=March 1, 2006 |archive-url=https://web.archive.org/web/20060301032127/http://hspm.sph.sc.edu/courses/J716/pdf/716-6%20Dummy%20Variables%20and%20Time%20Series.pdf }}

{{DEFAULTSORT:Dummy Variable (Statistics)}}
[[Category:Regression variable selection]]