Dependent and independent variables

{{For|dependent and independent random variables|Independence (probability theory)}}
{{short description|Concept in mathematical modeling, statistical modeling and experimental sciences}}

A variable is considered '''dependent''' if it depends on (or is hypothesized to depend on) an '''independent variable'''. Dependent variables are studied under the supposition or demand that they depend, by some law or rule (e.g., by a [[mathematical function]]), on the values of other variables. Independent variables, on the other hand, are not seen as depending on any other variable in the scope of the experiment in question.{{efn|Even if the existing dependency is invertible (e.g., by finding the [[inverse function]] when it exists), the nomenclature is kept if the inverse dependency is not the object of study in the experiment.}} Rather, they are controlled by the experimenter.

[[File:Polynomialdeg2.svg|thumb|In single-variable [[calculus]], a [[function (mathematics)|function]] is typically [[graph of a function|graphed]] with the [[horizontal axis]] representing the independent variable and the [[vertical axis]] representing the dependent variable.<ref>{{cite book | last = Hastings | first = Nancy Baxter | title = Workshop calculus: guided exploration with review | volume = 2 | publisher = Springer Science & Business Media | year = 1998| p = 31 }}</ref> In the function graphed here, ''y'' is the dependent variable and ''x'' is the independent variable.]]

==In pure mathematics==
In mathematics, a [[function (mathematics)|function]] is a rule for taking an input (in the simplest case, a number or set of numbers)<ref name=carlson>Carlson, Robert. A concrete introduction to real analysis. CRC Press, 2006. p.183</ref> and providing an output (which may also be a number).<ref name=carlson/> A symbol that stands for an arbitrary input is called an '''independent variable''', while a symbol that stands for an arbitrary output is called a '''dependent variable'''.<ref name=stewart>{{cite book | last = Stewart | first = James | title = Calculus | publisher = Cengage Learning | year = 2011 | section = 1.1 }}</ref> The most common symbol for the input is {{math|''x''}}, and the most common symbol for the output is {{math|''y''}}; the function itself is commonly written {{math|1=''y'' = ''f''(''x'')}}.<ref name=stewart/><ref>Anton, Howard, Irl C. Bivens, and Stephen Davis. Calculus Single Variable. John Wiley & Sons, 2012. Section 0.1</ref>

It is possible to have multiple independent variables or multiple dependent variables. For instance, in [[multivariable calculus]], one often encounters functions of the form {{math|1=''z'' = ''f''(''x'',''y'')}}, where {{math|''z''}} is a dependent variable and {{math|''x''}} and {{math|''y''}} are independent variables.<ref>Larson, Ron, and Bruce Edwards. Calculus. Cengage Learning, 2009. Section 13.1</ref> Functions with multiple outputs are often referred to as [[vector-valued functions]].
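As a rough illustration in Python (the specific functions below are hypothetical examples, not taken from the cited sources), the parameters of a function play the role of independent variables and its return value plays the role of the dependent variable. A function returning a tuple stands in for a vector-valued function:

```python
import math


# y = f(x): x is the independent variable, the returned y is the dependent variable.
def f(x: float) -> float:
    return x ** 2 - 2 * x + 1  # hypothetical quadratic, chosen only for illustration


# z = g(x, y): two independent variables, one dependent variable.
def g(x: float, y: float) -> float:
    return math.sqrt(x ** 2 + y ** 2)


# A vector-valued function: one independent variable t, two dependent outputs.
def r(t: float) -> tuple[float, float]:
    return (math.cos(t), math.sin(t))


y = f(3.0)        # 4.0
z = g(3.0, 4.0)   # 5.0
point = r(0.0)    # (1.0, 0.0)
```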

==In modeling and statistics==
In [[mathematical modeling]], the relationship between the set of dependent variables and the set of independent variables is studied.{{cn|date=February 2024}}

In the simple [[stochastic]] [[linear model]] {{math|1=''y''<sub>''i''</sub> = a + b''x''<sub>''i''</sub> + ''e''<sub>''i''</sub>}} the term {{math|''y''<sub>''i''</sub>}} is the {{mvar|i}}th value of the dependent variable and {{math|''x''<sub>''i''</sub>}} is the {{mvar|i}}th value of the independent variable. The term {{math|''e''<sub>''i''</sub>}} is known as the "error" and contains the variability of the dependent variable not explained by the independent variable.{{cn|date=February 2024}}

With multiple independent variables, the model is {{math|1=''y''<sub>''i''</sub> = a + b<sub>1</sub>''x''<sub>''i'',1</sub> + b<sub>2</sub>''x''<sub>''i'',2</sub> + ... + b<sub>''n''</sub>''x''<sub>''i'',''n''</sub> + ''e''<sub>''i''</sub>}}, where {{math|''n''}} is the number of independent variables and each {{math|b<sub>''j''</sub>}} is the coefficient of the {{mvar|j}}th independent variable.{{citation needed|date=November 2019}}
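A minimal Python sketch of evaluating the multiple-variable model for a single observation, assuming one coefficient per independent variable (written b_1, ..., b_n here). The function name and all numeric values are hypothetical:

```python
from typing import Sequence


def linear_model(a: float, b: Sequence[float], x_row: Sequence[float], e: float) -> float:
    """Return y_i = a + b_1*x_{i,1} + ... + b_n*x_{i,n} + e_i for one observation i."""
    if len(b) != len(x_row):
        raise ValueError("need exactly one coefficient per independent variable")
    # Deterministic part (intercept plus weighted independent variables) plus the error term.
    return a + sum(b_j * x_ij for b_j, x_ij in zip(b, x_row)) + e


# n = 2 independent variables; coefficients, values and error are made up.
y_i = linear_model(a=1.0, b=[2.0, -0.5], x_row=[3.0, 4.0], e=0.25)  # 5.25
```

With a single coefficient and a single independent variable, the same function reduces to the simple model {{math|1=''y''<sub>''i''</sub> = a + b''x''<sub>''i''</sub> + ''e''<sub>''i''</sub>}}.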

In statistics, more specifically in [[linear regression]], a [[scatter plot]] of data is generated with {{mvar|X}} as the independent variable and {{mvar|Y}} as the dependent variable. The data form a bivariate dataset, {{math|(''x''<sub>1</sub>, ''y''<sub>1</sub>), (''x''<sub>2</sub>, ''y''<sub>2</sub>), ..., (''x''<sub>''n''</sub>, ''y''<sub>''n''</sub>)}}. The simple linear regression model takes the form {{math|1=''Y''<sub>''i''</sub> = ''α'' + ''βx''<sub>''i''</sub> + ''U''<sub>''i''</sub>}}, for {{math|1=''i'' = 1, 2, ... , ''n''}}. In this case, {{math|''U''<sub>1</sub>, ... , ''U''<sub>''n''</sub>}} are independent random variables, which is the case when the measurements do not influence each other. Through propagation of independence, the independence of the {{math|''U''<sub>''i''</sub>}} implies independence of the {{math|''Y''<sub>''i''</sub>}}, even though each {{math|''Y''<sub>''i''</sub>}} has a different expectation value. Each {{math|''U''<sub>''i''</sub>}} has an expectation value of 0 and a variance of {{math|σ<sup>2</sup>}}.<ref name=Dekking>{{citation|title=A modern introduction to probability and statistics: understanding why and how|last=Dekking|first=Frederik Michel|date=2005|publisher=Springer|isbn=1-85233-896-2|oclc=783259968}}</ref>
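The expectation of {{math|''Y''<sub>''i''</sub>}} follows from the linearity of expectation, since {{mvar|α}}, {{mvar|β}} and {{math|''x''<sub>''i''</sub>}} are constants and {{math|1=E[''U''<sub>''i''</sub>] = 0}}:<ref name=Dekking />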

<math display="block">\operatorname{E}[Y_i] = \operatorname{E}[\alpha + \beta x_i + U_i] = \alpha + \beta x_i + \operatorname{E}[U_i] = \alpha + \beta x_i.</math>
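The identity above can be checked numerically with a small Monte Carlo sketch in Python. The model itself only requires the errors to have mean 0 and variance σ<sup>2</sup>; drawing them from a normal distribution, and the particular values of α, β, ''x''<sub>''i''</sub> and σ, are assumptions made here purely for the simulation:

```python
import random
import statistics


def simulate_y(alpha: float, beta: float, x_i: float, sigma: float,
               n_draws: int, seed: int = 0) -> list[float]:
    """Draw n_draws values of Y_i = alpha + beta*x_i + U_i, with U_i ~ Normal(0, sigma^2) (assumed)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [alpha + beta * x_i + rng.gauss(0.0, sigma) for _ in range(n_draws)]


alpha, beta, x_i = 1.0, 2.0, 3.0  # hypothetical parameter values
draws = simulate_y(alpha, beta, x_i, sigma=0.5, n_draws=100_000)

# The sample mean should be close to alpha + beta * x_i = 7.0.
print(statistics.fmean(draws))
```

The sample mean approaches {{math|''α'' + ''βx''<sub>''i''</sub>}} as the number of draws grows, while the individual draws scatter around it with standard deviation σ.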

The line of best fit for the [[bivariate data]]set takes the form {{math|1=''y'' = ''α'' + ''βx''}} and is called the regression line. {{mvar|α}} and {{mvar|β}} correspond to the intercept and slope, respectively.<ref name=Dekking />
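Assuming "best fit" is taken in the usual ordinary least-squares sense, the intercept and slope of the regression line can be estimated from the bivariate dataset with the standard closed-form formulas. A minimal Python sketch (the function name is hypothetical):

```python
from typing import Sequence


def fit_regression_line(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares estimates (alpha_hat, beta_hat) for the line y = alpha + beta*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered cross-product and sum of squares of the independent variable.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("the x values must not all be equal")
    beta_hat = s_xy / s_xx                 # estimated slope
    alpha_hat = y_bar - beta_hat * x_bar   # estimated intercept
    return alpha_hat, beta_hat


# Points lying exactly on y = 1 + 2x recover alpha = 1 and beta = 2.
alpha_hat, beta_hat = fit_regression_line([0, 1, 2, 3], [1, 3, 5, 7])  # (1.0, 2.0)
```

Because the fitted slope is {{math|''s''<sub>''xy''</sub> / ''s''<sub>''xx''</sub>}}, the estimate is only defined when the independent variable takes at least two distinct values.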

In an [[design of experiments|experiment]], the variable manipulated by the experimenter is called the independent variable.<ref>{{Cite web|url=http://onlinestatbook.com/2/introduction/variables.html|title = Variables}}</ref> The dependent variable is the variable expected to change when the independent variable is manipulated.<ref name="random house">'' Random House Webster's Unabridged Dictionary.'' Random House, Inc. 2001. Page 534, 971. {{ISBN|0-375-42566-7}}.</ref>

In [[data mining]] tools (for [[multivariate statistics]] and [[machine learning]]), the dependent variable is assigned a ''role'' as '''{{visible anchor|target variable}}''' (or in some tools as ''label attribute''), while an independent variable may be assigned a role as ''regular variable''<ref>[http://1xltkxylmzx3z8gd647akcdvov.wpengine.netdna-cdn.com/wp-content/uploads/2013/10/rapidminer-5.0-manual-english_v1.0.pdf English Manual version 1.0] {{webarchive|url=https://web.archive.org/web/20140210002634/http://1xltkxylmzx3z8gd647akcdvov.wpengine.netdna-cdn.com/wp-content/uploads/2013/10/rapidminer-5.0-manual-english_v1.0.pdf |date=2014-02-10 }} for [[RapidMiner]] 5.0, October 2013.</ref> or ''feature variable''. Known values of the target variable are provided for the training data set and [[test data]] set, but must be predicted for other data. The target variable is used in [[supervised learning]] algorithms but not in unsupervised learning.
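The assignment of roles can be sketched in plain Python, independent of any particular data-mining tool. The records, column names and helper function below are hypothetical; the sketch simply separates the column given the target (label) role from the remaining regular (feature) columns:

```python
# Hypothetical records; the column names are invented for illustration.
records: list[dict[str, float]] = [
    {"hours_studied": 2.0, "hours_slept": 7.0, "passed": 0},
    {"hours_studied": 5.0, "hours_slept": 6.5, "passed": 1},
    {"hours_studied": 8.0, "hours_slept": 8.0, "passed": 1},
]

TARGET = "passed"  # the column assigned the "target" (label) role


def split_features_target(rows: list[dict[str, float]], target: str):
    """Split rows into feature names, a feature matrix X (regular variables) and target values y."""
    feature_names = [name for name in rows[0] if name != target]
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[target] for row in rows]
    return feature_names, X, y


feature_names, X, y = split_features_target(records, TARGET)
```

A supervised learner would be trained on both {{mvar|X}} and {{mvar|y}}, whereas an unsupervised method would use {{mvar|X}} alone.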

==Synonyms==
Depending on the context, an independent variable is sometimes called a "predictor variable", "regressor", "covariate", "manipulated variable", "explanatory variable", "exposure variable" (see [[reliability theory]]), "[[risk factor]]" (see [[medical statistics]]), "[[Feature (machine learning)|feature]]" (in [[machine learning]] and [[pattern recognition]]) or "input variable".<ref name="Dodgeindepvar">Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms'', OUP. {{ISBN|0-19-920613-9}} (entry for "independent variable")</ref><ref name="Dodgeregression">Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms'', OUP. {{ISBN|0-19-920613-9}} (entry for "regression")</ref>
In [[econometrics]], the term "control variable" is usually used instead of "covariate".<ref>{{cite book |last1=Gujarati |first1=Damodar N. |last2=Porter |first2=Dawn C.|author2-link=Dawn C. Porter |title=Basic Econometrics |location=New York |publisher=McGraw-Hill |year=2009 |edition=Fifth international |isbn=978-007-127625-2 |chapter=Terminology and Notation |pages=21 }}</ref><ref>{{cite book |last=Wooldridge |first=Jeffrey |year=2012 |title=Introductory Econometrics: A Modern Approach |location=Mason, OH |publisher=South-Western Cengage Learning |edition=Fifth |isbn=978-1-111-53104-1 |pages=22–23 }}</ref><ref>{{cite book |title=A Dictionary of Epidemiology |edition=Fourth |editor-first=John M. |editor-last=Last |publisher=Oxford UP |year=2001 |isbn=0-19-514168-7 }}</ref><ref>{{cite book |title=The Cambridge Dictionary of Statistics |edition=2nd |first=B. S. |last=Everitt |publisher=Cambridge UP |year=2002 |isbn=0-521-81099-X }}</ref><ref>{{cite journal |last=Woodworth |first=P. L. |year=1987 |title=Trends in U.K. mean sea level |journal=Marine Geodesy |volume=11 |issue=1 |pages=57–87 |doi=10.1080/15210608709379549 |bibcode=1987MarGe..11...57W }}</ref>

"{{vanchor|Explanatory variable}}" is preferred by some authors over "independent variable" when the quantities treated as independent variables may not be statistically independent or independently manipulable by the researcher.<ref name="Everitt1">Everitt, B.S. (2002) Cambridge Dictionary of Statistics, CUP. {{ISBN|0-521-81099-X}}</ref><ref name="Dodge1">Dodge, Y. (2003) ''The Oxford Dictionary of Statistical Terms'', OUP. {{ISBN|0-19-920613-9}}</ref> If the independent variable is referred to as an "explanatory variable" then the term "{{vanchor|response variable}}" is preferred by some authors for the dependent variable.<ref name="Dodgeregression"/><ref name="Everitt1"/><ref name="Dodge1"/>

Depending on the context, a dependent variable is sometimes called a "response variable", "regressand", "criterion", "predicted variable", "measured variable", "explained variable", "experimental variable", "responding variable", "outcome variable", "output variable", "target" or "label".<ref name="Dodgeregression"/> In economics, the term "endogenous variable" usually refers to the target.

"{{vanchor|Explained variable}}" is preferred by some authors over "dependent variable" when the quantities treated as "dependent variables" may not be statistically dependent.<ref name="DAUME">Ash Narayan Sah (2009) Data Analysis Using Microsoft Excel, New Delhi. {{ISBN|978-81-7446-716-4}}</ref> If the dependent variable is referred to as an "explained variable" then the term "{{vanchor|predictor variable}}" is preferred by some authors for the independent variable.<ref name="DAUME"/>

An example is provided by the analysis of trend in sea level by {{Harvtxt|Woodworth|1987}}. Here the dependent variable (and variable of most interest) was the annual mean sea level at a given location for which a series of yearly values was available. The primary independent variable was time. A covariate consisting of yearly values of annual mean atmospheric pressure at sea level was also used. The results showed that including the covariate yielded improved estimates of the trend against time, compared with analyses that omitted the covariate.

{| class="wikitable" style="margin-left:1.5em;"
|+ Antonym pairs
|-
| independent || dependent
|-
| input || output
|-
| regressor || regressand
|-
| predictor || predicted
|-
| explanatory || explained
|-
| exogenous || endogenous
|-
| manipulated || measured
|-
| exposure || outcome
|-
| feature || label or target
|}

==Other variables==
A variable may be thought to alter the dependent or independent variables without actually being the focus of the experiment. Such a variable is kept constant or monitored to minimize its effect on the experiment. These variables may be designated as either a "controlled variable", "[[control variable]]", or "fixed variable".

Extraneous variables, if included in a [[regression analysis]] as independent variables, may help a researcher obtain accurate estimates of the response parameters, better [[prediction]], and better [[goodness of fit]], but they are not of substantive interest to the [[hypothesis]] under examination. For example, in a study examining the effect of post-secondary education on lifetime earnings, some extraneous variables might be gender, ethnicity, social class, genetics, intelligence, age, and so forth. A variable is extraneous only when it can be assumed (or shown) to influence the [[dependent variable]]. If included in a regression, it can improve the [[Model fitting|fit of the model]]. If it is excluded from the regression and it has a non-zero [[covariance]] with one or more of the independent variables of interest, its omission will [[bias (statistics)|bias]] the regression's estimate of the effect of each such independent variable. This effect is called [[confounding]] or [[omitted variable bias]]; in these situations, design changes and/or statistical control for the variable are necessary.
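Omitted-variable bias can be illustrated with a small simulation in Python. Everything here is hypothetical: the coefficients (2.0 for the variable of interest ''x''<sub>1</sub> and 3.0 for the extraneous variable ''x''<sub>2</sub>), the 0.8 dependence of ''x''<sub>2</sub> on ''x''<sub>1</sub>, and the normally distributed noise are assumptions chosen for illustration. Regressing ''y'' on ''x''<sub>1</sub> alone should give a slope near 2 + 3 × 0.8 = 4.4 rather than the true 2.0, while including ''x''<sub>2</sub> as a second regressor should recover a value near 2.0:

```python
import random


def centered_cross(u: list[float], v: list[float]) -> float:
    """Sum of (u_i - mean(u)) * (v_i - mean(v))."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def simulate(n: int = 50_000, seed: int = 1):
    """Generate hypothetical data in which x2 is correlated with x1 and affects y."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The extraneous variable x2 has non-zero covariance with x1.
    x2 = [0.8 * a + rng.gauss(0.0, 1.0) for a in x1]
    # Assumed true model: effect of x1 is 2.0, effect of x2 is 3.0.
    y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
    return x1, x2, y


x1, x2, y = simulate()
s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)

# Omitting x2: simple-regression slope of y on x1 (biased, roughly 4.4).
b1_omitted = s1y / s11
# Including x2: solve the 2x2 least-squares normal equations for the x1 coefficient (roughly 2.0).
b1_controlled = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)
```

If ''x''<sub>2</sub> were generated independently of ''x''<sub>1</sub> (dependence 0 instead of 0.8), both estimates would be close to 2.0, consistent with the statement that the bias arises only when the omitted variable covaries with the independent variable of interest.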

Extraneous variables are often classified into three types:
# Subject variables, which are the characteristics of the individuals being studied that might affect their actions. These variables include age, gender, health status, mood, background, etc.
# Blocking variables or experimental variables are characteristics of the persons conducting the experiment which might influence how a person behaves. Gender, the presence of racial discrimination, language, or other factors may qualify as such variables.
# Situational variables are features of the environment in which the study or research was conducted that can adversely affect the outcome of the experiment. Examples include air temperature, level of activity, lighting, and time of day.

In modelling, variability that is not explained by the independent variable is designated by <math>e_i</math> and is known as the "[[errors and residuals|residual]]", "side effect", "[[errors and residuals|error]]", "unexplained share", "residual variable", "disturbance", or "tolerance".

==Examples==
* Effect of fertilizer on plant growth: {{pb}} In a study measuring the influence of different quantities of fertilizer on plant growth, the independent variable would be the amount of fertilizer used. The dependent variable would be the growth in height or mass of the plant. The controlled variables would be the type of plant, the type of fertilizer, the amount of sunlight the plant gets, the size of the pots, etc.
* Effect of drug dosage on symptom severity: {{pb}} In a study of how different doses of a drug affect the severity of symptoms, a researcher could compare the frequency and intensity of symptoms when different doses are administered. Here the independent variable is the dose and the dependent variable is the frequency/intensity of symptoms.
* Effect of temperature on pigmentation: {{pb}} In measuring the amount of color removed from beetroot samples at different temperatures, temperature is the independent variable and amount of pigment removed is the dependent variable.
* Effect of sugar added to coffee: {{pb}} The taste varies with the amount of sugar added to the coffee. Here, the amount of sugar is the independent variable, while the taste is the dependent variable.

==See also==
* [[Abscissa and ordinate]]
* [[Blocking (statistics)]]
* [[Latent and observable variables]]
* [[Mediator variable]]

== Notes ==
{{notelist}}

==References==
{{Reflist}}

{{wikiversity|Independent variable}}
{{wikiversity|Dependent variable}}

{{Differential equations topics}}

[[Category:Design of experiments]]
[[Category:Regression analysis]]
[[Category:Mathematical terminology]]
[[Category:Independence (probability theory)]]