
==In modeling==
===In ANOVA===
A simple setting in which interactions can arise is a [[factorial experiment|two-factor experiment]] analyzed using [[Analysis of Variance]] (ANOVA).  Suppose we have two binary factors ''A'' and ''B''.  For example, these factors might indicate whether either of two treatments was administered to a patient, with the treatments applied either singly or in combination.  We can then consider the average treatment response (e.g. the symptom levels following treatment) among the patients who received each treatment combination, as a function of that combination.  The following table shows one possible situation:

{| cellpadding="5" cellspacing="0" align="center"
|-
! 
! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''B''&nbsp;=&nbsp;0
! style="background:#ffdead;border-top:1px solid black;border-right:1px solid black;" | ''B''&nbsp;=&nbsp;1
|-
! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''A''&nbsp;=&nbsp;0
! style="border-left:1px solid black;" | 6
! style="border-right:1px solid black;" | 7
|-
! style="background:#ffdead;border-bottom:1px solid black;border-left:1px solid black;" | ''A''&nbsp;=&nbsp;1
! style="border-bottom:1px solid black;border-left:1px solid black;" | 4
! style="border-bottom:1px solid black;border-right:1px solid black;" | 5
|}

In this example, there is no interaction between the two treatments: their effects are additive.  The reason is that the difference in mean response between those subjects receiving treatment ''A'' and those not receiving treatment ''A'' is &minus;2 regardless of whether treatment ''B'' is administered (&minus;2&nbsp;=&nbsp;5&nbsp;&minus;&nbsp;7) or not (&minus;2&nbsp;=&nbsp;4&nbsp;&minus;&nbsp;6). It follows automatically that the difference in mean response between those subjects receiving treatment ''B'' and those not receiving treatment ''B'' is also the same regardless of whether treatment ''A'' is administered (7&nbsp;&minus;&nbsp;6&nbsp;=&nbsp;5&nbsp;&minus;&nbsp;4&nbsp;=&nbsp;1).

In contrast, if the following average responses are observed

{| cellpadding="5" cellspacing="0" align="center"
|-
! 
! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''B''&nbsp;=&nbsp;0
! style="background:#ffdead;border-top:1px solid black;border-right:1px solid black;" | ''B''&nbsp;=&nbsp;1
|-
! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''A''&nbsp;=&nbsp;0
! style="border-left:1px solid black;" | 1
! style="border-right:1px solid black;" | 4
|-
! style="background:#ffdead;border-bottom:1px solid black;border-left:1px solid black;" | ''A''&nbsp;=&nbsp;1
! style="border-bottom:1px solid black;border-left:1px solid black;" | 7
! style="border-bottom:1px solid black;border-right:1px solid black;" | 6
|}

then there is an interaction between the treatments &mdash; their effects are not additive.  Supposing that greater numbers correspond to a better response, in this situation treatment ''B'' is helpful on average if the subject is not also receiving treatment ''A'', but is detrimental on average if given in combination with treatment ''A''. Treatment ''A'' is helpful on average regardless of whether treatment ''B'' is also administered, but it is more helpful in both absolute and relative terms if given alone, rather than in combination with treatment ''B''. Similar observations are made for this particular example in the next section.
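
Checking additivity in these two tables comes down to one piece of arithmetic: compare the effect of ''A'' at each level of ''B''. The following Python sketch computes that difference, often called the interaction contrast, from a 2×2 table of cell means. The function name <code>interaction_contrast</code> and the nested-list layout <code>means[a][b]</code> are illustrative choices, not part of any library.

```python
def interaction_contrast(means):
    """Interaction contrast for a 2x2 table of cell means.

    means[a][b] is the mean response when factor A = a and factor B = b.
    Returns (effect of A when B = 1) - (effect of A when B = 0),
    which is zero exactly when the effects of A and B are additive.
    """
    effect_of_a_when_b0 = means[1][0] - means[0][0]
    effect_of_a_when_b1 = means[1][1] - means[0][1]
    return effect_of_a_when_b1 - effect_of_a_when_b0


# Cell means from the two tables above; rows are A = 0, 1 and columns are B = 0, 1.
additive_table = [[6, 7],
                  [4, 5]]
interaction_table = [[1, 4],
                     [7, 6]]

print(interaction_contrast(additive_table))     # 0: the effects are additive
print(interaction_contrast(interaction_table))  # -4: the effect of A drops from 6 to 2 when B is given
```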

===Qualitative and quantitative interactions===
In many applications it is useful to distinguish between qualitative and quantitative interactions.<ref>{{cite book | last=Peto | first=D. P. | year=1982 | chapter=Statistical aspects of cancer trials |title=Treatment of Cancer |edition=First | publisher=Chapman and Hall |location=London |isbn=0-412-21850-X }}</ref>  A quantitative interaction between ''A'' and ''B'' is a situation where the magnitude of the effect of ''B'' depends on the value of ''A'', but the direction of the effect of ''B'' is constant for all ''A''.  A qualitative interaction between ''A'' and ''B'' refers to a situation where both the magnitude and direction of each variable's effect can depend on the value of the other variable.

The table of means on the left, below, shows a quantitative interaction &mdash; treatment ''A'' is beneficial both when ''B'' is given, and when ''B'' is not given, but the benefit is greater when ''B'' is not given (i.e. when ''A'' is given alone).  The table of means on the right shows a qualitative interaction.  ''A'' is harmful when ''B'' is given, but it is beneficial when ''B'' is not given.  Note that the same interpretation would hold if we consider the benefit of ''B'' based on whether ''A'' is given.

{| cellpadding="5" cellspacing="0" align="center"
|-
! 
! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''B''&nbsp;=&nbsp;0
! style="background:#ffdead;border-top:1px solid black;border-right:1px solid black;" | ''B''&nbsp;=&nbsp;1
!
!
!
!
!
!
! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''B''&nbsp;=&nbsp;0
! style="background:#ffdead;border-top:1px solid black;border-right:1px solid black;" | ''B''&nbsp;=&nbsp;1
|-
! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''A''&nbsp;=&nbsp;0
! style="border-left:1px solid black;" | 2
! style="border-right:1px solid black;" | 1
!
!
!
!
!
! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''A''&nbsp;=&nbsp;0
! style="border-left:1px solid black;" | 2
! style="border-right:1px solid black;" | 6
|-
! style="background:#ffdead;border-bottom:1px solid black;border-left:1px solid black;" | ''A''&nbsp;=&nbsp;1
! style="border-bottom:1px solid black;border-left:1px solid black;" | 5
! style="border-bottom:1px solid black;border-right:1px solid black;" | 3
!
!
!
!
!
! style="background:#ffdead;border-left:1px solid black;border-bottom:1px solid black;" | ''A''&nbsp;=&nbsp;1
! style="border-left:1px solid black;border-bottom:1px solid black;" | 5
! style="border-right:1px solid black;border-bottom:1px solid black;" | 3
|}
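
The definitions above can be applied mechanically: compute the effect of ''A'' at each level of ''B'', then check whether the effects differ only in size or also in sign. The Python sketch below does this for the two tables of means. The helper names are hypothetical, and treating a zero effect at one level as a change of direction is a simplifying assumption of the sketch.

```python
def effects_of_a(means):
    """Effect of A (A = 1 minus A = 0) at B = 0 and at B = 1; means[a][b] is a cell mean."""
    return [means[1][b] - means[0][b] for b in (0, 1)]


def classify(effects):
    """Label the interaction shown by one factor's effects at each level of the other factor."""
    if effects[0] == effects[1]:
        return "no interaction"
    if all(e > 0 for e in effects) or all(e < 0 for e in effects):
        return "quantitative"  # same direction, different magnitude
    return "qualitative"  # the direction of the effect changes (or vanishes)


left_table = [[2, 1],
              [5, 3]]
right_table = [[2, 6],
               [5, 3]]

print(effects_of_a(left_table), classify(effects_of_a(left_table)))    # [3, 2] quantitative
print(effects_of_a(right_table), classify(effects_of_a(right_table)))  # [3, -3] qualitative
```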

The distinction between qualitative and quantitative interactions depends on the order in which the variables are considered (in contrast, the property of additivity is invariant to the order of the variables). In the following table, if we focus on the effect of treatment ''A'', there is a quantitative interaction &mdash; giving treatment ''A'' will improve the outcome on average regardless of whether treatment ''B'' is or is not already being given (although the benefit is greater if treatment ''A'' is given alone). However, if we focus on the effect of treatment ''B'', there is a qualitative interaction &mdash; giving treatment ''B'' to a subject who is already receiving treatment ''A'' will (on average) make things worse, whereas giving treatment ''B'' to a subject who is not receiving treatment ''A'' will improve the outcome on average.

{| cellpadding="5" cellspacing="0" align="center"
|-
! 
! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''B''&nbsp;=&nbsp;0
! style="background:#ffdead;border-top:1px solid black;border-right:1px solid black;" | ''B''&nbsp;=&nbsp;1
|-
! style="background:#ffdead;border-left:1px solid black;border-top:1px solid black;" | ''A''&nbsp;=&nbsp;0
! style="border-left:1px solid black;" | 1
! style="border-right:1px solid black;" | 4
|-
! style="background:#ffdead;border-bottom:1px solid black;border-left:1px solid black;" | ''A''&nbsp;=&nbsp;1
! style="border-bottom:1px solid black;border-left:1px solid black;" | 7
! style="border-bottom:1px solid black;border-right:1px solid black;" | 6
|}
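
The asymmetry can be checked directly. In the Python sketch below (helper names hypothetical), the effects of ''A'' keep the same sign across both levels of ''B'', while the effects of ''B'' change sign across the levels of ''A''. The interaction contrast, on the other hand, comes out the same whichever factor is differenced first, which is why additivity does not depend on the order of the variables.

```python
def effects_of_a(means):
    """Effect of A at B = 0 and at B = 1; means[a][b] is the mean for A = a, B = b."""
    return [means[1][b] - means[0][b] for b in (0, 1)]


def effects_of_b(means):
    """Effect of B at A = 0 and at A = 1."""
    return [means[a][1] - means[a][0] for a in (0, 1)]


def same_sign(effects):
    """True if every effect points in the same non-zero direction."""
    return all(e > 0 for e in effects) or all(e < 0 for e in effects)


table = [[1, 4],   # A = 0: B = 0, B = 1
         [7, 6]]   # A = 1

a_effects = effects_of_a(table)  # [6, 2]: A always helps, so the interaction is quantitative
b_effects = effects_of_b(table)  # [3, -1]: B helps alone but hurts with A, so it is qualitative

# The interaction contrast is identical from either direction.
print(a_effects[1] - a_effects[0], b_effects[1] - b_effects[0])  # -4 -4
```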

===Unit treatment additivity===
In its simplest form, the assumption of unit treatment additivity states that the observed response ''y''<sub>''ij''</sub> from experimental unit ''i'' when receiving treatment ''j'' can be written as the sum ''y''<sub>''ij''</sub>&nbsp;=&nbsp;''y''<sub>''i''</sub>&nbsp;+&nbsp;''t''<sub>''j''</sub>.<ref name="Kempthorne (1979)">{{cite book |author-link=Oscar Kempthorne |last=Kempthorne |first=Oscar |year=1979 |title=The Design and Analysis of Experiments |edition=Corrected reprint of (1952) Wiley |publisher=Robert E. Krieger |isbn=978-0-88275-105-4 }}</ref><ref name=Cox1958_2>{{cite book |author-link=David R. Cox |last=Cox |first=David R. |year=1958 |title=Planning of experiments |publisher=Wiley |isbn=0-471-57429-5 |at=Chapter 2 }}</ref><ref>{{cite book
|author=Hinkelmann, Klaus and [[Oscar Kempthorne|Kempthorne, Oscar]]
|year=2008
|title=Design and Analysis of Experiments, Volume I: Introduction to Experimental Design
|edition=Second
|publisher=Wiley
|isbn=978-0-471-72756-9
|at=Chapters 5-6 }}</ref> The assumption of unit treatment additivity implies that every treatment has exactly the same additive effect on each experimental unit. Since any given experimental unit can only undergo one of the treatments, the assumption of unit treatment additivity is a hypothesis that is not directly falsifiable, according to Cox{{Citation needed|date=April 2010}} and Kempthorne.{{Citation needed|date=April 2010}}

However, many consequences of unit treatment additivity can be falsified.{{Citation needed|date=April 2010}} For a randomized experiment, the assumption of unit treatment additivity implies that the variance is constant for all treatments. Therefore, by contraposition, if the variance differs between treatments then unit treatment additivity does not hold; that is, constant variance is a necessary condition for unit treatment additivity.{{Citation needed|date=April 2010}}
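
The variance argument can be illustrated with a small simulation. Under unit treatment additivity, the responses of all units to treatment ''j'' are their baselines ''y''<sub>''i''</sub> shifted by the same constant ''t''<sub>''j''</sub>, and a constant shift leaves a variance unchanged; randomization then makes each treatment group a random sample of those shifted values. The Python sketch below assumes hypothetical normally distributed baselines and hypothetical treatment effects of 0 and 5; it is an illustration, not a test procedure.

```python
import random
import statistics

random.seed(1)
n_units = 10_000
baseline = [random.gauss(50, 10) for _ in range(n_units)]  # hypothetical y_i
effects = {"control": 0.0, "treated": 5.0}                   # hypothetical t_j

# Unit treatment additivity: the response of unit i to treatment j is y_i + t_j.
potential = {j: [y + t for y in baseline] for j, t in effects.items()}

# Shifting every unit by the same constant leaves the population variance unchanged.
population_var = {j: statistics.pvariance(v) for j, v in potential.items()}

# Randomization: each unit is observed under exactly one, randomly chosen, treatment.
assigned = [random.choice(list(effects)) for _ in range(n_units)]
observed = {j: [potential[j][i] for i in range(n_units) if assigned[i] == j] for j in effects}
sample_var = {j: statistics.variance(v) for j, v in observed.items()}

print(population_var)  # equal for both treatments, up to rounding
print(sample_var)      # approximately equal, differing only by sampling noise
```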

The property of unit treatment additivity is not invariant under a change of scale,{{Citation needed|date=April 2010}} so statisticians often use transformations to achieve unit treatment additivity. If the response variable is expected to follow a parametric family of probability distributions, then the statistician may specify (in the protocol for the experiment or observational study) that the responses be transformed to stabilize the variance.<ref>{{cite book
|author=Hinkelmann, Klaus and [[Oscar Kempthorne|Kempthorne, Oscar]]
|year=2008
|title=Design and Analysis of Experiments, Volume I: Introduction to Experimental Design
|edition=Second
|publisher=Wiley
|isbn=978-0-471-72756-9
|at=Chapters 7-8 }}</ref> In many cases, a statistician may specify that logarithmic transforms be applied to the responses, which are believed to follow a multiplicative model.<ref name=Cox1958_2/><ref name="Bailey on eelworms">{{cite book |last=Bailey |first=R. A.|title=Design of Comparative Experiments|url=http://www.maths.qmul.ac.uk/~rab/DOEbook/|publisher=Cambridge University Press |year=2008 |isbn=978-0-521-68357-9}} Pre-publication chapters are available on-line.</ref>
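
A short numerical example shows why the logarithm suits a multiplicative model. If a hypothetical treatment multiplies every unit's response by the same factor (3 below), the treatment effect measured as a difference varies from unit to unit, but on the log scale it equals the same constant, log&nbsp;3, for every unit, so unit treatment additivity holds after the transformation. The baseline values are made up for illustration.

```python
import math

baseline = [2.0, 5.0, 10.0, 40.0]  # hypothetical responses under the standard treatment
factor = 3.0                        # hypothetical multiplicative treatment effect
treated = [y * factor for y in baseline]

# On the original scale the unit-level effect depends on the unit.
raw_effects = [t - y for y, t in zip(baseline, treated)]

# On the log scale, log(factor * y) - log(y) = log(factor) for every unit.
log_effects = [math.log(t) - math.log(y) for y, t in zip(baseline, treated)]

print(raw_effects)  # [4.0, 10.0, 20.0, 80.0]
print(log_effects)  # each approximately log(3) = 1.0986...
```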

The assumption of unit treatment additivity was enunciated in experimental design by Kempthorne{{Citation needed|date=April 2010}} and Cox{{Citation needed|date=April 2010}}. Kempthorne's use of unit treatment additivity and randomization is similar to the design-based analysis of finite population survey sampling.

In recent years, it has become common{{Citation needed|date=April 2010}} to use the terminology of Donald Rubin, which uses counterfactuals. Suppose we are comparing two groups of people with respect to some attribute ''y''.  For example, the first group might consist of people who are given a standard treatment for a medical condition, with the second group consisting of people who receive a new treatment with unknown effect.  Taking a "counterfactual" perspective, we can consider an individual whose attribute has value ''y'' if that individual belongs to the first group, and whose attribute has value ''y''&nbsp;+&nbsp;''τ''(''y'') if the individual belongs to the second group.  The assumption of "unit treatment additivity" is that ''τ''(''y'')&nbsp;=&nbsp;''τ'' for some constant ''τ'', that is, the "treatment effect" does not depend on ''y''.  Since we cannot observe both ''y'' and ''y''&nbsp;+&nbsp;''τ''(''y'') for a given individual, this is not testable at the individual level.  However, unit treatment additivity implies that the [[cumulative distribution function]]s ''F''<sub>1</sub> and ''F''<sub>2</sub> for the two groups satisfy
''F''<sub>2</sub>(''y'')&nbsp;=&nbsp;''F''<sub>1</sub>(''y''&nbsp;&minus;&nbsp;''τ''), as long as the assignment of individuals to groups 1 and 2 is independent of all other factors influencing ''y'' (i.e. there are no [[confounding variable|confounders]]).  Lack of unit treatment additivity can be viewed as a form of interaction between the treatment assignment (e.g. to groups 1 or 2) and the baseline, or untreated, value of ''y''.
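
The distributional consequence can be checked with empirical CDFs. The Python sketch below assumes hypothetical normal baseline values, a hypothetical constant effect ''τ''&nbsp;=&nbsp;2.5, and a coin-flip (hence unconfounded) split into two groups. It compares the empirical CDF of group 2 at ''y'' with that of group 1 at ''y''&nbsp;&minus;&nbsp;''τ'', which agree up to sampling noise, while the unshifted CDFs clearly differ.

```python
import bisect
import random


def ecdf(sample):
    """Return the empirical cumulative distribution function of sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda y: bisect.bisect_right(ordered, y) / n


random.seed(0)
tau = 2.5                                                # hypothetical treatment effect
baseline = [random.gauss(10, 3) for _ in range(20_000)]  # hypothetical untreated values

# Assignment by coin flip is independent of y, so there are no confounders.
in_group_2 = [random.random() < 0.5 for _ in baseline]
F1 = ecdf([y for y, g in zip(baseline, in_group_2) if not g])    # group 1 observes y
F2 = ecdf([y + tau for y, g in zip(baseline, in_group_2) if g])  # group 2 observes y + tau

grid = [4, 7, 10, 13, 16]
shifted_gap = max(abs(F2(y) - F1(y - tau)) for y in grid)  # small: F2(y) is close to F1(y - tau)
unshifted_gap = max(abs(F2(y) - F1(y)) for y in grid)      # large: the groups differ
print(shifted_gap, unshifted_gap)
```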

===Categorical variables===
Sometimes the interacting variables are categorical variables rather than real numbers, and the study might then be dealt with as an [[analysis of variance]] problem.  For example, members of a population may be classified by religion and by occupation.  If one wishes to predict a person's height based only on the person's religion and occupation, a simple ''additive'' model, i.e., a model without interaction, would add to an overall average height an adjustment for a particular religion and another for a particular occupation.  A model with interaction, unlike an [[additive model]], could add a further adjustment for the "interaction" between that religion and that occupation.  This example may cause one to suspect that the word ''interaction'' is something of a misnomer.
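
For a balanced design (the same number of people in every religion-by-occupation cell), both models can be written in terms of cell means: the additive model predicts the overall mean plus a religion adjustment plus an occupation adjustment, and the interaction adjustment for each cell is whatever the additive prediction misses. The Python sketch below uses made-up category labels and heights; with unbalanced data, the least-squares adjustments would instead have to be estimated by regression.

```python
from statistics import fmean

# Hypothetical mean heights (cm) in a balanced design; keys are (religion, occupation).
cell_means = {
    ("R1", "O1"): 170.0, ("R1", "O2"): 174.0,
    ("R2", "O1"): 168.0, ("R2", "O2"): 178.0,
}
religions = sorted({r for r, _ in cell_means})
occupations = sorted({o for _, o in cell_means})

overall = fmean(cell_means.values())
religion_adj = {r: fmean(cell_means[r, o] for o in occupations) - overall for r in religions}
occupation_adj = {o: fmean(cell_means[r, o] for r in religions) - overall for o in occupations}


def additive_prediction(r, o):
    """Overall mean plus one adjustment per factor, with no interaction."""
    return overall + religion_adj[r] + occupation_adj[o]


# The interaction adjustment is the part of each cell mean the additive model misses.
interaction_adj = {cell: mean - additive_prediction(*cell) for cell, mean in cell_means.items()}


def interaction_prediction(r, o):
    """Additive prediction plus the cell-specific interaction adjustment."""
    return additive_prediction(r, o) + interaction_adj[r, o]


print(additive_prediction("R1", "O1"))  # 168.5
print(interaction_adj)                  # +1.5 or -1.5 in every cell
```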

Statistically, the presence of an interaction between categorical variables is generally tested using a form of [[analysis of variance]] (ANOVA). If one or more of the variables is continuous in nature, however, it would typically be tested using moderated multiple regression.<ref name=Overton2001>{{Cite journal
 | author = Overton, R. C.
 | year = 2001
 | title = Moderated multiple regression for interactions involving categorical variables: a statistical control for heterogeneous variance across two groups
 | journal = Psychol Methods
 | volume = 6
 | issue = 3
 | pages = 218–33
 | doi = 10.1037/1082-989X.6.3.218
 | pmid = 11570229
 }}</ref> The method is so called because a moderator is a variable that affects the strength of the relationship between two other variables.
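
In moderated multiple regression the moderator ''Z'' enters through a product term, ''Y''&nbsp;=&nbsp;''c''&nbsp;+&nbsp;''aX''&nbsp;+&nbsp;''bZ''&nbsp;+&nbsp;''d''(''X''&nbsp;×&nbsp;''Z''), so the slope of ''Y'' on ''X'' is ''a''&nbsp;+&nbsp;''dZ'' and changes with the moderator. The dependency-free Python sketch below fits this model by ordinary least squares (solving the normal equations by Gaussian elimination) on noiseless, made-up data, so it recovers the generating coefficients up to rounding. In practice one would use a statistics package rather than hand-written linear algebra.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares coefficients from the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical predictor X and moderator Z on a grid; Y generated without noise.
c, a, b, d = 1.0, 2.0, 0.5, 0.8
points = [(x, z) for x in range(-2, 3) for z in range(-2, 3)]
y = [c + a * x + b * z + d * x * z for x, z in points]

design = [[1.0, x, z, x * z] for x, z in points]  # columns: constant, X, Z, X*Z
c_hat, a_hat, b_hat, d_hat = ols(design, y)

for z in (-1, 0, 1):
    print(f"slope of Y on X when Z = {z}: {a_hat + d_hat * z:.2f}")
```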

===Designed experiments===
[[Genichi Taguchi]] contended<ref>{{Cite web|title = Design of Experiments - Taguchi Experiments|url = http://www.qualitytrainingportal.com/resources/doe/taguchi_concepts.htm|website = www.qualitytrainingportal.com|access-date = 2015-11-27}}</ref> that interactions could be eliminated from a [[system]] by appropriate choice of response variable and transformation. However, [[George Box]] and others have argued that this is not the case in general.<ref>{{Cite journal
 | author = George E. P. Box
 | author-link = George E. P. Box
 | year = 1990
 | title = Do interactions matter?
 | journal = Quality Engineering
 | volume = 2
 | pages = 365–369
 | url = http://cqpi.engr.wisc.edu/system/files/r046.pdf
 | access-date = 2009-07-28
 | archive-url = https://web.archive.org/web/20100610131759/http://cqpi.engr.wisc.edu/system/files/r046.pdf
 | archive-date = 2010-06-10
 | url-status = dead
 | doi = 10.1080/08982119008962728
 }}</ref>

===Model size===
Given ''n'' predictors, the number of terms in a linear model that includes a constant, every predictor, and every possible interaction is <math>\tbinom{n}{0} + \tbinom{n}{1} + \tbinom{n}{2} + \cdots + \tbinom{n}{n} = 2^n</math>. Since this quantity grows exponentially, it readily becomes impractically large. One method to limit the size of the model is to limit the order of interactions. For example, if only two-way interactions are allowed, the number of terms becomes <math>\tbinom{n}{0} + \tbinom{n}{1} + \tbinom{n}{2} = 1 + \tfrac{1}{2}n + \tfrac{1}{2}n^2</math>. The table below shows the number of terms for each number of predictors and maximum order of interaction ''m''.

{| class="wikitable" style="text-align: right;"
|+ Number of terms
! rowspan="2" | Predictors
! colspan="5" | Including up to ''m''-way interactions
|-
! 2 !! 3 !! 4 !! 5 !! ∞
|-
! scope="row" | 1
| 2 || 2 || 2 || 2 || 2
|-
! scope="row" | 2
| 4 || 4 || 4 || 4 || 4
|-
! scope="row" | 3
| 7 || 8 || 8 || 8 || 8
|-
! scope="row" | 4
| 11 || 15 || 16 || 16 || 16
|-
! scope="row" | 5
| 16 || 26 || 31 || 32 || 32
|-
! scope="row" | 6
| 22 || 42 || 57 || 63 || 64
|-
! scope="row" | 7
| 29 || 64 || 99 || 120 || 128
|-
! scope="row" | 8
| 37 || 93 || 163 || 219 || 256
|-
! scope="row" | 9
| 46 || 130 || 256 || 382 || 512
|-
! scope="row" | 10
| 56 || 176 || 386 || 638 || 1,024
|-
! scope="row" | 11
| 67 || 232 || 562 || 1,024 || 2,048
|-
! scope="row" | 12
| 79 || 299 || 794 || 1,586 || 4,096
|-
! scope="row" | 13
| 92 || 378 || 1,093 || 2,380 || 8,192
|-
! scope="row" | 14
| 106 || 470 || 1,471 || 3,473 || 16,384
|-
! scope="row" | 15
| 121 || 576 || 1,941 || 4,944 || 32,768
|-
! scope="row" | 20
| 211 || 1,351 || 6,196 || 21,700 || 1,048,576
|-
! scope="row" | 25
| 326 || 2,626 || 15,276 || 68,406 || 33,554,432
|-
! scope="row" | 50
| 1,276 || 20,876 || 251,176 || 2,369,936 || 10<sup>15</sup>
|-
! scope="row" | 100
| 5,051 || 166,751 || 4,087,976 || 79,375,496 || 10<sup>30</sup>
|-
! scope="row" | 1,000
| 500,501 || 166,667,501 || 10<sup>10</sup> || 10<sup>12</sup> || 10<sup>301</sup>
|}
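
The counts in the table are partial sums of binomial coefficients, <math>\textstyle\sum_{k=0}^{m}\tbinom{n}{k}</math>, and can be reproduced with exact integer arithmetic in a few lines of Python. The function name <code>n_terms</code> is illustrative.

```python
from math import comb


def n_terms(n_predictors, max_order=None):
    """Terms in a linear model with a constant, all main effects, and all
    interactions of order up to max_order (None means every order)."""
    if max_order is None or max_order > n_predictors:
        max_order = n_predictors
    return sum(comb(n_predictors, k) for k in range(max_order + 1))


for n in (4, 10, 100):
    print(n, [n_terms(n, m) for m in (2, 3, 4, 5)], n_terms(n))
# 4 [11, 15, 16, 16] 16
# 10 [56, 176, 386, 638] 1024
# 100 [5051, 166751, 4087976, 79375496] 1267650600228229401496703205376
```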

===In regression===

The most general approach to modeling interaction effects involves regression, starting from the elementary version given above:

:<math>Y = c + ax_1 + bx_2 + d(x_1\times x_2) + \text{error} \,</math>

where the interaction term <math>(x_1\times x_2)</math> could be formed explicitly by multiplying two (or more) variables, or implicitly using factorial notation in modern statistical packages such as [[Stata]]. The components ''x''<sub>1</sub> and ''x''<sub>2</sub> might be measurements or {0,1} [[dummy variable (statistics)|dummy variable]]s in any combination. Interactions involving a dummy variable multiplied by a measurement variable are termed ''slope dummy variables'',<ref>Hamilton, L.C. 1992. ''Regression with Graphics: A Second Course in Applied Statistics''. Pacific Grove, CA: Brooks/Cole.  {{ISBN|978-0534159009}}</ref> because they estimate and test the difference in slopes between groups 0 and 1.

When measurement variables are employed in interactions, it is often desirable to work with centered versions, where the variable's mean (or some other reasonably central value) is set as zero. Centering can make the main effects in interaction models more interpretable, and it also reduces the correlation ("micro" multicollinearity) between the interaction term and the main effects.<ref>{{Cite journal|last1=Iacobucci|first1=Dawn|last2=Schneider|first2=Matthew J.|last3=Popovich|first3=Deidre L.|last4=Bakamitsos|first4=Georgios A.|date=2016|title=Mean centering helps alleviate "micro" but not "macro" multicollinearity|journal=Behavior Research Methods|language=en|volume=48|issue=4|pages=1308–1317|doi=10.3758/s13428-015-0624-x|pmid=26148824 |issn=1554-3528|doi-access=free}}</ref> The coefficient ''a'' in the equation above, for example, represents the effect of ''x''<sub>1</sub> when ''x''<sub>2</sub> equals zero; once ''x''<sub>2</sub> is centered, it represents the effect of ''x''<sub>1</sub> at the central value of ''x''<sub>2</sub>.

[[File:Tea party interaction.png|thumb|Interaction of education and political party affecting beliefs about climate change]]Regression approaches to interaction modeling are very general because they can accommodate additional predictors, and many alternative specifications or estimation strategies beyond [[ordinary least squares]]. [[Robust regression|Robust]], [[Quantile regression|quantile]], and mixed-effects ([[Multilevel model|multilevel]]) models are among the possibilities, as is [[generalized linear model]]ing encompassing a wide range of categorical, ordered, counted or otherwise limited dependent variables. The graph depicts an education*politics interaction, from a probability-weighted [[logit regression]] analysis of survey data.<ref>{{cite journal | last1 = Hamilton | first1 = L.C. | last2 = Saito | first2 = K. | year = 2015 | title = A four-party view of U.S. environmental concern | journal = Environmental Politics | volume = 24 | issue = 2| pages = 212–227 | doi = 10.1080/09644016.2014.976485 | bibcode = 2015EnvPo..24..212H | s2cid = 154762226 }}</ref>