Analysis of variance
==Generalizations==
ANOVA is considered to be a special case of [[linear regression]],<ref>Gelman (2005, p.1) (with qualification in the later text)</ref><ref>Montgomery (2001, Section 3.9: The Regression Approach to the Analysis of Variance)</ref> which in turn is a special case of the [[general linear model]].<ref>Howell (2002, p 604)</ref> All three treat each observation as the sum of a model (fit) and a residual (error) to be minimized.

The [[Kruskal-Wallis test]] and the [[Friedman test]] are [[nonparametric]] tests that do not rely on an assumption of normality.<ref>Howell (2002, Chapter 18: Resampling and nonparametric approaches to data)</ref><ref>Montgomery (2001, Section 3-10: Nonparametric methods in the analysis of variance)</ref>

===Connection to linear regression===
The connection between multi-way ANOVA and linear regression can be made explicit as follows. Linearly reorder the data so that the <math>k</math>-th observation is associated with a response <math>y_k</math> and factors <math>Z_{k,b}</math>, where <math>b \in \{1,2,\ldots,B\}</math> indexes the factors and <math>B</math> is the total number of factors. In one-way ANOVA <math>B=1</math>, and in two-way ANOVA <math>B = 2</math>. Furthermore, assume that the <math>b</math>-th factor has <math>I_b</math> levels, namely <math>\{1,2,\ldots,I_b\}</math>. The factors can then be [[one-hot]] encoded into the <math display="inline">\sum_{b=1}^B I_b</math>-dimensional vector <math>v_k</math>. The one-hot encoding function <math>g_b : \{1,2,\ldots,I_b\} \to \{0,1\}^{I_b}</math> is defined such that the <math>i</math>-th entry of <math>g_b(Z_{k,b})</math> is
<math display="block">g_b(Z_{k,b})_i = \begin{cases} 1 & \text{if } i=Z_{k,b} \\ 0 & \text{otherwise.} \end{cases}</math>
The vector <math>v_k</math> is the concatenation of these vectors over all <math>b</math>; that is, <math>v_k = [g_1(Z_{k,1}), g_2(Z_{k,2}), \ldots, g_B(Z_{k,B})]</math>.
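The encoding just described can be sketched in a few lines of Python. This is a minimal illustration, assuming 1-based levels as in the text; the function names <code>one_hot</code> and <code>encode_factors</code> are hypothetical and do not come from any particular library.

```python
def one_hot(level: int, num_levels: int) -> list[int]:
    """g_b: map a level in {1, ..., num_levels} to a 0/1 vector of length num_levels."""
    if not 1 <= level <= num_levels:
        raise ValueError(f"level {level} is outside 1..{num_levels}")
    # Entry i (1-based) is 1 exactly when i equals the observed level.
    return [1 if i == level else 0 for i in range(1, num_levels + 1)]


def encode_factors(levels: list[int], num_levels: list[int]) -> list[int]:
    """v_k: concatenation of g_1(Z_{k,1}), ..., g_B(Z_{k,B})."""
    if len(levels) != len(num_levels):
        raise ValueError("need exactly one level per factor")
    v_k: list[int] = []
    for z, n in zip(levels, num_levels):
        v_k.extend(one_hot(z, n))
    return v_k


# B = 2 factors with I_1 = 2 and I_2 = 3 levels; one observation with Z_k = (2, 1).
print(encode_factors([2, 1], [2, 3]))  # [0, 1, 1, 0, 0]
```

The resulting vector has length <math>I_1 + I_2 = 5</math>, matching the <math display="inline">\sum_{b=1}^B I_b</math>-dimensional vector in the text.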
To obtain a fully general <math>B</math>-way ANOVA with all interactions, every interaction term (the products of one-hot entries from two or more distinct factors) must also be concatenated to <math>v_k</math>, followed by an intercept term. Call the resulting vector <math>X_k</math>. With this notation the connection to linear regression is exact: the response <math>y_k</math> is simply regressed on the vector <math>X_k</math>. However, this raises a problem of [[identifiability]]: within each factor the one-hot entries sum to 1, which duplicates the intercept column, so the parameters are not uniquely determined. To overcome this, the parameters within each set of main effects and interactions are constrained to sum to zero (for interactions, over each index). From there, ''F''-statistics or other methods can be used to determine the relevance of the individual factors.

====Example====
Consider a two-way ANOVA with interaction in which the first factor has 2 levels and the second factor has 3 levels. Define <math>a_i = 1</math> if <math>Z_{k,1}=i</math> and <math>a_i = 0</math> otherwise, and likewise <math>b_i = 1</math> if <math>Z_{k,2} = i</math> and <math>b_i = 0</math> otherwise; that is, <math>a</math> is the one-hot encoding of the first factor and <math>b</math> is the one-hot encoding of the second factor. Then
<math display="block"> X_k = [a_1, a_2, b_1, b_2, b_3, a_1 \times b_1, a_1 \times b_2, a_1 \times b_3, a_2 \times b_1, a_2 \times b_2, a_2 \times b_3, 1], </math>
where the last entry is the intercept term. As a concrete example, suppose that
<math display="block">\begin{align} Z_{k,1} & = 2 \\ Z_{k,2} & = 1 \end{align}</math>
Then <math>a = [0,1]</math> and <math>b = [1,0,0]</math>, so the only nonzero interaction entry is <math>a_2 \times b_1</math>, and
<math display="block">X_k = [0,1,1,0,0,0,0,0,1,0,0,1].</math>
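The example and the identifiability issue can be checked numerically. The sketch below assumes NumPy and uses one observation per cell of the 2-by-3 design; the helper names <code>design_row</code>, <code>effect_code</code> and <code>effect_row</code> are hypothetical, and the responses <code>y</code> are made-up illustrative values. It uses effect (sum-to-zero) coding as one common way to impose the sum-to-zero constraint: each factor with <math>I</math> levels gets <math>I - 1</math> columns, with the last level coded as all <math>-1</math>.

```python
import itertools

import numpy as np


def design_row(z1: int, z2: int, i1: int = 2, i2: int = 3) -> list[int]:
    """Full one-hot X_k for the 2 x 3 example: [a, b, a_i * b_j, intercept]."""
    a = [1 if i == z1 else 0 for i in range(1, i1 + 1)]
    b = [1 if j == z2 else 0 for j in range(1, i2 + 1)]
    # Interaction entries a_i * b_j, with i varying slowest, as in the text.
    ab = [ai * bj for ai in a for bj in b]
    return a + b + ab + [1]


def effect_code(level: int, num_levels: int) -> list[int]:
    """Sum-to-zero (effect) coding: num_levels - 1 columns, last level is all -1."""
    if level == num_levels:
        return [-1] * (num_levels - 1)
    return [1 if i == level else 0 for i in range(1, num_levels)]


def effect_row(z1: int, z2: int, i1: int = 2, i2: int = 3) -> list[int]:
    """Constrained row: effect-coded main effects, their products, intercept."""
    a = effect_code(z1, i1)
    b = effect_code(z2, i2)
    ab = [ai * bj for ai in a for bj in b]
    return a + b + ab + [1]


# One observation per cell: (1,1), (1,2), (1,3), (2,1), (2,2), (2,3).
cells = list(itertools.product([1, 2], [1, 2, 3]))

x_k = design_row(2, 1)
print(x_k)  # [0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]

# Unconstrained design: 12 columns but only rank 6, so not identifiable.
X_full = np.array([design_row(z1, z2) for z1, z2 in cells])
print(X_full.shape, np.linalg.matrix_rank(X_full))  # (6, 12) 6

# Effect-coded design: 6 columns of rank 6, so the parameters are unique.
X_eff = np.array([effect_row(z1, z2) for z1, z2 in cells], dtype=float)
print(X_eff.shape, np.linalg.matrix_rank(X_eff))  # (6, 6) 6

y = np.array([3.0, 5.0, 4.0, 6.0, 8.0, 10.0])  # hypothetical responses
beta = np.linalg.solve(X_eff, y)
# Under sum-to-zero coding in a balanced design, the intercept is the grand
# mean (6) and the first coefficient is the deviation of the first factor's
# level-1 mean (4) from it (-2).
print(beta[-1], beta[0])
```

The first printed row reproduces the concrete <math>X_k</math> above. The rank comparison shows why a constraint is needed: the 12 one-hot columns span only a 6-dimensional space, whereas the effect-coded design has exactly one column per free parameter.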