Analysis of variance
===Connection to linear regression===
Below we make clear the connection between multi-way ANOVA and linear regression.

Linearly re-order the data so that the <math>k</math>-th observation is associated with a response <math>y_k</math> and factors <math>Z_{k,b}</math>, where <math>b \in \{1,2,\ldots,B\}</math> indexes the factors and <math>B</math> is the total number of factors. In one-way ANOVA <math>B=1</math> and in two-way ANOVA <math>B = 2</math>. Furthermore, we assume the <math>b</math>-th factor has <math>I_b</math> levels, namely <math>\{1,2,\ldots,I_b\}</math>. Now, we can [[one-hot]] encode the factors into the <math display="inline"> \sum_{b=1}^B I_b</math>-dimensional vector <math>v_k</math>.

The one-hot encoding function <math>g_b : \{1,2,\ldots,I_b\} \to \{0,1\}^{I_b}</math> is defined such that the <math>i</math>-th entry of <math>g_b(Z_{k,b})</math> is <math display="block">g_b(Z_{k,b})_i = \begin{cases} 1 & \text{if } i=Z_{k,b} \\ 0 & \text{otherwise} \end{cases}</math> The vector <math>v_k</math> is the concatenation of these vectors over all <math>b</math>. Thus, <math>v_k = [g_1(Z_{k,1}), g_2(Z_{k,2}), \ldots, g_B(Z_{k,B})]</math>. In order to obtain a fully general <math>B</math>-way interaction ANOVA we must also append every interaction term to the vector <math>v_k</math> and then add an intercept term. Let the resulting vector be <math>X_k</math>.

With this notation in place, we now have the exact connection with linear regression: we simply regress the response <math>y_k</math> against the vector <math>X_k</math>. However, there is a concern about [[identifiability]]: because the one-hot entries of each factor sum to 1, the components of <math>X_k</math> are linearly dependent, so the regression coefficients are not uniquely determined. In order to overcome such issues we assume that the sum of the parameters within each set of interactions is equal to zero. From here, one can use ''F''-statistics or other methods to determine the relevance of the individual factors.
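The one-hot construction of <math>v_k</math> described above can be sketched in Python with NumPy. This is only an illustrative sketch: the helper names <code>one_hot</code> and <code>main_effects_vector</code> and the sample observation are assumptions made for this illustration, not part of any standard ANOVA library.

```python
import numpy as np

def one_hot(level, n_levels):
    """Return g_b(level): a 0/1 vector of length n_levels with a 1 at the
    (1-indexed) position `level`."""
    vec = np.zeros(n_levels, dtype=int)
    vec[level - 1] = 1
    return vec

def main_effects_vector(z_k, levels):
    """Concatenate the one-hot encodings of all B factors to form v_k."""
    return np.concatenate([one_hot(z, n) for z, n in zip(z_k, levels)])

# Hypothetical observation with B = 2 factors:
# factor 1 takes level 2 of I_1 = 2, factor 2 takes level 1 of I_2 = 3.
v_k = main_effects_vector([2, 1], [2, 3])
print(v_k.tolist())  # [0, 1, 1, 0, 0]
```

The resulting vector has <math display="inline">\sum_{b=1}^B I_b = 5</math> entries, and exactly one entry per factor equals 1.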
====Example====
We can consider the 2-way interaction example where we assume that the first factor has 2 levels and the second factor has 3 levels.

Define <math>a_i = 1</math> if <math>Z_{k,1}=i</math> and <math>b_i = 1</math> if <math>Z_{k,2} = i</math> (and 0 otherwise), i.e. <math>a</math> is the one-hot encoding of the first factor and <math>b</math> is the one-hot encoding of the second factor. With that, <math display="block"> X_k = [a_1, a_2, b_1, b_2, b_3 ,a_1 \times b_1, a_1 \times b_2, a_1 \times b_3, a_2 \times b_1, a_2 \times b_2, a_2 \times b_3, 1] </math> where the last term is an intercept term.

For a more concrete example, suppose that <math display="block">\begin{align} Z_{k,1} & = 2 \\ Z_{k,2} & = 1 \end{align}</math> Then, <math display="block">X_k = [0,1,1,0,0,0,0,0,1,0,0,1]</math>
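The example above can be checked numerically. The following is a minimal sketch in Python with NumPy; the function name <code>design_vector</code> is hypothetical. It builds <math>X_k</math> for the 2-level by 3-level case and then stacks the vectors for all six factor combinations, which illustrates the identifiability concern: the stacked matrix has 12 columns but only rank 6.

```python
import itertools
import numpy as np

def design_vector(z1, z2, n1=2, n2=3):
    """Build X_k = [a, b, all products a_i * b_j, 1] for a two-factor design."""
    a = np.eye(n1, dtype=int)[z1 - 1]      # one-hot encoding of factor 1
    b = np.eye(n2, dtype=int)[z2 - 1]      # one-hot encoding of factor 2
    interactions = np.outer(a, b).ravel()  # a_1*b_1, a_1*b_2, ..., a_2*b_3
    return np.concatenate([a, b, interactions, [1]])  # trailing 1 = intercept

# The concrete case from the text: Z_{k,1} = 2, Z_{k,2} = 1.
x_k = design_vector(2, 1)
print(x_k.tolist())  # [0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]

# One row per combination of factor levels (a full 2 x 3 layout).
X = np.array([design_vector(i, j)
              for i, j in itertools.product([1, 2], [1, 2, 3])])
print(X.shape, np.linalg.matrix_rank(X))  # (6, 12) 6
```

Since the rank is smaller than the number of columns, the least-squares coefficients are not unique without additional constraints such as the sum-to-zero conditions described above.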