Ecological fallacy
=== Formal problem ===
The correlation of aggregate quantities (or [[ecological correlation]]) is not, in general, equal to the correlation of individual quantities. Denote by ''X''<sub>''i''</sub>, ''Y''<sub>''i''</sub> two quantities measured on individual ''i''. The covariance of the aggregate quantities in groups of size ''N'' is
:<math>\operatorname{cov}\left( \sum_{i=1}^N Y_i, \sum_{i=1}^N X_i\right)= \sum_{i=1}^{N} \operatorname{cov}(Y_{i},X_i)+ \sum_{i=1}^N \sum_{l\neq i} \operatorname{cov}(Y_l,X_i).</math>
The covariance of two aggregated variables therefore depends not only on the covariance of the two variables within the same individual but also on the covariances of the variables across different individuals. In other words, the correlation of aggregate variables takes into account cross-sectional effects that are not relevant at the individual level.

The problem for correlations naturally carries over to regressions on aggregate variables, so the correlation fallacy is an important issue for a researcher who wants to measure causal effects. Start with a regression model in which the outcome <math>Y_i</math> is affected by <math>X_i</math>:
:<math> Y_i=\alpha+\beta X_i+u_i, </math>
:<math> \operatorname{cov}[u_i,X_i]=0.</math>
The regression model at the aggregate level is obtained by summing the individual equations:
:<math> \sum_{i=1}^N Y_i=\alpha\cdot N+ \beta \sum_{i=1}^N X_i+ \sum_{i=1}^N u_i.</math>
The individual-level assumption <math>\operatorname{cov}[u_i,X_i]=0</math> only removes the same-individual terms from the covariance between the aggregate error and the aggregate regressor. Applying the decomposition above leaves
:<math> \operatorname{cov}\left[\sum_{i=1}^N u_i,\sum_{i=1}^{N} X_i\right] = \sum_{i=1}^N \sum_{l\neq i} \operatorname{cov}(u_l,X_i),</math>
which in general is not zero: nothing prevents the regressors and the errors from being correlated at the aggregate level. Therefore, running a regression on aggregate data generally does not estimate the same model as running a regression on individual data. The aggregate model is correct if
:<math> \operatorname{cov}\left[u_i,\sum_{k=1}^{N} X_k\right]= 0 \quad \text{ for all } i,</math>
and this condition is also necessary when the individuals within a group are exchangeable, so that these covariances are all equal. It means that, controlling for <math>X_i</math>, <math>\sum_{k=1}^{N} X_k</math> does not determine <math>Y_i</math>.
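Both points can be checked numerically. The Python sketch is illustrative only and assumes a hypothetical data-generating process that is not taken from the article: the ''X''<sub>''k''</sub> are independent standard normals, and the individual error is <math>u_i = \gamma \sum_{k\neq i} X_k + e_i</math>, a spillover from the other members of the group. This keeps <math>\operatorname{cov}[u_i,X_i]=0</math> but makes <math>\operatorname{cov}[u_i,\sum_k X_k] = \gamma(N-1) \neq 0</math>. The helper names (<code>simulate</code>, <code>decomposition_gap</code>, <code>ols_slope</code>) and all parameter values are hypothetical choices for the example.

```python
import random


def cov(a, b):
    """Sample covariance (divisor n - 1) of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept: cov(x, y) / var(x)."""
    return cov(x, y) / cov(x, x)


def simulate(n_groups=20000, size=5, alpha=1.0, beta=2.0, gamma=0.5, seed=0):
    """Hypothetical data-generating process, chosen only for illustration:
    Y_i = alpha + beta * X_i + u_i, with u_i = gamma * (sum of the other X_k in
    the group) + e_i, where all X_k and e_i are independent standard normals.
    Then cov(u_i, X_i) = 0, but cov(u_i, sum_k X_k) = gamma * (size - 1) != 0."""
    rng = random.Random(seed)
    xs, ys = [], []  # xs[g][i], ys[g][i]: individual i of group g
    for _ in range(n_groups):
        x = [rng.gauss(0.0, 1.0) for _ in range(size)]
        total = sum(x)
        xs.append(x)
        ys.append([alpha + beta * xi + gamma * (total - xi) + rng.gauss(0.0, 1.0)
                   for xi in x])
    return xs, ys


def decomposition_gap(xs, ys):
    """cov(sum Y, sum X) minus [sum_i cov(Y_i, X_i) + sum_i sum_{l != i} cov(Y_l, X_i)],
    treating position i within each group as 'individual i'. Sample covariance is
    bilinear, so the gap is zero up to floating-point rounding."""
    size = len(xs[0])
    col_x = [[g[i] for g in xs] for i in range(size)]
    col_y = [[g[i] for g in ys] for i in range(size)]
    lhs = cov([sum(g) for g in ys], [sum(g) for g in xs])
    same = sum(cov(col_y[i], col_x[i]) for i in range(size))
    cross = sum(cov(col_y[l], col_x[i])
                for i in range(size) for l in range(size) if l != i)
    return lhs - (same + cross)


xs, ys = simulate()
# Pooled individual-level regression of Y_i on X_i.
individual = ols_slope([v for g in xs for v in g], [v for g in ys for v in g])
# Aggregate regression of the group sums.
aggregate = ols_slope([sum(g) for g in xs], [sum(g) for g in ys])
print(f"decomposition gap: {decomposition_gap(xs, ys):.1e}")
print(f"individual slope: {individual:.2f} (beta = 2)")
print(f"aggregate slope:  {aggregate:.2f} (beta + gamma * (N - 1) = 4)")
```

With these arbitrary parameters the individual-level slope should come out close to β = 2, while the aggregate slope should be close to β + γ(''N'' − 1) = 4. Summing the spillover terms over the group adds γ(''N'' − 1) to the coefficient on <math>\sum_k X_k</math>, which is exactly the cross-individual covariance that the aggregate regression picks up.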