Ecological fallacy
== Examples ==

=== Mean and median ===
An example of ecological fallacy is the assumption that a population mean has a simple interpretation when considering likelihoods for an individual.

For instance, if the mean score of a group is larger than zero, this does not imply that a random individual of that group is more likely to have a positive score than a negative one (as long as there are more negative scores than positive scores an individual is more likely to have a negative score). Similarly, if a particular group of people is measured to have a lower mean IQ than the general population, it is an error to conclude that a randomly-selected member of the group is more likely than not to have a lower IQ than the mean IQ of the general population; it is also not necessarily the case that a randomly selected member of the group is more likely than not to have a lower IQ than a randomly-selected member of the general population.

Mathematically, this comes from the fact that a distribution can have a positive mean but a negative median. This property is linked to the [[skewness]] of the distribution.

Consider the following numerical example:
* Group A: 80% of people got 40 points and 20% of them got 95 points. The mean score is 51 points.
* Group B: 50% of people got 45 points and 50% got 55 points. The mean score is 50 points.
* If we pick two people at random from A and B, there are 4 possible outcomes:
** A – 40, B – 45 (B wins, 40% probability – 0.8 × 0.5)
** A – 40, B – 55 (B wins, 40% probability – 0.8 × 0.5)
** A – 95, B – 45 (A wins, 10% probability – 0.2 × 0.5)
** A – 95, B – 55 (A wins, 10% probability – 0.2 × 0.5)
* Although Group A has a higher mean score, 80% of the time a random individual of A will score lower than a random individual of B.
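
The arithmetic of the Group A and Group B example can be checked by enumerating every pairing of scores. The following is a minimal sketch; the names <code>group_a</code>, <code>group_b</code>, <code>mean</code> and <code>prob_first_lower</code> are illustrative and do not come from any cited source.

```python
from fractions import Fraction
from itertools import product

# Score distributions from the example above: score -> share of the group.
group_a = {40: Fraction(80, 100), 95: Fraction(20, 100)}
group_b = {45: Fraction(50, 100), 55: Fraction(50, 100)}


def mean(dist):
    """Mean score of a discrete distribution given as {score: share}."""
    return sum(score * share for score, share in dist.items())


def prob_first_lower(first, second):
    """Probability that a random member of `first` scores lower than an
    independently drawn random member of `second`."""
    return sum(
        p * q
        for (x, p), (y, q) in product(first.items(), second.items())
        if x < y
    )


mean_a = mean(group_a)                          # 51
mean_b = mean(group_b)                          # 50
p_a_lower = prob_first_lower(group_a, group_b)  # 4/5

print(mean_a, mean_b, p_a_lower)
```

Exact fractions are used instead of floating-point numbers so that the 80% figure is reproduced without rounding error.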
=== Individual and aggregate correlations ===
Research dating back to [[Émile Durkheim]] suggests that predominantly [[Protestantism|Protestant]] localities have higher [[suicide]] rates than predominantly [[Catholic Church|Catholic]] localities.<ref>Durkheim, É. (1951/1897). ''Suicide: A study in sociology''. Translated by John A. Spaulding and George Simpson. New York: The Free Press. {{ISBN|0-684-83632-7}}.</ref> According to Freedman,<ref name="Freedman">Freedman, D. A. (1999). Ecological Inference and the Ecological Fallacy. ''International Encyclopedia of the Social & Behavioral Sciences'', Technical Report No. 549. https://web.stanford.edu/class/ed260/freedman549.pdf</ref> the idea that Durkheim's findings link, at an individual level, a person's religion to their suicide risk is an example of the ecological fallacy. A group-level relationship does not automatically characterize the relationship at the level of the individual.

Similarly, even if, at the individual level, [[wealth]] is positively correlated with the tendency to vote [[Republican Party (United States)|Republican]] in the [[United States]], wealthier states tend to vote [[Democratic Party (United States)|Democratic]]. For example, in the [[2004 United States presidential election]], the Republican candidate, [[George W. Bush]], won the fifteen poorest states, and the Democratic candidate, [[John Kerry]], won 9 of the 11 wealthiest states in the [[Electoral College (United States)|Electoral College]].
Yet 62% of voters with annual incomes over $200,000 voted for Bush, compared with only 36% of voters with annual incomes of $15,000 or less.<ref>{{cite book |last1=Gelman |first1=Andrew |last2=Park |first2=David |last3=Shor |first3=Boris |last4=Bafumi |first4=Joseph |last5=Cortina |first5=Jeronimo |title=Red State, Blue State, Rich State, Poor State |publisher=[[Princeton University Press]] |year=2008 |isbn=978-0-691-13927-2 |author-link1=Andrew Gelman |url-access=registration |url=https://archive.org/details/redstatebluestat00gelm }}</ref>

Aggregate-level correlation will differ from individual-level correlation if voting preferences are affected by the total wealth of the state even after controlling for individual wealth. The true driving factor in voting preference could be self-perceived ''relative'' wealth; perhaps those who see themselves as better off than their neighbors are more likely to vote Republican. In this case, an individual would be more likely to vote Republican if they became wealthier, but more likely to vote Democratic if their neighbors' wealth increased (resulting in a wealthier state).

However, the observed difference in voting habits based on state- and individual-level wealth could also be explained by the common confusion between higher averages and higher likelihoods discussed above. States may be wealthier not because they contain more wealthy people (i.e., more people with annual incomes over $200,000), but because they contain a small number of super-rich individuals; the ecological fallacy then results from incorrectly assuming that individuals in wealthier states are more likely to be wealthy.

Many examples of ecological fallacies can be found in studies of social networks, which often combine analysis and implications from different levels.
This has been illustrated in an academic paper on networks of farmers in [[Sumatra]].<ref>{{cite journal |first=Petr |last=Matous |year=2015 |title=Social networks and environmental management at multiple levels: soil conservation in Sumatra |journal=Ecology and Society |volume=20 |issue=3 |pages=37 |doi=10.5751/ES-07816-200337|doi-access=free |hdl=10535/9990 |hdl-access=free }}</ref>

=== Robinson's paradox ===
A 1950 paper by William S. Robinson computed the illiteracy rate and the proportion of the population born outside the US for each state and for the District of Columbia, as of the [[1930 United States Census|1930 census]].<ref>{{cite journal |last=Robinson |first=W.S. |year=1950 |title=Ecological Correlations and the Behavior of Individuals |journal=[[American Sociological Review]] |volume=15 |issue=3 |pages=351–357 |doi=10.2307/2087176 |jstor=2087176}}</ref> He found that the correlation between these two figures was −0.53; in other words, the greater the proportion of immigrants in a state, the lower its average illiteracy (or, equivalently, the higher its average literacy). However, when individuals are considered, the correlation between illiteracy and nativity was +0.12 (immigrants were on average more illiterate than native citizens). Robinson showed that the negative correlation at the level of state populations arose because immigrants tended to settle in states where the native population was more literate. He cautioned against drawing conclusions about individuals on the basis of population-level, or "ecological", data.

In 2011, it was found that Robinson's calculations of the ecological correlations were based on the wrong state-level data. The correlation of −0.53 mentioned above is in fact −0.46.<ref>The research note on this curious data glitch is published in {{cite journal |first1=Manfred |last1=Te Grotenhuis |first2=Rob |last2=Eisinga |first3=S.V. |last3=Subramanian |title=Robinson's ''Ecological Correlations and the Behavior of Individuals'': methodological corrections |journal=[[International Journal of Epidemiology|Int J Epidemiol]] |year=2011 |volume=40 |issue=4 |pages=1123–1125 |doi=10.1093/ije/dyr081 |pmid=21596762 |doi-access=free |hdl=2066/99678 |hdl-access=free }} The data Robinson used and the corrections are available at [https://archive.today/20130222172038/http://www.ru.nl/mt/rob/downloads/]</ref> Robinson's paper was seminal, but the term 'ecological fallacy' was not coined until 1958 by Selvin.<ref>{{cite journal |first=Hanan C. |last=Selvin |s2cid=143488519 |title=Durkheim's ''Suicide'' and Problems of Empirical Research |journal=[[American Journal of Sociology]] |volume=63 |issue=6 |year=1958 |pages=607–619 |doi=10.1086/222356 }}</ref>

=== Formal problem ===
The correlation of aggregate quantities (or [[ecological correlation]]) is not equal to the correlation of individual quantities. Denote by ''X''<sub>''i''</sub>, ''Y''<sub>''i''</sub> two quantities at the individual level. The formula for the covariance of the aggregate quantities in groups of size ''N'' is
:<math>\operatorname{cov}\left( \sum_{i=1}^N Y_i, \sum_{i=1}^N X_i\right)= \sum_{i=1}^{N} \operatorname{cov}(Y_{i},X_i)+ \sum_{i=1}^N \sum_{l\neq i} \operatorname{cov}(Y_l,X_i)</math>

The covariance of two aggregated variables depends not only on the covariance of the two variables within the same individual but also on the covariances of the variables between different individuals. In other words, the correlation of aggregate variables takes into account cross-sectional effects which are not relevant at the individual level.

The problem for correlations naturally entails a problem for regressions on aggregate variables: the correlation fallacy is therefore an important issue for a researcher who wants to measure causal impacts.
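
The covariance decomposition above can be checked numerically. The sketch below uses an assumed toy model that is not taken from the cited literature: each group shares a component <code>shared</code>, each individual has a component <code>own</code>, and the coefficients are chosen so that the within-individual covariance is positive while the cross-individual covariance is negative. All names and parameter values are illustrative.

```python
import random
from math import isclose


def sample_cov(a, b):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 3                   # individuals per group (N in the formula)
GROUPS = 2000           # number of simulated groups (assumed value)

# X[i][g] and Y[i][g] hold the values of individual i in group g.
X = [[] for _ in range(N)]
Y = [[] for _ in range(N)]
for _ in range(GROUPS):
    shared = rng.gauss(0, 1)                   # group-level component
    own = [rng.gauss(0, 1) for _ in range(N)]  # individual components
    for i in range(N):
        X[i].append(shared + own[i])           # assumed toy model for X_i
        Y[i].append(2 * own[i] - shared)       # assumed toy model for Y_i

# Left-hand side: covariance of the group totals.
sum_x = [sum(X[i][g] for i in range(N)) for g in range(GROUPS)]
sum_y = [sum(Y[i][g] for i in range(N)) for g in range(GROUPS)]
aggregate_cov = sample_cov(sum_y, sum_x)

# Right-hand side: within-individual terms plus cross-individual terms.
within = sum(sample_cov(Y[i], X[i]) for i in range(N))
cross = sum(
    sample_cov(Y[l], X[i]) for i in range(N) for l in range(N) if l != i
)

print(aggregate_cov, within, cross)
```

Because sample covariance is bilinear, the two sides agree up to floating-point rounding. Under this assumed model the population values are 3 for the within-individual sum and −6 for the cross-individual sum, so the aggregate covariance is negative even though every individual-level covariance is positive.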
Start with a regression model where the outcome <math>Y_i </math> is impacted by <math>X_i </math>
:<math> Y_i=\alpha+\beta X_i+u_i, </math>
:<math> \operatorname{cov}[u_i,X_i]=0.</math>

The regression model at the aggregate level is obtained by summing the individual equations:
:<math> \sum_{i=1}^N Y_i=\alpha\cdot N+ \beta \sum_{i=1}^N X_i+ \sum_{i=1}^N u_i,</math>
:<math> \operatorname{cov}\left[\sum_{i=1}^N u_i,\sum_{i=1}^{N} X_i\right]\neq 0.</math>

Nothing prevents the regressors and the errors from being correlated at the aggregate level. Therefore, in general, running a regression on aggregate data does not estimate the same model as running a regression on individual data.

The aggregate model is correct if and only if
:<math> \operatorname{cov}\left[u_i,\sum_{k=1}^{N} X_k\right]= 0 \quad \text{ for all } i. </math>

This means that, controlling for <math>X_i </math>, <math>\sum_{k=1}^{N} X_k</math> does not determine <math>Y_i</math>.

=== Choosing between aggregate and individual inference ===
There is nothing wrong with running regressions on aggregate data if one is interested in the aggregate model. For instance, for the governor of a state, it is correct to run regressions of crime rate on police force at the state level if one is interested in the policy implications of a rise in police force. However, an ecological fallacy would occur if a city council deduced the impact of an increase in police force on the crime rate at the city level from the correlation at the state level.

Choosing to run aggregate or individual regressions to understand aggregate impacts on some policy depends on the following trade-off: aggregate regressions lose individual-level data but individual regressions add strong modeling assumptions.
Some researchers suggest that the ecological correlation gives a better picture of the outcome of public policy actions, and therefore recommend it over the individual-level correlation for this purpose (Lubinski & Humphreys, 1996). Other researchers disagree, especially when the relationships among the levels are not clearly modeled. To avoid the ecological fallacy, researchers with no individual data can first model what is occurring at the individual level, then model how the individual and group levels are related, and finally examine whether anything occurring at the group level adds to the understanding of the relationship. For instance, in evaluating the impact of state policies, it is helpful to know that policy impacts vary less among the states than do the policies themselves, suggesting that the policy differences are not well translated into results, despite high ecological correlations (Rose, 1973).
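
The condition under which the aggregate regression recovers the individual model (see the formal problem above) can be illustrated with a small simulation. This is a sketch under assumed values, not a result from the cited sources: the error term of each individual is made to depend on the other group members' <math>X_k</math> through a hypothetical coefficient <code>GAMMA</code>, so that <math>\operatorname{cov}[u_i,X_i]=0</math> holds while <math>\operatorname{cov}\left[u_i,\sum_{k} X_k\right]\neq 0</math>.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)               # fixed seed so the run is repeatable
ALPHA, BETA, GAMMA = 0.0, 1.0, -1.0  # assumed illustrative coefficients
N, GROUPS = 3, 5000                  # group size and number of groups (assumed)

ind_x, ind_y = [], []  # one entry per individual
agg_x, agg_y = [], []  # one entry per group (sums over the group)
for _ in range(GROUPS):
    xs = [rng.gauss(0, 1) for _ in range(N)]
    total = sum(xs)
    # u_i = GAMMA * (sum of the other members' X) + noise: uncorrelated
    # with the individual's own X_i, but correlated with the group total.
    ys = [
        ALPHA + BETA * x + GAMMA * (total - x) + rng.gauss(0, 1)
        for x in xs
    ]
    ind_x.extend(xs)
    ind_y.extend(ys)
    agg_x.append(total)
    agg_y.append(sum(ys))

individual_slope = ols_slope(ind_x, ind_y)  # close to BETA
aggregate_slope = ols_slope(agg_x, agg_y)   # close to BETA + GAMMA * (N - 1)

print(individual_slope, aggregate_slope)
```

With these assumed values the individual-level slope is close to 1 while the group-level slope is close to −1, so a regression on group totals would suggest the opposite sign for the individual effect.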