Gini coefficient
== Calculation ==
[[File:Gini coefficient for distribution with only two income or wealth levels.svg|thumb|upright=1.25|Richest ''u'' of population (red) equally share ''f'' of all income or wealth; others (green) equally share remainder: {{nowrap|''G'' {{=}} ''f'' − ''u''}}. A smooth distribution (blue) with the same ''u'' and ''f'' always has {{nowrap|''G'' > ''f'' − ''u''}}.|right]]
While the income distribution of any particular country [[All models are wrong|will not correspond perfectly to the theoretical models]], these models can provide a qualitative explanation of the income distribution in a nation given the Gini coefficient.

=== Example: Two levels of income ===
The extreme cases are represented by the most equal possible society in which every person receives the same income ({{nowrap|''G'' {{=}} 0}}), and the most unequal society (with ''N'' individuals) where a single person receives 100% of the total income and the remaining {{nowrap|''N'' − 1}} people receive none ({{nowrap|''G'' {{=}} 1 − 1/''N''}}).

A simple case assumes just two levels of income, low and high. If the high income group is a proportion ''u'' of the population and earns a proportion ''f'' of all income, then the Gini coefficient is {{nowrap|''f'' − ''u''}}. A more graded distribution with these same values ''u'' and ''f'' will always have a higher Gini coefficient than {{nowrap|''f'' − ''u''}}.

For example, if the wealthiest ''u ='' 20% of the population has ''f ='' 80% of all income (see [[Pareto principle]]), the income Gini coefficient is at least 60%. In another example,<ref>{{cite news |last1=Treanor |first1=Jill |date=2015-10-13 |title=Half of world's wealth now in hands of 1% of population |newspaper=The Guardian |url=https://www.theguardian.com/money/2015/oct/13/half-world-wealth-in-hands-population-inequality-report}}</ref> if ''u ='' 1% of the world's population owns ''f ='' 50% of all wealth, the wealth Gini coefficient is at least 49%.
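
The two-level result can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names <code>gini_mean_difference</code> and <code>two_level_population</code>, the population size of 100 and the total income of 1.0 are assumptions chosen for illustration. It computes the Gini coefficient as half the relative mean absolute difference and compares it with {{nowrap|''f'' − ''u''}}.

```python
def gini_mean_difference(values):
    """Gini coefficient as half the relative mean absolute difference."""
    n = len(values)
    total = sum(values)
    # Sum of |x_i - x_j| over all ordered pairs (i, j).
    diff_sum = sum(abs(a - b) for a in values for b in values)
    # Dividing by 2 * n * total is the same as dividing by 2 * n**2 * mean.
    return diff_sum / (2 * n * total)


def two_level_population(n, u, f, total_income=1.0):
    """Hypothetical helper: n incomes where a share u of people splits a share f of income."""
    n_high = round(n * u)
    n_low = n - n_high
    high = f * total_income / n_high
    low = (1 - f) * total_income / n_low
    return [low] * n_low + [high] * n_high


# Richest 20% hold 80% of income: the result should be close to f - u = 0.6.
incomes = two_level_population(n=100, u=0.20, f=0.80)
print(round(gini_mean_difference(incomes), 6))

# One person holds everything among N = 10 people: close to 1 - 1/N = 0.9.
print(round(gini_mean_difference([0] * 9 + [1]), 6))
```

The double loop is quadratic in the population size, which is acceptable for a small illustration but would be replaced by a sorted-rank formula for large data sets.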
=== Alternative expressions ===
In some cases, this equation can be applied to calculate the Gini coefficient without direct reference to the [[Lorenz curve]]. For example, (taking ''y'' to indicate the income or wealth of a person or household):
* For a population of ''n'' individuals with values <math>y_1 \leq y_2\leq \cdots \leq y_n </math>,<ref name="Wolfram Mathworld">{{cite web |title=Gini Coefficient |url=http://mathworld.wolfram.com/GiniCoefficient.html |publisher=Wolfram Mathworld}}</ref>
::<math>G = \frac{1}{n}\left ( n+1 - 2 \left ( \frac{\sum_{i=1}^n (n+1-i)y_i}{\sum_{i=1}^n y_i} \right ) \right ). </math>
:This may be simplified to:
::<math>G = \frac{2 \sum_{i=1}^n i y_i}{n \sum_{i=1}^n y_i} -\frac{n+1}{n}.</math>

The Gini coefficient can also be considered as half the [[relative mean absolute difference]].

For a random sample ''S'' with values <math>y_1 \leq y_2\leq \cdots \leq y_n </math>, the sample Gini coefficient
:<math>G(S) = \frac{1}{n-1}\left (n+1 - 2 \left ( \frac{\sum_{i=1}^n (n+1-i)y_i}{\sum_{i=1}^n y_i}\right ) \right )</math>
is a [[consistent estimator]] of the population Gini coefficient, but is not in general [[estimator#Point estimators|unbiased]]. In simplified form:
:<math>G(S) = 1 - \frac{2}{n-1}\left ( n - \frac{\sum_{i=1}^n iy_i}{\sum_{i=1}^n y_i}\right ). </math>

There does not exist a sample statistic that is always an unbiased estimator of the population Gini coefficient.
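
The simplified population and sample expressions above can be sketched in a few lines. The function names <code>gini_population</code> and <code>gini_sample</code> are illustrative assumptions, and the code assumes non-negative values with a positive total.

```python
def gini_population(values):
    """G = 2 * sum(i * y_i) / (n * sum(y_i)) - (n + 1) / n, with y sorted ascending."""
    y = sorted(values)
    n = len(y)
    total = sum(y)
    # Ranks start at 1 for the smallest value.
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def gini_sample(values):
    """G(S) = 1 - 2 / (n - 1) * (n - sum(i * y_i) / sum(y_i)), with y sorted ascending."""
    y = sorted(values)
    n = len(y)
    total = sum(y)
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 1 - (2 / (n - 1)) * (n - weighted / total)


data = [1, 2, 3, 4]
print(gini_population(data))  # population formula
print(gini_sample(data))      # sample formula, larger by a factor n / (n - 1)
```

Sorting once makes both functions run in ''O''(''n'' log ''n'') time, compared with the quadratic cost of summing all pairwise absolute differences.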
=== Discrete probability distribution ===
For a [[discrete probability distribution]] with probability mass function <math>f ( y_i ),</math> <math qid=Q120636410>i = 1,\ldots, n</math>, where <math>f ( y_i )</math> is the fraction of the population with income or wealth <math>y_i >0 </math>, the Gini coefficient is:
:<math>G = \frac{1}{2\mu} \sum\limits_{i=1}^n \sum\limits_{j=1}^n \, f(y_i) f(y_j)|y_i-y_j|</math>
where
:<math>\mu=\sum\limits_{i=1}^n y_i f(y_i).</math>
If the points with non-zero probabilities are indexed in increasing order <math>(y_i < y_{i+1})</math>, then:
:<math>G = 1 - \frac{\sum_{i=1}^n f(y_i)(S_{i-1}+S_i)}{S_n}</math>
where
:<math>S_i = \sum_{j=1}^i f(y_j)\,y_j\,</math> and <math>S_0 = 0.</math>
These formulas are also applicable in the limit, as <math>n\rightarrow\infty.</math>

=== Continuous probability distribution ===
When the population is large, the income distribution may be represented by a continuous [[probability density function]] ''f''(''x'') where ''f''(''x'') ''dx'' is the fraction of the population with wealth or income in the interval ''dx'' about ''x''. If ''F''(''x'') is the [[cumulative distribution function]] for ''f''(''x''):
:<math>F(x)=\int_0^x f(x)\,dx</math>
and ''L''(''x'') is the Lorenz function:
:<math>L(x)=\frac{\int_0^x x\,f(x)\,dx}{\int_0^\infty x\,f(x)\,dx}</math>
then the [[Lorenz curve]] ''L''(''F'') may be represented as a function parametric in ''L''(''x'') and ''F''(''x''), and the value of ''B'' can be found by [[integral|integration]]:
:<math>B = \int_0^1 L(F) \,dF. </math>

The Gini coefficient can also be calculated directly from the [[cumulative distribution function]] of the distribution ''F''(''y'').
Defining ''μ'' as the mean of the distribution, and assuming that ''F''(''y'') is zero for all negative values, the Gini coefficient is given by:
:<math>G = 1 - \frac{1}{\mu}\int_0^\infty (1-F(y))^2 \,dy = \frac{1}{\mu}\int_0^\infty F(y)(1-F(y)) \,dy</math>
The latter result comes from [[integration by parts]]. ''(Note that this formula can be applied when there are negative values if the integration is taken from minus infinity to plus infinity.)''

The Gini coefficient may be expressed in terms of the [[quantile function]] ''Q''(''F'') ''(inverse of the cumulative distribution function: Q(F(x)) = x)'':
:<math>G=\frac{1}{2 \mu}\int_0^1 \int_0^1 |Q(F_1)-Q(F_2)|\,dF_1\,dF_2 .</math>

Since the Gini coefficient is [[income inequality metrics|independent of scale]], if the distribution function can be expressed in the form ''f(x,φ,a,b,c...)'' where ''φ'' is a scale factor and ''a, b, c...'' are dimensionless parameters, then the Gini coefficient will be a function only of ''a, b, c...''.<ref name="McDonald1974">{{cite journal |last1=McDonald |first1=James B |last2=Jensen |first2=Bartell C. |date=December 1979 |title=An Analysis of Some Properties of Alternative Measures of Income Inequality Based on the Gamma Distribution Function |url= |journal=Journal of the American Statistical Association |volume=74 |issue=368 |pages=856–860 |doi= 10.1080/01621459.1979.10481042|access-date=}}</ref> For example, for the [[exponential distribution]], which is a function of only ''x'' and a scale parameter, the Gini coefficient is a constant, equal to 1/2.

For some functional forms, the Gini index can be calculated explicitly.
For example, if ''y'' follows a [[log-normal distribution]] with the standard deviation of logs equal to <math>\sigma</math>, then <math>G = \operatorname{erf}\left(\frac{\sigma }{2 }\right)</math> where <math>\operatorname{erf}</math> is the [[error function]] (since <math> G=2 \Phi \left(\frac{\sigma }{\sqrt{2}}\right)-1</math>, where <math>\Phi</math> is the cumulative distribution function of a standard normal distribution).<ref name='LNdist'>Crow, E. L., & Shimizu, K. (Eds.). (1988). Lognormal distributions: Theory and applications (Vol. 88). New York: M. Dekker, page 11.</ref> In the table below, some examples for probability density functions with support on <math>[0,\infty)</math> are shown. The Dirac delta distribution represents the case where everyone has the same wealth (or income); it implies no variations between incomes.{{fact|date=January 2025}}

:{| class="wikitable" style="float: left; margin-left: 1em;"
|-
! Income Distribution function !! PDF(x) !! Gini Coefficient
|-
| [[Dirac delta function]] || <math>\delta(x-x_0),\, x_0>0</math> || 0
|-
| [[Uniform distribution (continuous)|Uniform distribution]]<ref>{{Cite web |last=Weisstein |first=Eric W. |title=Uniform Distribution |url=https://mathworld.wolfram.com/ |access-date=2022-11-30 |website=mathworld.wolfram.com |language=en}}</ref> ||<math>\begin{cases} \frac{1}{b-a} & a\le x\le b \\ 0 & \mathrm{otherwise} \end{cases}</math> || <math>\frac{(b-a)}{3(b+a)}</math>
|-
| [[Exponential distribution]]<ref>{{Cite web |title=Exponential Distribution {{!}} Definition {{!}} Memoryless Random Variable |url=https://www.probabilitycourse.com/chapter4/4_2_2_exponential.php |access-date=2022-11-30 |website=www.probabilitycourse.com}}</ref> ||<math>\lambda e^{-x\lambda},\,\,x>0</math> ||<math>1/2</math>
|-
| [[Log-normal distribution]]<ref name='LNdist'/><ref>For the log-normal with <math>\sigma</math> = 0, <math>\textrm{erf}(0)</math> = 0; <math>2 \Phi(0)-1 = 2(0.5)-1</math> = 0.</ref> ||<math>\frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac 12 \left(\frac{\ln\,(x)-\mu}{\sigma}\right)^2}</math> ||<math>\textrm{erf}(\sigma/2)=2 \Phi \left(\frac{\sigma }{\sqrt{2}}\right)-1</math>
|-
| [[Pareto distribution]]<ref name="mathworld.wolfram.com">{{Cite web |title=Wolfram MathWorld: The Web's Most Extensive Mathematics Resource |url=https://mathworld.wolfram.com/ |access-date=2022-11-30 |website=mathworld.wolfram.com |language=en}}</ref> ||<math>\begin{cases} \frac{\alpha k^\alpha}{x^{\alpha+1}} & x\ge k\\0 & x < k \end{cases}</math> ||<math>\begin{cases} 1 & 0<\alpha < 1\\ \frac{1}{2\alpha -1} & \alpha \ge 1 \end{cases}</math>
|-
| [[Chi distribution]]<ref name="mathworld.wolfram.com"/> ||<math>f(x;k) = \begin{cases} \dfrac{x^{k-1}e^{-x^2/2}}{2^{k/2-1}\Gamma\left(\frac{k}{2}\right)}, & x\geq 0 \\ 0, & x<0 \end{cases} </math> ||<math>(-1)^k \left| I_{-1}(k,\tfrac{1}{2})\right|</math>
|-
| [[Chi-squared distribution]]<ref>{{Cite web |title=Chi-Squared Distribution -- from Wolfram MathWorld |url=https://mathworld.wolfram.com/Chi-SquaredDistribution.html |access-date=2023-01-11 |website=mathworld.wolfram.com |language=en}}</ref> ||<math>\frac{2^{-k/2} e^{-x/2} x^{k/2 - 1}}{\Gamma(k/2)}</math> ||<math>\frac{2\,\Gamma\left(\frac{1+k}{2}\right)}{k\,\Gamma(k/2)\sqrt{\pi}}</math>
|-
| [[Gamma distribution]]<ref name=McDonald1974/> ||<math>\frac{e^{-x/\theta}x^{k-1}\theta^{-k}}{\Gamma(k)}</math> ||<math>\frac{\Gamma\left(\frac{2k+1}{2}\right)}{k\,\Gamma(k)\sqrt{\pi}}</math>
|-
| [[Weibull distribution]]<ref>{{Cite web |title=Weibull Distribution: Characteristics of the Weibull Distribution |url=https://www.weibull.com/hotwire/issue14/relbasics14.htm |access-date=2022-11-30 |website=www.weibull.com}}</ref> ||<math>\frac {k} {\lambda}\, \left(\frac {x}{\lambda} \right)^{k-1} e^{-(x/\lambda)^k}</math> ||<math>1-2^{-1/k}</math>
|-
| [[Beta distribution]]<ref>{{Cite web |last=Weisstein |first=Eric W. |title=Beta Distribution |url=https://mathworld.wolfram.com/ |access-date=2022-11-30 |website=mathworld.wolfram.com |language=en}}</ref> ||<math>\frac {x^{\alpha-1}(1-x)^{\beta-1}} {B(\alpha,\beta)}</math> ||<math>\left(\frac{2}{\alpha}\right)\frac{B(\alpha+\beta,\alpha+\beta)}{B(\alpha,\alpha)B(\beta,\beta)}</math>
|-
|[[Log-logistic distribution]]<ref>{{Cite web |title=The Log-Logistic Distribution |url=https://www.randomservices.org/random/special/LogLogistic.html |access-date=2022-11-30 |website=www.randomservices.org}}</ref>
|<math>\frac{ (\beta/\alpha)(x/\alpha)^{\beta-1} } { \left (1+(x/\alpha)^{\beta} \right)^2 }</math>
|<math>1/\beta</math>
|}
{{clear}}
* <math>\Gamma(\,)</math> is the [[Gamma function]]
* <math>B(\,)</math> is the [[Beta function]]
* <math>I_k(\,)</math> is the [[Beta function|Regularized incomplete beta function]]

=== Other approaches ===
Sometimes the entire Lorenz curve is not known, and only values at certain intervals are given. In that case, the Gini coefficient can be approximated using various techniques for [[interpolation|interpolating]] the missing values of the Lorenz curve.
If (''X''<sub>''k''</sub>, ''Y''<sub>''k''</sub>) are the known points on the Lorenz curve, with the ''X''<sub>''k''</sub> indexed in increasing order (''X''<sub>''k'' − 1</sub> < ''X''<sub>''k''</sub>), so that:
* ''X''<sub>''k''</sub> is the cumulated proportion of the population variable, for ''k'' = 0,...,''n'', with ''X''<sub>0</sub> = 0, ''X''<sub>''n''</sub> = 1.
* ''Y''<sub>''k''</sub> is the cumulated proportion of the income variable, for ''k'' = 0,...,''n'', with ''Y''<sub>0</sub> = 0, ''Y''<sub>''n''</sub> = 1.
* ''Y''<sub>''k''</sub> should be indexed in non-decreasing order (''Y''<sub>''k''</sub> ≥ ''Y''<sub>''k'' − 1</sub>)

If the Lorenz curve is approximated on each interval as a line between consecutive points, then the area B can be approximated with [[Trapezoidal rule|trapezoids]] and:
:<math>G_1 = 1 - \sum_{k=1}^{n} (X_{k} - X_{k-1}) (Y_{k} + Y_{k-1})</math>
is the resulting approximation for G. More accurate results can be obtained using other methods to [[Numerical integration|approximate the area]] B, such as approximating the Lorenz curve with a [[Simpson's rule|quadratic function]] across pairs of intervals or building an appropriately smooth approximation to the underlying distribution function that matches the known data. If the population mean and boundary values for each interval are also known, these can often be used to improve the accuracy of the approximation.

The Gini coefficient calculated from a sample is a statistic, and its standard error, or confidence intervals for the population Gini coefficient, should be reported.
These can be calculated using [[Resampling (statistics)#Bootstrap|bootstrap]] techniques, which are mathematically complicated and computationally demanding even in an era of fast computers.<ref>{{Cite web |last=Abdon |first=Mitch |date=2011-05-23 |title=Bootstrapping Gini |url=https://www.statadaily.com/bootstrapping-gini/ |access-date=2022-11-12 |website=Statadaily: Unsolicited advice for the interested |language=en-US}}</ref> Economist [[Tomson Ogwang]] made the process more efficient by setting up a "trick regression model" in which respective income variables in the sample are ranked, with the lowest income being allocated rank 1. The model then expresses the rank (dependent variable) as the sum of a constant ''A'' and a [[normal distribution|normal]] error term whose variance is inversely proportional to ''y''<sub>''k''</sub>:
:<math>k = A + \ N(0, s^{2}/y_k) </math>
Thus, ''G'' can be expressed as a function of the weighted [[Least-squares estimation|least squares estimate]] of the constant ''A'', which can be used to speed up the calculation of the [[Resampling (statistics)#Jackknife|jackknife]] estimate for the standard error. Economist David Giles argued that the [[standard error]] of the estimate of ''A'' can be used to derive the standard error of the estimate of ''G'' directly, without using a jackknife. This method only requires using ordinary least squares regression after ordering the sample data. The results compare favorably with the estimates from the [[Jackknife resampling|jackknife]], with agreement improving with increasing sample size.{{sfnp|Giles|2004}}

However, it has been argued that this depends on the model's assumptions about the error distributions and the independence of error terms. These assumptions are often not valid for real data sets. There is still ongoing debate surrounding this topic.
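
Two of the computations described in this section can be sketched as follows: the trapezoidal approximation ''G''<sub>1</sub> from known Lorenz curve points, and a plain delete-one jackknife estimate of the standard error of the sample Gini coefficient. This is an illustration under stated assumptions, not the regression-based shortcut of Ogwang or Giles. The function names are hypothetical, the input to the jackknife is assumed to be a Python list of at least three non-negative values, and every leave-one-out subsample is assumed to have a positive total.

```python
import math


def gini_trapezoid(x, y):
    """G_1 = 1 - sum((X_k - X_{k-1}) * (Y_k + Y_{k-1})) over known Lorenz points.

    x and y are cumulative population and income shares, starting at 0 and ending at 1.
    """
    return 1 - sum((x[k] - x[k - 1]) * (y[k] + y[k - 1]) for k in range(1, len(x)))


def gini_sample(values):
    """Sample Gini coefficient with the n - 1 denominator, computed from sorted ranks."""
    y = sorted(values)
    n = len(y)
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 1 - (2 / (n - 1)) * (n - weighted / sum(y))


def gini_jackknife_se(values):
    """Delete-one jackknife standard error of the sample Gini coefficient."""
    n = len(values)
    # Recompute the statistic with each observation left out in turn.
    leave_one_out = [gini_sample(values[:i] + values[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((g - mean_loo) ** 2 for g in leave_one_out)
    return math.sqrt(variance)


# Lorenz points for a two-level society: poorest 80% of people hold 20% of income.
print(round(gini_trapezoid([0.0, 0.8, 1.0], [0.0, 0.2, 1.0]), 6))

sample = [12, 15, 18, 22, 30, 45, 80]
print(gini_sample(sample), gini_jackknife_se(sample))
```

The jackknife here recomputes the statistic ''n'' times, which is the cost that the regression-based approaches described above aim to avoid.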
[[Guillermina Jasso]]<ref>{{cite journal|last=Jasso|first=Guillermina|year=1979|title=On Gini's Mean Difference and Gini's Index of Concentration|journal=American Sociological Review|volume=44|issue=5|pages=867–870|jstor=2094535|doi=10.2307/2094535}}</ref> and [[Angus Deaton]]{{sfnp|Deaton|1997|p=139}} independently proposed the following formula for the Gini coefficient:
:<math>G = \frac{N+1}{N-1}-\frac{2}{N(N-1)\mu}\left(\sum_{i=1}^N P_iX_i\right)</math>
where <math>\mu</math> is the mean income of the population and ''P''<sub>''i''</sub> is the income rank of person ''i'', who has income ''X''<sub>''i''</sub>, such that the richest person receives a rank of 1 and the poorest a rank of ''N''. This effectively gives higher weight to poorer people in the income distribution, which allows the Gini to meet the [[Income inequality metrics#Transfer principle|Transfer Principle]]. Note that the Jasso-Deaton formula rescales the coefficient so that its value is one if all the <math>X_i</math> are zero except one. Note, however, Allison's reply on the need to divide by ''N''² instead.<ref>{{cite journal|title=Reply to Jasso|first=Paul D.|last=Allison|journal=American Sociological Review|volume=44|issue=5|year=1979|pages=870–872|jstor=2094536|doi=10.2307/2094536}}<!--|access-date=2 February 2015--></ref>

[[FAO]] explains another version of the formula.<ref name="fao gini">{{cite web|title=Inequality Analysis – The Gini Index|publisher=Food and Agriculture Organization, United Nations|first1=Lorenzo Giovanni|last1=Bellù|first2=Paolo|last2=Liberati|year=2006|url=http://www.fao.org/docs/up/easypol/329/gini_index_040EN.pdf|access-date=31 July 2012|archive-date=13 July 2017|archive-url=https://web.archive.org/web/20170713164057/http://www.fao.org/docs/up/easypol/329/gini_index_040en.pdf|url-status=dead}}</ref>
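
The Jasso-Deaton formula above can be sketched directly from its definition. The function name <code>gini_jasso_deaton</code> is an assumption for illustration, and the code assumes at least two incomes with a positive mean.

```python
def gini_jasso_deaton(incomes):
    """G = (N + 1) / (N - 1) - 2 / (N * (N - 1) * mu) * sum(P_i * X_i).

    P_i is the income rank of person i, with rank 1 for the richest person.
    """
    n = len(incomes)
    mu = sum(incomes) / n
    # Sorting in descending order makes the position in the list equal to the rank.
    ranked = sorted(incomes, reverse=True)
    rank_weighted = sum(rank * x for rank, x in enumerate(ranked, start=1))
    return (n + 1) / (n - 1) - 2 * rank_weighted / (n * (n - 1) * mu)


print(gini_jasso_deaton([0, 0, 0, 5]))  # one person holds everything: close to 1
print(gini_jasso_deaton([2, 2, 2, 2]))  # perfect equality: close to 0
```

With ties, the ranks assigned by sorting are arbitrary among equal incomes, but the rank-weighted sum is unchanged because tied incomes contribute the same value at each tied rank.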