=== Other approaches ===

Sometimes the entire Lorenz curve is not known, and only values at certain intervals are given. In that case, the Gini coefficient can be approximated using various techniques for [[interpolation|interpolating]] the missing values of the Lorenz curve. If (''X''<sub>''k''</sub>, ''Y''<sub>''k''</sub>) are the known points on the Lorenz curve, with the ''X''<sub>''k''</sub> indexed in increasing order (''X''<sub>''k'' – 1</sub> &lt; ''X''<sub>''k''</sub>), so that:
* ''X''<sub>''k''</sub> is the cumulated proportion of the population variable, for ''k'' = 0,...,''n'', with ''X''<sub>0</sub> = 0, ''X''<sub>''n''</sub> = 1.
* ''Y''<sub>''k''</sub> is the cumulated proportion of the income variable, for ''k'' = 0,...,''n'', with ''Y''<sub>0</sub> = 0, ''Y''<sub>''n''</sub> = 1.
* ''Y''<sub>''k''</sub> should be indexed in non-decreasing order (''Y''<sub>''k''</sub> ≥ ''Y''<sub>''k'' – 1</sub>).
If the Lorenz curve is approximated on each interval as a line between consecutive points, then the area B can be approximated with [[Trapezoidal rule|trapezoids]] and:
:<math>G_1 = 1 - \sum_{k=1}^{n} (X_{k} - X_{k-1}) (Y_{k} + Y_{k-1})</math>

is the resulting approximation for G. More accurate results can be obtained using other methods to [[Numerical integration|approximate the area]] B, such as approximating the Lorenz curve with a [[Simpson's rule|quadratic function]] across pairs of intervals or building an appropriately smooth approximation to the underlying distribution function that matches the known data. If the population mean and boundary values for each interval are also known, these can also often be used to improve the accuracy of the approximation.
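The trapezoid approximation ''G''<sub>1</sub> can be sketched in a few lines of Python. This is only a minimal illustration: the function name <code>gini_trapezoid</code> and the quintile figures are hypothetical and are not taken from any survey.

```python
def gini_trapezoid(xs, ys):
    """Approximate G from known Lorenz-curve points using trapezoids.

    xs: cumulative population shares, strictly increasing from 0 to 1.
    ys: cumulative income shares, non-decreasing from 0 to 1.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching points")
    total = 0.0
    for k in range(1, len(xs)):
        # Width of the interval times the sum of the two curve heights.
        total += (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1])
    return 1.0 - total


# Hypothetical quintile data (illustrative only, not from a real survey).
x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.0, 0.05, 0.15, 0.30, 0.55, 1.0]
print(gini_trapezoid(x, y))  # approximately 0.38
```

Because a straight line between known points lies on or above a convex Lorenz curve, this sketch tends to understate the true coefficient when only a few points are available.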

The Gini coefficient calculated from a sample is a statistic, and its standard error, or confidence intervals for the population Gini coefficient, should be reported. These can be calculated using [[Resampling (statistics)#Bootstrap|bootstrap]] techniques, which are mathematically complicated and computationally demanding even in an era of fast computers.<ref>{{Cite web |last=Abdon |first=Mitch |date=2011-05-23 |title=Bootstrapping Gini |url=https://www.statadaily.com/bootstrapping-gini/ |access-date=2022-11-12 |website=Statadaily: Unsolicited advice for the interested |language=en-US}}</ref> Economist [[Tomson Ogwang]] made the process more efficient by setting up a "trick regression model" in which the incomes in the sample are ranked, with the lowest income being allocated rank 1. The model then expresses the rank ''k'' (dependent variable) as the sum of a constant ''A'' and a [[normal distribution|normal]] error term whose variance is inversely proportional to ''y''<sub>''k''</sub>, the income with rank ''k'':

:<math>k = A + N(0, s^{2}/y_k)</math>

Thus, ''G'' can be expressed as a function of the weighted [[Least-squares estimation|least squares estimate]] of the constant ''A'', and this can be used to speed up the calculation of the [[Resampling (statistics)#Jackknife|jackknife]] estimate for the standard error. Economist David Giles argued that the [[standard error]] of the estimate of ''A'' can be used to derive the standard error of the estimate of ''G'' directly, without using a jackknife. This method only requires ordinary least squares regression after ordering the sample data. The results compare favorably with the estimates from the [[Jackknife resampling|jackknife]], with agreement improving with increasing sample size.{{sfnp|Giles|2004}}
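The regression shortcut can be illustrated with a short Python sketch. It rests on stated assumptions rather than on the cited papers' exact presentation: the weighted least squares estimate of ''A'' with weights ''y''<sub>''k''</sub> is the income-weighted mean rank, ''G'' is recovered through the standard rank-based identity ''G'' = 2''Â''/''n'' − (''n'' + 1)/''n'', and the standard error of ''G'' is taken to be 2/''n'' times the regression standard error of ''Â''. The function name <code>gini_rank_regression</code> is hypothetical.

```python
import math


def gini_rank_regression(incomes):
    """Return (G, se) from a rank regression with variance s^2 / y_k.

    Assumes strictly positive incomes. The standard error is valid only
    under the regression model's own assumptions (an assumption of this
    sketch, not a general result).
    """
    y = sorted(incomes)  # rank 1 = lowest income
    n = len(y)
    if n < 2 or y[0] <= 0:
        raise ValueError("need at least two positive incomes")
    sum_y = sum(y)
    # Weighted least squares estimate of A with weights y_k; identical to
    # OLS through the origin of k*sqrt(y_k) on sqrt(y_k).
    a_hat = sum(k * yk for k, yk in enumerate(y, start=1)) / sum_y
    g = 2.0 * a_hat / n - (n + 1.0) / n
    # Residual variance of the transformed regression (one fitted parameter).
    s2 = sum(yk * (k - a_hat) ** 2 for k, yk in enumerate(y, start=1)) / (n - 1)
    se_a = math.sqrt(s2 / sum_y)
    return g, 2.0 * se_a / n


g, se = gini_rank_regression([1, 2, 3, 4])
print(g)  # 0.25
```

For the incomes 1, 2, 3, 4 the weighted mean rank is 30/10 = 3, so ''G'' = 6/4 − 5/4 = 0.25, which matches the value obtained from the mean absolute difference.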

However, it has been argued that the validity of this approach depends on the model's assumptions about the error distribution and the independence of the error terms. These assumptions are often not valid for real data sets. The topic remains under debate.
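A resampling estimate avoids the regression model's distributional assumptions at the cost of more computation. The following Python sketch shows one common variant, a percentile bootstrap confidence interval; the function names, the number of replications, and the fixed seed are illustrative choices, not a recommended procedure.

```python
import random


def gini(incomes):
    """Sample Gini coefficient via the rank-based formula (divides by n)."""
    y = sorted(incomes)
    n = len(y)
    total = sum(y)
    weighted = sum(k * yk for k, yk in enumerate(y, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def bootstrap_gini_ci(incomes, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the Gini coefficient.

    Assumes strictly positive incomes so every resample has a positive total.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n = len(incomes)
    # Resample n incomes with replacement and recompute G each time.
    stats = sorted(gini(rng.choices(incomes, k=n)) for _ in range(reps))
    lo = stats[int((alpha / 2) * reps)]
    hi = stats[min(reps - 1, int((1 - alpha / 2) * reps))]
    return lo, hi


# Hypothetical sample (illustrative only).
sample = [12, 15, 18, 22, 25, 31, 40, 55, 80, 150]
print(gini(sample), bootstrap_gini_ci(sample))
```

Each replication sorts a resample of size ''n'', which is the source of the computational burden noted above for large samples.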

[[Guillermina Jasso]]<ref>{{cite journal|last=Jasso|first=Guillermina|year=1979|title=On Gini's Mean Difference and Gini's Index of Concentration|journal=American Sociological Review|volume=44|issue=5|pages=867–870|jstor=2094535|doi=10.2307/2094535}}</ref> and [[Angus Deaton]]{{sfnp|Deaton|1997|p=139}} independently proposed the following formula for the Gini coefficient:

:<math>G = \frac{N+1}{N-1}-\frac{2}{N(N-1)\mu}\left(\sum_{i=1}^N P_iX_i\right)</math>

where <math>\mu</math> is the mean income of the population and ''P''<sub>''i''</sub> is the income rank of person ''i'', whose income is ''X''<sub>''i''</sub>, such that the richest person receives a rank of 1 and the poorest a rank of ''N''. This effectively gives higher weight to poorer people in the income distribution, which allows the Gini coefficient to meet the [[Income inequality metrics#Transfer principle|Transfer Principle]]. The Jasso-Deaton formula rescales the coefficient so that its value is one if all the <math>X_i</math> are zero except one. See, however, Allison's reply on the need to divide by ''N''<sup>2</sup> instead.<ref>{{cite journal|title=Reply to Jasso|first=Paul D.|last=Allison|journal=American Sociological Review|volume=44|issue=5|year=1979|pages=870–872|jstor=2094536|doi=10.2307/2094536}}<!--|access-date=2 February 2015--></ref>
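The Jasso-Deaton formula translates directly into code. The Python sketch below is a minimal illustration with a hypothetical function name; ties in income are ranked in arbitrary order, which does not change the sum because tied incomes are equal.

```python
def gini_jasso_deaton(incomes):
    """Gini coefficient using ranks where the richest person has rank 1."""
    x = sorted(incomes, reverse=True)  # position 1 = richest, N = poorest
    n = len(x)
    if n < 2:
        raise ValueError("need at least two incomes")
    mu = sum(x) / n
    if mu <= 0:
        raise ValueError("mean income must be positive")
    # Sum of rank times income over all persons.
    ranked_sum = sum(p * xi for p, xi in enumerate(x, start=1))
    return (n + 1) / (n - 1) - 2.0 * ranked_sum / (n * (n - 1) * mu)


print(gini_jasso_deaton([0, 0, 0, 10]))  # approximately 1.0
print(gini_jasso_deaton([1, 2, 3, 4]))   # approximately 0.333
```

For the incomes 1, 2, 3, 4 the result is 1/3, whereas the version that divides by ''N''<sup>2</sup> gives 0.25; the two differ by the factor ''N''/(''N'' − 1).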

[[FAO]] explains another version of the formula.<ref name="fao gini">{{cite web|title=Inequality Analysis – The Gini Index|publisher=Food and Agriculture Organization, United Nations|first1=Lorenzo Giovanni|last1=Bellù|first2=Paolo|last2=Liberati|year=2006|url=http://www.fao.org/docs/up/easypol/329/gini_index_040EN.pdf|access-date=31 July 2012|archive-date=13 July 2017|archive-url=https://web.archive.org/web/20170713164057/http://www.fao.org/docs/up/easypol/329/gini_index_040en.pdf|url-status=dead}}</ref>