== Quantiles of a population ==
As in the computation of, for example, [[standard deviation]], the estimation of a quantile depends upon whether one is operating with a [[statistical population]] or with a [[Sample (statistics)|sample]] drawn from it. For a population of discrete values, or for a continuous population density, the {{mvar|k}}-th {{mvar|q}}-quantile is the data value where the cumulative distribution function crosses {{math|''k''/''q''}}. That is, {{mvar|x}} is a {{mvar|k}}-th {{mvar|q}}-quantile for a variable {{mvar|X}} if
: {{math|Pr[''X'' < ''x''] ≤ ''k''/''q''}} or, equivalently, {{math|Pr[''X'' ≥ ''x''] ≥ 1 − ''k''/''q''}}
and
: {{math|Pr[''X'' ≤ ''x''] ≥ ''k''/''q''}}
where {{math|Pr}} is the [[Probability distribution#General probability definition|probability function]].

For a finite population of {{mvar|N}} equally probable values indexed {{math|1, …, ''N''}} from lowest to highest, the {{mvar|k}}-th {{mvar|q}}-quantile of this population can equivalently be computed via the value of {{mvar|I<sub>p</sub> {{=}} ''N'' ''k''/''q''}}. If {{mvar|I<sub>p</sub>}} is not an integer, then round up to the next integer to get the appropriate index; the corresponding data value is the {{mvar|k}}-th {{mvar|q}}-quantile. On the other hand, if {{mvar|I<sub>p</sub>}} is an integer, then any number from the data value at that index to the data value at the next index can be taken as the quantile, and it is conventional (though arbitrary) to take the average of those two values (see [[#Estimating quantiles from a sample|Estimating quantiles from a sample]]).

If, instead of using integers {{mvar|k}} and {{mvar|q}}, the "{{mvar|p}}-quantile" is based on a [[real number]] {{mvar|p}} with {{math|0 < ''p'' < 1}}, then {{mvar|p}} replaces {{math|''k''/''q''}} in the above formulas. This broader terminology is used when quantiles are used to [[Quantile-parameterized distribution|parameterize continuous probability distributions]].
Moreover, some software programs (including [[Microsoft Excel]]) regard the minimum and maximum as the 0th and 100th percentile, respectively. However, this broader terminology is an extension beyond traditional statistics definitions.

=== Examples ===
The following two examples use the Nearest Rank definition of quantile with rounding. For an explanation of this definition, see [[percentile]]s.

==== Even-sized population ====
Consider an ordered population of 10 data values [3, 6, 7, 8, 8, 10, 13, 15, 16, 20]. What are the 4-quantiles (the "quartiles") of this dataset?

{| class="wikitable"
|-
! Quartile
! Calculation
! Result
|-
| Zeroth quartile
| Although not universally accepted, one can also speak of the zeroth quartile. This is the minimum value of the set, so the zeroth quartile in this example would be 3.
| 3
|-
| First quartile
| The rank of the first quartile is 10×(1/4) = 2.5, which rounds up to 3, meaning that 3 is the rank in the population (from least to greatest values) at which approximately 1/4 of the values are less than the value of the first quartile. The third value in the population is 7.
| 7
|-
| Second quartile
| The rank of the second quartile (same as the median) is 10×(2/4) = 5, which is an integer, so the average of the fifth and sixth values is taken, that is, (8+10)/2 = 9, though any value from 8 through to 10 could be taken to be the median.
| 9
|-
| Third quartile
| The rank of the third quartile is 10×(3/4) = 7.5, which rounds up to 8. The eighth value in the population is 15.
| 15
|-
| Fourth quartile
| Although not universally accepted, one can also speak of the fourth quartile. This is the maximum value of the set, so the fourth quartile in this example would be 20. Under the Nearest Rank definition of quantile, the rank of the fourth quartile is the rank of the biggest number, so the rank of the fourth quartile would be 10.
| 20
|}

So the first, second and third 4-quantiles (the "quartiles") of the dataset [3, 6, 7, 8, 8, 10, 13, 15, 16, 20] are [7, 9, 15]. If also required, the zeroth quartile is 3 and the fourth quartile is 20.

==== Odd-sized population ====
Consider an ordered population of 11 data values [3, 6, 7, 8, 8, 9, 10, 13, 15, 16, 20]. What are the 4-quantiles (the "quartiles") of this dataset?

{| class="wikitable"
|-
! Quartile
! Calculation
! Result
|-
| Zeroth quartile
| Although not universally accepted, one can also speak of the zeroth quartile. This is the minimum value of the set, so the zeroth quartile in this example would be 3.
| 3
|-
| First quartile
| The rank of the first quartile is 11×(1/4) = 2.75, which rounds up to 3, meaning that 3 is the rank in the population (from least to greatest values) at which approximately 1/4 of the values are less than the value of the first quartile. The third value in the population is 7.
| 7
|-
| Second quartile
| The rank of the second quartile (same as the median) is 11×(2/4) = 5.5, which rounds up to 6. Therefore, 6 is the rank in the population (from least to greatest values) at which approximately 2/4 of the values are less than the value of the second quartile (or median). The sixth value in the population is 9.
| 9
|-
| Third quartile
| The rank of the third quartile is 11×(3/4) = 8.25, which rounds up to 9. The ninth value in the population is 15.
| 15
|-
| Fourth quartile
| Although not universally accepted, one can also speak of the fourth quartile. This is the maximum value of the set, so the fourth quartile in this example would be 20. Under the Nearest Rank definition of quantile, the rank of the fourth quartile is the rank of the biggest number, so the rank of the fourth quartile would be 11.
| 20
|}

So the first, second and third 4-quantiles (the "quartiles") of the dataset [3, 6, 7, 8, 8, 9, 10, 13, 15, 16, 20] are [7, 9, 15].
If also required, the zeroth quartile is 3 and the fourth quartile is 20.

=== Relationship to the mean ===
For any population probability distribution on finitely many values, and generally for any probability distribution with a mean and variance, it is the case that
<math display="block">\mu - \sigma\cdot\sqrt{\frac{1-p}{p}} \le Q(p) \le \mu + \sigma\cdot\sqrt{\frac{p}{1-p}}\,,</math>
where {{mvar|Q(p)}} is the value of the {{mvar|p}}-quantile for {{math|0 < ''p'' < 1}} (or equivalently is the {{mvar|k}}-th {{mvar|q}}-quantile for {{math|1=''p'' = ''k''/''q''}}), where {{mvar|μ}} is the distribution's [[arithmetic mean]], and where {{mvar|σ}} is the distribution's [[standard deviation]].<ref>{{Cite journal |last1=Bagui |first1=S. |last2=Bhaumik |first2=D. |date=2004 |title=Glimpses of inequalities in probability and statistics |journal=International Journal of Statistical Sciences |volume=3 |pages=9–15 |issn=1683-5603 |url=http://www.ru.ac.bd/stat/wp-content/uploads/sites/25/2019/01/P3.V3s.pdf |access-date=2021-08-12 |archive-date=2021-08-12 |archive-url=https://web.archive.org/web/20210812115620/http://www.ru.ac.bd/stat/wp-content/uploads/sites/25/2019/01/P3.V3s.pdf |url-status=dead }}</ref> In particular, the median {{math|1=(''p'' = ''k''/''q'' = 1/2)}} is never more than one standard deviation from the mean.

The above formula can be used to bound the value {{math|''μ'' + ''zσ''}} in terms of quantiles. When {{math|''z'' ≥ 0}}, the value that is [[standard score|{{math|''z''}} standard deviations above the mean]] has a lower bound
<math display="block">\mu + z \sigma \ge Q\left(\frac{z^2}{1+z^2}\right)\,,\mathrm{~for~} z \ge 0.</math>
For example, the value that is {{math|1=''z'' = 1}} standard deviation above the mean is always greater than or equal to {{math|1=''Q''(''p'' = 0.5)}}, the median, and the value that is {{math|1=''z'' = 2}} standard deviations above the mean is always greater than or equal to {{math|1=''Q''(''p'' = 0.8)}}, the fourth quintile.

When {{math|''z'' ≤ 0}}, there is instead an upper bound
<math display="block">\mu + z \sigma \le Q\left(\frac{1}{1+z^2}\right)\,,\mathrm{~for~} z \le 0.</math>
For example, the value {{math|''μ'' + ''zσ''}} for {{math|1=''z'' = −3}} will never exceed {{math|1=''Q''(''p'' = 0.1)}}, the first decile.
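
The index rule for a finite population and the mean-based bounds can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the function names <code>population_quantile</code> and <code>mean_bounds</code> are hypothetical, the sketch assumes {{math|0 < ''k'' < ''q''}} (so the zeroth and fourth quartiles are not handled), and it uses the population standard deviation for {{mvar|σ}}.

```python
from fractions import Fraction
from math import ceil, sqrt
from statistics import fmean, pstdev


def population_quantile(values, k, q):
    """k-th q-quantile of a finite population via the index I_p = N*k/q.

    Hypothetical helper for illustration; assumes 0 < k < q.
    """
    if not 0 < k < q:
        raise ValueError("this sketch only handles 0 < k < q")
    data = sorted(values)
    index = Fraction(len(data) * k, q)  # exact arithmetic, no float rounding
    if index.denominator != 1:
        # Non-integer index: round up to the next integer rank (1-based).
        return data[ceil(index) - 1]
    # Integer index: conventionally average that value and the next one.
    i = int(index)
    return (data[i - 1] + data[i]) / 2


def mean_bounds(values, p):
    """Lower and upper bounds on Q(p) from the mean and standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population (not sample) standard deviation
    lower = mu - sigma * sqrt((1 - p) / p)
    upper = mu + sigma * sqrt(p / (1 - p))
    return lower, upper


even = [3, 6, 7, 8, 8, 10, 13, 15, 16, 20]
odd = [3, 6, 7, 8, 8, 9, 10, 13, 15, 16, 20]

for population in (even, odd):
    quartiles = [population_quantile(population, k, 4) for k in (1, 2, 3)]
    print(quartiles)
    for k, value in zip((1, 2, 3), quartiles):
        lower, upper = mean_bounds(population, k / 4)
        print(f"  p = {k}/4: {lower:.2f} <= {value} <= {upper:.2f}")
```

For both example populations the computed quartiles match the tables above (7, 9 and 15), and each quartile lies between the two bounds for its value of {{mvar|p}}.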