{{Short description|Set of descriptive statistics}}{{Refimprove|date=January 2013}}

The '''five-number summary''' is a set of [[descriptive statistics]] that provides information about a dataset. It consists of the five most important sample [[percentile]]s:
# the [[sample minimum]] (smallest observation)
# the [[quartile|lower quartile]] or ''first quartile''
# the [[median]] (the middle value)
# the [[quartile|upper quartile]] or ''third quartile''
# the [[sample maximum]] (largest observation)

In addition to the median of a single set of data, there are two related statistics called the upper and lower quartiles. If the data are placed in order, the lower quartile is the central value of the lower half of the data and the upper quartile is the central value of the upper half. The quartiles are used to calculate the interquartile range, which helps to describe the spread of the data and to determine whether any data points are outliers.
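
As a minimal sketch of the outlier check described above, the following Python function flags observations that lie more than a multiple ''k'' of the interquartile range beyond the quartiles. The multiplier ''k'' = 1.5 is a common convention and is an assumption here, as are the function and parameter names; the quartiles are passed in rather than computed.
<syntaxhighlight lang="python">
def iqr_outliers(data, q1, q3, k=1.5):
    """Return the observations lying more than k * IQR outside the quartiles.

    k = 1.5 is a widely used convention, assumed here for illustration.
    """
    iqr = q3 - q1                  # interquartile range
    lower_fence = q1 - k * iqr     # values below this are flagged
    upper_fence = q3 + k * iqr     # values above this are flagged
    return [x for x in data if x < lower_fence or x > upper_fence]


# Quartiles 2 and 4 give an IQR of 2, so the fences are -1 and 7.
print(iqr_outliers([1, 2, 3, 4, 100], q1=2, q3=4))  # prints [100]
</syntaxhighlight>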

For these statistics to exist, the observations must be from a [[univariate]] variable that can be measured on an ordinal, interval or ratio [[Level of measurement|scale]].

==Use and representation==
The five-number summary provides a concise summary of the [[Probability distribution|distribution]] of the observations. Reporting five numbers avoids the need to decide on the most appropriate summary statistic. The five-number summary gives information about the location (from the median), spread (from the quartiles) and range (from the sample minimum and maximum) of the observations. Since it reports [[order statistic]]s (rather than, say, the mean) the five-number summary is appropriate for [[Level of measurement#Ordinal scale|ordinal measurements]], as well as interval and ratio measurements.

It is possible to quickly compare several sets of observations by comparing their five-number summaries, which can be represented graphically using a [[boxplot]].

In addition to the points themselves, many [[L-estimator]]s can be computed from the five-number summary, including [[interquartile range]], [[midhinge]], [[range (statistics)|range]], [[mid-range]], and [[trimean]].
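
The following Python sketch shows how these derived statistics might be obtained from the five numbers alone. The function name and the dictionary keys are illustrative choices, not part of any library.
<syntaxhighlight lang="python">
def l_estimators(minimum, q1, median, q3, maximum):
    """Derive common L-estimators from a five-number summary."""
    return {
        "interquartile range": q3 - q1,
        "midhinge": (q1 + q3) / 2,               # average of the quartiles
        "range": maximum - minimum,
        "mid-range": (minimum + maximum) / 2,    # average of the extremes
        "trimean": (q1 + 2 * median + q3) / 4,   # median weighted twice
    }


# Five-number summary 0, 0.5, 7.5, 44, 63 (taken from the example section).
for name, value in l_estimators(0, 0.5, 7.5, 44, 63).items():
    print(name, value)
</syntaxhighlight>
For the summary 0, 0.5, 7.5, 44, 63 this gives an interquartile range of 43.5, a midhinge of 22.25, a range of 63, a mid-range of 31.5 and a trimean of 14.875.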

The five-number summary is sometimes represented as in the following table: 
{| class="wikitable" style="margin-right:auto; margin-left:auto;border:none;"
| colspan=2 style="text-align:center;" | Median
|-
|1st quartile || 3rd quartile
|-
|Minimum || Maximum
|}

==Example==
This example calculates the five-number summary for the following set of observations: 0, 0, 1, 2, 63, 61, 27, 13.
These are the numbers of moons of the planets in the [[Solar System]].

It helps to put the observations in ascending order: 0, 0, 1, 2, 13, 27, 61, 63. There are eight observations, so the median is the mean of the two middle numbers, (2 + 13)/2 = 7.5. Splitting the observations on either side of the median gives two groups of four observations. The median of the first group is the lower or first quartile, and is equal to (0 + 1)/2 = 0.5. The median of the second group is the upper or third quartile, and is equal to (27 + 61)/2 = 44.
The smallest and largest observations are 0 and 63.

So the five-number summary would be 0, 0.5, 7.5, 44, 63.
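
The calculation above can be sketched in Python using only the standard library. The function name is illustrative. For an odd number of observations, the sketch assumes that the middle value is included in both halves, a convention the worked example does not need because it has an even number of observations.
<syntaxhighlight lang="python">
from statistics import median


def five_number_summary(observations):
    """Return [minimum, lower quartile, median, upper quartile, maximum]."""
    data = sorted(observations)
    if not data:
        raise ValueError("at least one observation is required")
    # Size of each half; for odd n the middle value is shared by both
    # halves (an assumed convention, see the text above).
    half = (len(data) + 1) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return [data[0], median(lower), median(data), median(upper), data[-1]]


print(five_number_summary([0, 0, 1, 2, 63, 61, 27, 13]))
# prints [0, 0.5, 7.5, 44.0, 63]
</syntaxhighlight>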

===Example in R===
The five-number summary can be calculated in the [[R programming language]] using the <code>fivenum</code> function. The <code>summary</code> function, when applied to a vector, displays the five-number summary together with the mean (which is not itself part of the five-number summary). The two functions calculate the quartiles by different methods, so their results can differ, as in the output below.
{{sxhl|2=rout|
> moons <- c(0, 0, 1, 2, 63, 61, 27, 13)
> fivenum(moons)
[1]  0.0  0.5  7.5 44.0 63.0
> summary(moons)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.75    7.50   20.88   35.50   63.00 
}}

===Example in Python===
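This Python example uses the <code>percentile</code> function from the numerical library <code>numpy</code>. The <code>method</code> keyword requires NumPy 1.22 or later, and therefore Python 3; earlier NumPy versions named this argument <code>interpolation</code>. For this dataset the <code>"midpoint"</code> method reproduces the result of the worked example, although it does not do so for every sample size.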
<syntaxhighlight lang="numpy">
import numpy as np

def fivenum(data):
    """Five-number summary."""
    return np.percentile(data, [0, 25, 50, 75, 100], method="midpoint")
</syntaxhighlight>
<syntaxhighlight lang="pycon">
>>> moons = [0, 0, 1, 2, 63, 61, 27, 13]
>>> print(fivenum(moons))
[ 0.   0.5  7.5 44.  63. ]
</syntaxhighlight>

=== Example in SAS ===
The five-number summary can be obtained in [[SAS (software)|SAS]] using <code>PROC UNIVARIATE</code>:
<syntaxhighlight lang="sas">
data fivenum;
input x @@;
datalines;
1 2 3 4 20 202 392 4 38 20
;
run;

ods select Quantiles;
proc univariate data = fivenum;
 output out = fivenums min = min Q1 = Q1 Q2 = median Q3 = Q3 max = max;
run;

proc print data = fivenums;
run;
</syntaxhighlight>

=== Example in Stata ===
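[[File:Five number summary.png|thumb|336px|A five-number summary of a distribution of data.]]
In [[Stata]], the <code>tabstat</code> command reports the five-number summary when given the statistics <code>min</code>, <code>q</code> (the three quartiles) and <code>max</code>: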
<syntaxhighlight lang="stata">
input byte y
0 
0 
1 
2 
63 
61 
27 
13
end 
list

tabstat y, statistics(min q max)
</syntaxhighlight>

==See also==
* [[Seven-number summary]]
* [[Three-point estimation]]
* [[Box plot]]

==References==
{{refbegin|2}}
*{{cite book | date = 1982-12-21 | editor1-last = Hoaglin | editor1-first = David C. | editor2-last = Mosteller | editor2-first = Frederick | editor2-link = Frederick Mosteller | editor3-last = Tukey | editor3-first = John W. | editor3-link = John Tukey | title = Understanding Robust and Exploratory Data Analysis | url = https://archive.org/details/understandingrob0000unse | series = Wiley Series in Probability and Statistics | language = en | edition = 1st | publisher = [[Wiley (publisher)|Wiley]] | isbn = 978-0471097778 | lccn = 82008528 | oclc = 473252998 | ol = OL3488838M | via = [[Internet Archive]] | df = dmy-all}}
*{{cite book | last1 = Greenwood | first1 = David | last2 = Woolley | first2 = Sara | last3 = Goodman | first3 = Jenny | last4 = Vaughan | first4 = Jennifer | last5 = Palmer | first5 = Stuart | date = 2019-11-08 | chapter = Chapter 9: Statistics | title = Essential Mathematics for the Australian Curriculum Year 10 |  language = en | edition = 3rd | publisher = [[Cambridge University Press & Assessment|Cambridge University Press]] | isbn = 978-1108773461 | lccn = | oclc = 1231440374 | ol = OL33037157M | df = dmy-all}}
{{refend}}

[[Category:Summary statistics]]
[[Category:Articles with example Python (programming language) code]]
[[Category:Articles with example R code]]