Box plot
==Elements==
[[File:Box-Plot mit Min-Max Abstand.png|thumb|Figure 2. Box plot with whiskers from minimum to maximum]]
[[File:Box-Plot mit Interquartilsabstand.png|thumb|Figure 3. Same box plot with whiskers drawn within the 1.5 IQR value]]

A box plot is a standardized way of displaying a dataset based on the [[five-number summary]]: the minimum, the maximum, the sample median, and the first and third quartiles.
* '''[[Sample minimum|Minimum]] (''Q''<sub>0</sub> or 0th [[percentile]])''': the lowest data point in the data set excluding any outliers
* '''[[Sample maximum|Maximum]] (''Q''<sub>4</sub> or 100th percentile)''': the highest data point in the data set excluding any outliers
* '''[[Median]] (''Q''<sub>2</sub> or 50th percentile)''': the middle value in the data set
* '''[[First quartile]] (''Q''<sub>1</sub> or 25th percentile)''': also known as the ''lower quartile'' ''q''<sub>''n''</sub>(0.25), it is the median of the lower half of the dataset.
* '''[[Third quartile]] (''Q''<sub>3</sub> or 75th percentile)''': also known as the ''upper quartile'' ''q''<sub>''n''</sub>(0.75), it is the median of the upper half of the dataset.<ref>{{cite journal |last1=Holmes |first1=Alexander |last2=Illowsky |first2=Barbara |last3=Dean |first3=Susan |title=Introductory Business Statistics |website=OpenStax |date=31 March 2015 |url=https://opentextbc.ca/introbusinessstatopenstax/chapter/measures-of-the-location-of-the-data/ |access-date=29 April 2020 |archive-date=27 July 2020 |archive-url=https://web.archive.org/web/20200727025431/https://opentextbc.ca/introbusinessstatopenstax/chapter/measures-of-the-location-of-the-data/ |url-status=dead }}</ref>

In addition to the minimum and maximum values, another element that can be used to construct a box plot is the interquartile range (IQR):
* '''[[Interquartile range]] (IQR)''': the distance between the upper and lower quartiles
:: <math>\text{IQR} = Q_3 - Q_1 = q_n(0.75) - q_n(0.25)</math>

A box plot usually includes two parts, a box and a set of whiskers, as shown in Figure 2.

===Box===
The box is drawn from ''Q''<sub>1</sub> to ''Q''<sub>3</sub> with a horizontal line drawn inside it to denote the median. Some box plots include an additional character to represent the mean of the data.<ref name="frigge hoaglin iglewicz2">{{Cite journal|last1=Frigge|first1=Michael|last2=Hoaglin|first2=David C.|last3=Iglewicz|first3=Boris|date=February 1989|title=Some Implementations of the Boxplot|journal=[[The American Statistician]]|volume=43|issue=1|pages=50–54|doi=10.2307/2685173|jstor=2685173}}</ref><ref>{{cite journal|last1=Marmolejo-Ramos|first1=F.|last2=Tian|first2=S.|date=2010|title=The shifting boxplot. A boxplot based on essential summary statistics around the mean|journal=International Journal of Psychological Research|volume=3|issue=1|pages=37–46|doi=10.21500/20112084.823|doi-access=free|hdl=10819/6492|hdl-access=free}}</ref>

===Whiskers===
The whiskers must end at an observed data point, but can be defined in various ways. In the most straightforward method, the boundary of the lower whisker is the minimum value of the data set, and the boundary of the upper whisker is the maximum value of the data set. Because the definition varies, the caption of a box plot should state the convention used for the whiskers and outliers.

Another popular choice for the boundaries of the whiskers is based on the 1.5 IQR value. Starting from the upper quartile ('''''Q''<sub>3</sub>'''), a distance of 1.5 times the IQR is measured upward, and a whisker is drawn ''up to'' the largest observed data point that falls within this distance. Similarly, a distance of 1.5 times the IQR is measured downward from the lower quartile ('''''Q''<sub>1</sub>'''), and a whisker is drawn ''down to'' the lowest observed data point that falls within this distance.
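The 1.5 IQR convention described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name <code>box_plot_stats</code>, the parameter <code>k</code>, and the sample data are hypothetical, and the quartiles follow the "median of each half" definition given earlier, with the middle value left out of both halves when the number of observations is odd. Plotting libraries may compute quartiles differently.

```python
from statistics import median


def box_plot_stats(data, k=1.5):
    """Quartiles, IQR, whisker ends and outliers under the k * IQR convention.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves of
    the sorted data; for an odd number of points the middle value is
    excluded from both halves. Other quartile conventions exist.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    half = n // 2
    lower_half = xs[:half]
    upper_half = xs[half:] if n % 2 == 0 else xs[half + 1:]

    q1 = median(lower_half)
    q2 = median(xs)
    q3 = median(upper_half)
    iqr = q3 - q1

    # Limits measured 1.5 * IQR (by default) out from the box.
    low_limit = q1 - k * iqr
    high_limit = q3 + k * iqr

    # Whiskers end at observed data points inside the limits.
    inside = [x for x in xs if low_limit <= x <= high_limit]
    outliers = [x for x in xs if x < low_limit or x > high_limit]

    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


if __name__ == "__main__":
    sample = [1, 3, 4, 5, 5, 6, 7, 8, 9, 20]  # made-up illustrative data
    print(box_plot_stats(sample))
```

For the made-up sample, ''Q''<sub>1</sub> = 4 and ''Q''<sub>3</sub> = 8, so the IQR is 4 and the limits sit at −2 and 14. The lower whisker ends at 1, the upper whisker ends at 9, and the value 20 lies beyond the upper limit.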
Because the whiskers must end at an observed data point, the whisker lengths can look unequal, even though 1.5 IQR is the same for both sides. All other observed data points outside the boundary of the whiskers are plotted as '''outliers'''.<ref>{{Cite book |title=A Modern Introduction to Probability and Statistics |url=https://archive.org/details/modernintroducti00dekk_722 |url-access=limited |last=Dekking |first=F.M. |publisher=Springer |year=2005 |isbn=1-85233-896-2 |pages=[https://archive.org/details/modernintroducti00dekk_722/page/n240 234]–238 }}</ref> The outliers can be plotted on the box plot as a dot, a small circle, a star, ''etc.'' (see example below).
[[File:Box Plot Picture.png|thumb|389x389px|This is a picture of a box plot representing data]]

In other representations, the whiskers can stand for other quantities, such as:
* One [[standard deviation]] above and below the mean of the data set
* The 9th percentile and the 91st percentile of the data set
* The 2nd percentile and the 98th percentile of the data set

Rarely, a box plot is drawn without whiskers. This can be appropriate for sensitive information, because it prevents the whiskers (and outliers) from disclosing actual observed values.<ref name="DGRW">{{Cite book|last1=Derrick|first1=Ben|last2=Green|first2=Elizabeth|last3=Ritchie|first3=Felix|last4=White|first4=Paul|date=September 2022|chapter=The Risk of Disclosure When Reporting Commonly Used Univariate Statistics|title=Privacy in Statistical Databases|series=Lecture Notes in Computer Science |volume=13463|pages=119–129|doi=10.1007/978-3-031-13945-1_9|isbn=978-3-031-13944-4 }}</ref>

The unusual percentiles 2%, 9%, 91%, 98% are sometimes used for whisker cross-hatches and whisker ends to depict the [[seven-number summary]]. If the data are [[Normal distribution|normally distributed]], the locations of the seven marks on the box plot will be approximately equally spaced. On some box plots, a cross-hatch is placed before the end of each whisker.
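
The spacing claim for the seven-number summary can be checked numerically. The sketch below assumes the seven marks are the 2nd, 9th, 25th, 50th, 75th, 91st and 98th percentiles, and evaluates them for a standard normal distribution with Python's standard-library <code>statistics.NormalDist</code>; the variable names are illustrative only.

```python
from statistics import NormalDist

# Percentiles assumed to make up the seven-number summary.
percentiles = [0.02, 0.09, 0.25, 0.50, 0.75, 0.91, 0.98]

# Positions of the seven marks for a standard normal distribution
# (mean 0, standard deviation 1), via the inverse CDF.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
marks = [standard_normal.inv_cdf(p) for p in percentiles]

# Distance between each pair of neighbouring marks.
gaps = [b - a for a, b in zip(marks, marks[1:])]

if __name__ == "__main__":
    for p, m in zip(percentiles, marks):
        print(f"{p:.0%} percentile: {m:+.4f}")
    print("gaps:", [round(g, 4) for g in gaps])
```

The six gaps all fall between roughly 0.67 and 0.71 standard deviations, so the marks are close to evenly spaced without being exactly so.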