Box plot
== Examples ==

=== Example without outliers ===
[[File:No Outlier.png|thumb|Figure 5. The generated boxplot figure of the example on the left with no outliers]]
A series of hourly temperatures was measured throughout the day in degrees Fahrenheit. The recorded values, listed in ascending order, are (°F): 57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81.

A box plot of the data set can be generated by first calculating five relevant values of this data set: minimum, maximum, median ('''''Q''<sub>2</sub>'''), first quartile ('''''Q''<sub>1</sub>'''), and third quartile ('''''Q''<sub>3</sub>''').

The minimum is the smallest number of the data set. In this case, the minimum recorded temperature is 57°F.

The maximum is the largest number of the data set. In this case, the maximum recorded temperature is 81°F.

The median is the "middle" number of the ordered data set: at least half of the elements are less than or equal to the median and at least half are greater than or equal to it. Because this data set has an even number of values (24), the median is the mean of the 12th and 13th ordered values. Both are 70°F, so the median of this data set is 70°F.

The first quartile value ('''''Q''<sub>1</sub>''', or 25th percentile) is the number that marks one quarter of the ordered data set. In other words, roughly 25% of the elements lie at or below the first quartile and roughly 75% lie at or above it. The first quartile can be determined as the "middle" number of the lower half of the ordered data, between the minimum and the median. For the hourly temperatures, the lower half consists of the 12 values from 57°F to 70°F; its two middle values (the 6th and 7th) are both 66°F, so the first quartile is 66°F.

The third quartile value ('''''Q''<sub>3</sub>''', or 75th percentile) is the number that marks three quarters of the ordered data set. In other words, roughly 75% of the elements lie at or below the third quartile and roughly 25% lie at or above it. The third quartile can be obtained as the "middle" number of the upper half of the ordered data, between the median and the maximum.
For the hourly temperatures, the "middle" number of the upper half, between 70°F and 81°F, is 75°F.

The interquartile range, or IQR, can be calculated by subtracting the first quartile value ('''''Q''<sub>1</sub>''') from the third quartile value ('''''Q''<sub>3</sub>'''):
: <math>\text{IQR} = Q_3 - Q_1 = 75^\circ\text{F} - 66^\circ\text{F} = 9^\circ\text{F}.</math>

Hence, <math>1.5\,\text{IQR} = 1.5 \cdot 9^\circ\text{F} = 13.5^\circ\text{F}.</math>

1.5 IQR above the third quartile is:
: <math>Q_3 + 1.5\,\text{IQR} = 75^\circ\text{F} + 13.5^\circ\text{F} = 88.5^\circ\text{F}.</math>

1.5 IQR below the first quartile is:
: <math>Q_1 - 1.5\,\text{IQR} = 66^\circ\text{F} - 13.5^\circ\text{F} = 52.5^\circ\text{F}.</math>

The upper whisker boundary of the box plot is the largest data value that is within 1.5 IQR above the third quartile. Here, 1.5 IQR above the third quartile is 88.5°F and the maximum is 81°F. Therefore, the upper whisker is drawn at the value of the maximum, which is 81°F.

Similarly, the lower whisker boundary of the box plot is the smallest data value that is within 1.5 IQR below the first quartile. Here, 1.5 IQR below the first quartile is 52.5°F and the minimum is 57°F. Therefore, the lower whisker is drawn at the value of the minimum, which is 57°F.

=== Example with outliers ===
[[File:Boxplot with outlier.png|thumb|Figure 6. The generated boxplot of the example on the left with outliers]]
Above is an example without outliers. Here is a follow-up example for generating a box plot with outliers:

The ordered set for the recorded temperatures is (°F): 52, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 89.

In this example, only the first and the last numbers are changed. The median, third quartile, and first quartile remain the same.

In this case, the maximum value in this data set is 89°F, and 1.5 IQR above the third quartile is 88.5°F. The maximum is greater than the third quartile plus 1.5 IQR, so the maximum is an outlier.
Therefore, the upper whisker is drawn at the greatest value smaller than 1.5 IQR above the third quartile, which is 79°F.

Similarly, the minimum value in this data set is 52°F, and 1.5 IQR below the first quartile is 52.5°F. The minimum is smaller than the first quartile minus 1.5 IQR, so the minimum is also an outlier. Therefore, the lower whisker is drawn at the smallest value greater than 1.5 IQR below the first quartile, which is 57°F.

=== In the case of large datasets ===
For a data set containing a large number of data points, the quartiles needed for the box plot can be computed with a general formula.

==== General equation to compute empirical quantiles ====
: <math>q_n(p) = x_{(k)} + \alpha(x_{(k+1)} - x_{(k)})</math>
: <math>\text{with } k = \lfloor p(n+1) \rfloor \text{ and } \alpha = p(n+1) - k</math>
:Here <math>x_{(k)}</math> denotes the ''k''-th smallest data point, that is, the data points in ascending order (i.e. if <math>i<k</math>, then <math>x_{(i)} \le x_{(k)}</math>), and <math>\lfloor \cdot \rfloor</math> denotes rounding down to the nearest integer.

Using the above example that has 24 data points (''n'' = 24), one can calculate the median and the first and third quartiles either mathematically or visually.

'''Median'''
: <math>
\begin{align}
q_n(0.5) & = x_{(12)} + (0.5\cdot25-12)\cdot(x_{(13)}-x_{(12)}) \\[5pt]
& = 70+(0.5\cdot25-12)\cdot(70-70) = 70^\circ\text{F}
\end{align}
</math>

'''First quartile'''
: <math>
\begin{align}
q_n(0.25) & = x_{(6)} + (0.25\cdot25-6)\cdot(x_{(7)}-x_{(6)}) \\[5pt]
& = 66 +(0.25\cdot25 - 6)\cdot(66-66) = 66^\circ\text{F}
\end{align}
</math>

'''Third quartile'''
: <math>
\begin{align}
q_n(0.75) & = x_{(18)} + (0.75\cdot25-18)\cdot(x_{(19)}-x_{(18)}) \\[5pt]
& =75 + (0.75\cdot25-18)\cdot(75-75) = 75^\circ\text{F}
\end{align}
</math>
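
The general equation above, together with the 1.5 IQR whisker rule from the earlier examples, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names <code>empirical_quantile</code> and <code>box_plot_summary</code> are hypothetical, and the clamping of ''k'' at the ends of the data is an assumption, since the equation itself does not say what to do when <math>x_{(k)}</math> or <math>x_{(k+1)}</math> does not exist.

```python
import math


def empirical_quantile(data, p):
    """q_n(p) = x_(k) + alpha * (x_(k+1) - x_(k)),
    with k = floor(p * (n + 1)) and alpha = p * (n + 1) - k."""
    xs = sorted(data)
    n = len(xs)
    h = p * (n + 1)
    k = math.floor(h)
    alpha = h - k
    # Assumption (not part of the equation): clamp to the extremes
    # when x_(k) or x_(k+1) would fall outside the data.
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    # x_(k) is xs[k - 1] because Python lists are 0-indexed.
    return xs[k - 1] + alpha * (xs[k] - xs[k - 1])


def box_plot_summary(data):
    """Quartiles, 1.5 IQR fences, whisker ends, and outliers."""
    q1 = empirical_quantile(data, 0.25)
    median = empirical_quantile(data, 0.5)
    q3 = empirical_quantile(data, 0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data values inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "lower_whisker": min(inside), "upper_whisker": max(inside),
        "outliers": outliers,
    }


# Hourly temperatures (°F) from the example without outliers.
no_outliers = [57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
               70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81]
# Example with outliers: only the first and last values are changed.
with_outliers = [52] + no_outliers[1:-1] + [89]

for name, data in (("without outliers", no_outliers),
                   ("with outliers", with_outliers)):
    print(name, box_plot_summary(data))
```

For both data sets this sketch gives quartiles of 66, 70 and 75°F and fences at 52.5 and 88.5°F. The whiskers end at 57 and 81°F for the first data set, and at 57 and 79°F (with 52 and 89°F reported as outliers) for the second, in agreement with the worked examples.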