Stem-and-leaf display
{{Short description|Format for presentation of quantitative data}}
[[File:stemplot_primes.svg|class=skin-invert-image|thumb|upright=0.5|A stem-and-leaf plot of [[prime number]]s under 100 shows that the most frequent tens digits are 0 and 1 while the least is 9]]
<!-- [[File:Stem leaf plot 001.png|thumb|A stem-and-leaf plot of the values 20, 30, 32, 35, 41, 41, 43, 47, 48, 51, 53, 53, 54, 56, 62, 64, 65, 65, 69, 71, 74, 77, 88 and 102]] -->
A '''stem-and-leaf display''' or '''stem-and-leaf plot''' is a device for presenting [[quantitative data]] in a [[information graphics|graphical]] format, similar to a [[histogram]], to assist in visualizing the [[shape]] of a [[probability distribution|distribution]]. Such displays evolved from [[Arthur Bowley]]'s work in the early 1900s, and are useful tools in [[exploratory data analysis]]. Stemplots became more commonly used in the 1980s after the publication of [[John Tukey]]'s book ''[[exploratory data analysis|Exploratory Data Analysis]]'' in 1977.<ref>{{Cite book | edition = 1 | publisher = Pearson | isbn = 0-201-07616-0 | last = Tukey | first = John W. | title = Exploratory Data Analysis | date = 1977 | url-access = registration | url = https://archive.org/details/exploratorydataa00tuke_0 }}</ref> Their popularity during those years is attributable to their use of [[monospaced]] (typewriter) typestyles, which allowed the computer technology of the time to produce the graphics easily. Modern computers' superior graphic capabilities have meant these techniques are less often used.

This plot has been implemented in Octave<ref>[https://octave.sourceforge.io/octave/function/stemleaf.html Function in Octave]</ref> and R.<ref>[https://www.rdocumentation.org/packages/graphics/versions/3.6.1/topics/stem Function in R]</ref>

A stem-and-leaf plot is also called a '''stemplot''', but the latter term often refers to another chart type.
A simple stem plot may refer to plotting a matrix of ''y'' values onto a common ''x'' axis, identifying the common ''x'' value with a vertical line and the individual ''y'' values with symbols on the line.<ref>Examples: [http://www.mathworks.com/help/matlab/ref/stem.html MATLAB's] and [http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.stem Matplotlib's] stem functions. They do ''not'' create a stem-and-leaf display.</ref>

Unlike histograms, stem-and-leaf displays retain the original data to at least two significant digits, and put the data in order, thereby easing the move to order-based inference and [[non-parametric statistics]].

==Construction==
To construct a stem-and-leaf display, the [[observations]] must first be sorted in ascending order. When working by hand, this is most easily done by drafting the display with the leaves unsorted, then sorting the leaves to produce the final display. Here is the sorted set of data values that will be used in the following example:

: 44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106

Next, it must be determined what the stems will represent and what the leaves will represent. Typically, the leaf contains the last digit of the number and the stem contains all of the other digits. In the case of very large numbers, the data values may be rounded to a particular [[place value]] (such as the hundreds place) that will be used for the leaves. The remaining digits to the left of the rounded place value are used as the stem.

In this example, the leaf represents the ones place and the stem represents the rest of the number (tens place and higher).

The stem-and-leaf display is drawn with two columns separated by a vertical line. The stems are listed to the left of the vertical line. It is important that each stem is listed only once and that no numbers are skipped, even if it means that some stems have no leaves.
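
The construction steps described so far (sort the data, split each value into a stem and a leaf, list every stem once without gaps) can be sketched in Python. This is only a minimal illustration, not a reference implementation: the function name <code>stem_and_leaf</code> is hypothetical, and it assumes a non-empty list of non-negative integers with the ones digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display as strings.

    Hypothetical helper for illustration only. Assumes a non-empty
    sequence of non-negative integers; leaf = ones digit, stem = the rest.
    """
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):        # sorting first keeps each row's leaves in order
        stem, leaf = divmod(value, 10)  # e.g. 106 -> stem 10, leaf 6
        leaves_by_stem[stem].append(leaf)

    rows = []
    # List every stem from the smallest to the largest exactly once,
    # including stems that have no leaves.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}".rstrip())
    return rows


data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
print("\n".join(stem_and_leaf(data)))
```

Run on the example data, the sketch produces seven rows, for stems 4 through 10, with stems 5 and 9 left empty. Using <code>divmod</code> by 10 is one simple way to separate the ones digit from the remaining digits; a different leaf unit would need a different divisor and prior rounding.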
The leaves are listed in increasing order in a row to the right of each stem. When a number is repeated in the data (such as two 72s), the plot must show every occurrence (so the row for the numbers 72, 72, 75, 76 and 77 would read 7 | 2 2 5 6 7).

:<math>
\begin{array}{r|l}
\text{Stem} & \text{Leaf} \\
\hline
4 & 4~6~7~9 \\
5 & \\
6 & 3~4~6~8~8 \\
7 & 2~2~5~6 \\
8 & 1~4~8 \\
9 & \\
10 & 6
\end{array}
</math>

:Key: <math>6 \mid 3 = 63</math>
:Leaf unit: 1.0
:Stem unit: 10.0

Rounding may be needed to create a stem-and-leaf display. Based on the following set of data, the stem plot below would be created:

: −23.678758, −12.45, −3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8

For negative numbers, a negative sign is placed in front of the stem, which is still obtained by dividing the magnitude of the value by 10 and keeping the whole-number part. Non-integers are rounded to the nearest integer. This allows the stem-and-leaf plot to retain its shape, even for more complicated data sets, as in the example below:

:<math>
\begin{array}{r|l}
\text{Stem} & \text{Leaf} \\
\hline
-2 & 4 \\
-1 & 2 \\
-0 & 3 \\
0 & 4~6~6 \\
1 & 7 \\
2 & 5 \\
3 & \\
4 & \\
5 & 7
\end{array}
</math>

:Key: <math>-2 \mid 4 = -24</math>

==Usage==
Stem-and-leaf displays are useful for displaying the relative density and shape of the data, giving the reader a quick overview of the distribution. They retain (most of) the raw numerical data, often with perfect integrity. They are also useful for highlighting [[outlier]]s and finding the [[mode (statistics)|mode]]. However, stem-and-leaf displays are only useful for moderately sized data sets (around 15–150 data points). With very small data sets, a stem-and-leaf display can be of little use, as a reasonable number of data points is required to establish definitive distribution properties. A [[dot plot (statistics)|dot plot]] may be better suited for such data. With very large data sets, a stem-and-leaf display will become very cluttered, since each data point must be represented numerically.
A [[box plot]] or [[histogram]] may become more appropriate as the data size increases.

==Non-numerical use==
<pre style="line-height:0.9;margin:0;padding:0.5ex;overflow:hidden;display:inline-block;float:right;">
a│abdeghilmnrstwxy
b│aeioy
c│h
d│aeio
e│adefhlmnrstwx
f│aey
g│iou
h│aeimo
i│dfnost
j│ao
k│aioy
l│aio
m│aeimouy
n│aeouy
o│bdefhikmnoprsuwxy
p│aeio
q│i
r│e
s│hiot
t│aeio
u│ghmnprst
v│
w│eo
x│iu
y│aeou
z│aeo</pre>
Stem-and-leaf displays can also be used to convey non-numerical information. In this example of valid two-letter words in [[Collins Scrabble Words]] (the word list used in [[Scrabble]] tournaments outside the US) with their [[initial]]s as stems, it is easy to see that the three most common initials are {{mono|o}}, {{mono|a}} and {{mono|e}}.<ref>Gideon Goldin, [http://gideongoldin.com/stem-leaf-plot-for-twoletter-scrabble-words ''Two-Letter Scrabble Words Visualized as Stem and Leaf Plot''], 2020-10-01</ref>

{{Annotated image
| image = Stem-and-leaf_time_tables_in_Japanese_train_stations.jpg
| image-width = 600
| image-left = -30
| image-top = -66
| width = 280
| height = 200
| float = none
| caption = Some railway timetables use stem-and-leaf displays with hours as stems and minutes as leaves
}}

==Notes==
{{reflist}}

==References==
*Wild, C. and Seber, G. (2000) ''Chance Encounters: A First Course in Data Analysis and Inference'' pp. 49–54 John Wiley and Sons. {{ISBN|0-471-32936-3}}
*{{Cite book | edition = 2nd | publisher = Polity Press | isbn = 978-0-7456-2282-8 | last = Elliott | first = Jane |author2=Catherine Marsh | title = Exploring Data: An Introduction to Data Analysis for Social Scientists | date = 2008 }}

{{Statistics|descriptive}}

[[Category:Statistical charts and diagrams]]
[[Category:Exploratory data analysis]]