Exploratory data analysis
==Overview==
Tukey defined data analysis in 1961 as: "Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data."<ref>[http://projecteuclid.org/download/pdf_1/euclid.aoms/1177704711 John Tukey, "The Future of Data Analysis", July 1961]</ref> Exploratory data analysis is a technique for analyzing and investigating a dataset and summarizing its main characteristics. A main advantage of EDA is that it provides visualizations of the data once an analysis has been carried out. Tukey's championing of EDA encouraged the development of [[Computational statistics|statistical computing]] packages, especially [[S (programming language)|S]] at [[Bell Labs]].<ref>{{Citation |last=Becker |first=Richard A. |title=A Brief History of S |publisher=AT&T Bell Laboratories |place=Murray Hill, New Jersey |access-date=2015-07-23 |url=http://www2.research.att.com/areas/stat/doc/94.11.ps |format=PS |archive-url=https://web.archive.org/web/20150723044213/http://www2.research.att.com/areas/stat/doc/94.11.ps |archive-date=2015-07-23 |quotation="... we wanted to be able to interact with our data, using Exploratory Data Analysis (Tukey, 1971) techniques."}}</ref> The S programming language inspired the systems [[S-PLUS]] and [[R (programming language)|R]]. This family of statistical-computing environments featured greatly improved dynamic visualization capabilities, which allowed statisticians to identify [[outlier]]s, [[trend estimation|trends]] and [[pattern recognition|patterns]] in data that merited further study. Tukey's EDA was related to two other developments in [[statistical theory]]: [[robust statistics]] and [[nonparametric statistics]], both of which tried to reduce the sensitivity of statistical inferences to errors in formulating [[statistical model]]s.
Tukey promoted the use of the [[five-number summary]] of numerical data: the two [[extreme value|extreme]]s ([[maximum]] and [[minimum]]), the [[median]], and the [[quartile]]s. He favored it because the median and quartiles, being functions of the [[empirical distribution function|empirical distribution]] <!-- [[statistical functional]]s (and the related [[interquartile range]] and [[range]]) -->, are defined for all distributions, unlike the [[mean value|mean]] and [[standard deviation]]. Moreover, the quartiles and median are more robust to [[skewness|skewed]] or [[heavy-tailed distribution]]s than traditional summaries (the mean and standard deviation). The packages [[S (programming language)|S]], [[S-PLUS]], and [[R (programming language)|R]] included routines using [[resampling (statistics)|resampling statistics]], such as Quenouille and Tukey's [[resampling (statistics)#Jackknife|jackknife]] and [[Bradley Efron|Efron]]{{'s}} [[bootstrapping (statistics)|bootstrap]], which are nonparametric and, for many problems, robust. Exploratory data analysis, robust statistics, nonparametric statistics, and the development of statistical programming languages facilitated statisticians' work on scientific and engineering problems. Such problems included the fabrication of semiconductors and the understanding of communications networks, both of which were of interest to Bell Labs. These statistical developments, all championed by Tukey, were designed to complement the [[analytic function|analytic]] theory of [[statistical hypothesis testing|testing statistical hypotheses]], particularly the [[Pierre-Simon Laplace|Laplacian]] tradition's emphasis on [[exponential family|exponential families]].<ref>{{cite journal |title=Conversation with John W. Tukey and Elizabeth Tukey, Luisa T. Fernholz and Stephan Morgenthaler |journal=Statistical Science |volume=15 |issue=1 |year=2000 |pages=79–94 |doi=10.1214/ss/1009212675|last1=Morgenthaler |first1=Stephan |last2=Fernholz |first2=Luisa T. |doi-access=free }}</ref>
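The robustness of the five-number summary can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not Tukey's procedure: the helper name five_number_summary is hypothetical, the quartiles come from the standard-library function statistics.quantiles with method="inclusive" (one of several quartile conventions; Tukey's own "hinges" can differ slightly for some sample sizes), and the data are toy values chosen only to show one gross error in an otherwise regular sample.

```python
from statistics import mean, quantiles, stdev


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Hypothetical helper for illustration. Quartiles use
    statistics.quantiles(method="inclusive"), i.e. linear interpolation
    between order statistics; other quartile conventions exist.
    """
    values = sorted(data)
    if len(values) < 2:
        raise ValueError("need at least two observations")
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return values[0], q1, med, q3, values[-1]


# Toy data: the second sample repeats the first with one gross error.
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
contaminated = [1, 2, 3, 4, 5, 6, 7, 8, 900]

print(five_number_summary(clean))         # (1, 3.0, 5.0, 7.0, 9)
print(five_number_summary(contaminated))  # (1, 3.0, 5.0, 7.0, 900)

# Only the maximum moves above; the mean and standard deviation, by
# contrast, are pulled far away by the single bad value.
print(mean(clean), stdev(clean))
print(mean(contaminated), stdev(contaminated))
```

The median and quartiles are unchanged by the corrupted observation, while the mean jumps from 5 to 104, which is the sense in which these order-based summaries are more robust than the mean and standard deviation.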
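The jackknife and the bootstrap both estimate the variability of a statistic by recomputing it on modified copies of the sample, without assuming a parametric model. The following is a minimal Python sketch using only the standard library; it is not the S or R implementation, and the helper names jackknife_se and bootstrap_se, the 2000 resamples, the fixed seed, and the sample values are all illustrative assumptions.

```python
import random
from statistics import mean, median, stdev


def jackknife_se(data, statistic):
    """Delete-one jackknife estimate of the standard error (hypothetical helper)."""
    data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the statistic n times, leaving out one observation each time.
    replicates = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = mean(replicates)
    # Jackknife variance: (n - 1) / n times the sum of squared deviations.
    variance = (n - 1) / n * sum((r - center) ** 2 for r in replicates)
    return variance ** 0.5


def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Nonparametric bootstrap estimate of the standard error (hypothetical helper)."""
    data = list(data)
    rng = random.Random(seed)  # assumed fixed seed, for reproducibility
    n = len(data)
    # Draw n observations with replacement and recompute, many times over.
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return stdev(replicates)


# Illustrative made-up sample, not real measurements.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 6.1]

print(jackknife_se(sample, mean))         # equals s / sqrt(n) for the mean
print(stdev(sample) / len(sample) ** 0.5)
print(bootstrap_se(sample, median))       # bootstrap SE of the median
```

The delete-one jackknife is known to give inconsistent standard-error estimates for non-smooth statistics such as the median, so this sketch applies it to the mean (where it reproduces s/√n exactly) and uses the bootstrap for the median.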