Editing
Biostatistics
(section)
=== Descriptive tools ===
{{Main|Descriptive statistics}}
Data can be represented through [[Table (information)|tables]] or [[chart|graphical]] representations, such as line charts, bar charts, histograms, and scatter plots. [[Central tendency|Measures of central tendency]] and [[Statistical dispersion|variability]] are also useful for summarizing the data. Some examples follow.

==== Frequency tables ====
One type of table is the [[frequency]] table, which consists of data arranged in rows and columns, where the frequency is the number of occurrences or repetitions of a value. Frequency can be:<ref>{{Cite web|url=https://www.sangakoo.com/en/unit/absolute-relative-cumulative-frequency-and-statistical-tables|title=Absolute, relative, cumulative frequency and statistical tables – Probability and Statistics|last=Maths|first=Sangaku|website=www.sangakoo.com|language=en|access-date=2018-04-10}}</ref>

'''Absolute''': the number of times that a given value appears. The absolute frequencies sum to the total number of observations:
<math display="block">N = f_1 + f_2 + f_3 + \cdots + f_n</math>
'''Relative''': the absolute frequency divided by the total number of observations:
<math display="block">n_i = \frac{f_i}{N}</math>
In the next example, we have the number of genes in ten [[operon]]s of the same organism.
: {{math|1=Genes = {{mset|2,3,3,4,5,3,3,3,3,4}}}}
{| class="wikitable"
|+
!Number of genes
!Absolute frequency
!Relative frequency
|-
|1
|0
|0
|-
|2
|1
|0.1
|-
|3
|6
|0.6
|-
|4
|2
|0.2
|-
|5
|1
|0.1
|}

==== Line graph ====
[[File:Examples of descriptive tools.png|thumb| Figure A: '''Line graph example'''. 
The birth rate in Brazil (2010–2016);<ref name=":1">{{Cite web|url=http://tabnet.datasus.gov.br/cgi/deftohtm.exe?sinasc/cnv/nvuf.def|title=DATASUS: TabNet Win32 3.0: Nascidos vivos – Brasil|website=DATASUS: Tecnologia da Informação a Serviço do SUS}}</ref> Figure B: '''Bar chart example.''' The birth rate in [[Brazil]] for the December months from 2010 to 2016; Figure C: '''Example of Box Plot''': number of glycines in the proteome of eight different organisms (A-H); Figure D: '''Example of a scatter plot.''']]
[[Line graph]]s represent the variation of a value over another metric, such as time. In general, values are represented on the vertical axis, while the time variation is represented on the horizontal axis.<ref name=":0">{{Cite book|title=Introduction to Biostatistics. A Guide to Design, Analysis, and Discovery|last1=Forthofer|first1=Ronald N.|last2=Lee|first2=Eun Sul|publisher=Academic Press|year=1995|isbn=978-0-12-262270-0}}</ref>

==== Bar chart ====
A [[bar chart]] is a graph that shows categorical data as bars whose heights (vertical bars) or widths (horizontal bars) are proportional to the values they represent. Bar charts provide an image of data that could also be presented in tabular format.<ref name=":0" />

In the bar chart example, we have the birth rate in Brazil for the December months from 2010 to 2016.<ref name=":1" /> The sharp fall in December 2016 reflects the effect of the [[Zika virus]] outbreak on the birth rate in Brazil.

==== Histograms ====
[[File:Example histogram.png|thumb|'''Example of a histogram.'''|350x350px]]
The [[histogram]] (or frequency distribution) is a graphical representation of a dataset tabulated and divided into uniform or non-uniform classes. It was first introduced by [[Karl Pearson]].<ref>{{Cite journal|last=Pearson|first=Karl|date=1895-01-01|title=X. Contributions to the mathematical theory of evolution.—II. Skew variation in homogeneous material|journal=Phil. Trans. R. Soc. Lond. 
A|language=en|volume=186|pages=343–414|doi=10.1098/rsta.1895.0010|issn=0264-3820|bibcode=1895RSPTA.186..343P|doi-access=free}}</ref>

==== Scatter plot ====
A [[scatter plot]] is a mathematical diagram that uses Cartesian coordinates to display values of a dataset. A scatter plot shows the data as a set of points, where the value of one variable determines each point's position on the horizontal axis and the value of another variable determines its position on the vertical axis.<ref>{{Cite book|title=Seeing through statistics|last=Utts|first=Jessica M.|date=2005|publisher=Thomson, Brooks/Cole|isbn=978-0534394028|edition= 3rd|location=Belmont, CA|oclc=56568530}}</ref> It is also called a '''scatter graph''', '''scatter chart''', '''scattergram''', or '''scatter diagram'''.<ref>{{Cite book|title=Basic statistics|last=Jarrell|first=Stephen B.|date=1994|publisher=Wm. C. Brown Pub|isbn=978-0697215956|location=Dubuque, Iowa|oclc=30301196}}</ref>

==== Mean ====
{{Main|Mean}}
The [[arithmetic mean]] is the sum of a collection of values (<math>{x_1+x_2+x_3+\cdots +x_n}</math>) divided by the number of items in this collection (<math>{n}</math>).
: <math>\bar{x} = \frac{1}{n}\left (\sum_{i=1}^n{x_i}\right ) = \frac{x_1+x_2+\cdots +x_n}{n}</math>

==== Median ====
{{Main|Median}}
The [[median]] is the middle value of a dataset once its values are sorted. For an even number of values, it is conventionally taken as the mean of the two middle values.
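
The mean and median definitions can be illustrated with a minimal Python sketch. The function names <code>mean</code> and <code>median</code> are chosen here for illustration only, and the even-count branch assumes the common convention of averaging the two middle values.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data.

    For an even number of values, this sketch assumes the common
    convention of averaging the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


values = [2, 3, 3, 3, 3, 3, 4, 4, 11]
print(mean(values))          # 4.0
print(median(values))        # 3
print(median([2, 3, 4, 5]))  # 3.5
```

In this dataset the single large value 11 pulls the mean up to 4, while the median stays at 3.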

==== Mode ====
{{Main|Mode (statistics)}}
The [[mode (statistics)|mode]] is the value of a set of data that appears most often.<ref>{{Cite book|title=Econometrics|last=Gujarati|first=Damodar N.|publisher=McGraw-Hill Irwin|year=2006}}</ref>
{| class="wikitable"
|+ Comparison among mean, median and mode<br />Values = { 2,3,3,3,3,3,4,4,11 }
!Type
!Example
!Result
|-
| align="center" |[[Arithmetic mean|Mean]]
| align="center" | ( 2 + 3 + 3 + 3 + 3 + 3 + 4 + 4 + 11 ) / 9
| align="center" |'''4'''
|-
| align="center" |[[Median]]
| align="center" |2, 3, 3, 3, '''3''', 3, 4, 4, 11
| align="center" |'''3'''
|-
| align="center" |Mode
| align="center" |2, '''3, 3, 3, 3, 3''', 4, 4, 11
| align="center" |'''3'''
|}

==== Box plot ====
A [[box plot]] is a method for graphically depicting groups of numerical data. The maximum and minimum values are represented by the lines, and the box spans the interquartile range (IQR), from the 25th to the 75th percentile of the data. [[Outlier]]s may be plotted as circles.

==== Correlation coefficients ====
Although correlations between two different kinds of data can be inferred from graphs, such as scatter plots, it is necessary to validate them through numerical information. For this reason, [[correlation coefficient]]s are required. They provide a numerical value that reflects the strength of an association.<ref name=":0" />

==== Pearson correlation coefficient ====
[[File:Correlation coefficient.png|right|thumb|Scatter diagram that demonstrates the Pearson correlation for different values of ''ρ''.]]
The [[Pearson correlation coefficient]] is a measure of association between two variables, X and Y. This coefficient, usually represented by ''ρ'' (rho) for the population and ''r'' for the sample, assumes values between −1 and 1, where ''ρ'' = 1 represents a perfect positive correlation, ''ρ'' = −1 represents a perfect negative correlation, and ''ρ'' = 0 represents no linear correlation.<ref name=":0" />
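
As a hedged illustration, the sample coefficient ''r'' can be computed with the standard formula: the sum of products of deviations from the means, divided by the square root of the product of the summed squared deviations. The function name <code>pearson_r</code> and the example data below are made up for this sketch and do not come from the cited sources.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (numerator) and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Illustrative (made-up) data
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive correlation)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative correlation)
```

The function raises an error when either variable is constant, because the denominator is then zero and the coefficient is undefined.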