== Applications ==
{{Prose|section|date=March 2016}}

=== Public health ===
Biostatistics is applied in [[public health]], including [[epidemiology]], [[health services research]], [[nutrition]], [[environmental health]], and health care policy and management. In these [[medicine|medical]] contexts, it is important to consider the design and analysis of [[clinical trial]]s. One example is the assessment of the severity of a patient's condition together with the prognosis of a disease outcome.

With new technologies and advances in genetics, biostatistics is now also used in [[systems medicine]], which aims at a more personalized medicine. This requires integrating data from different sources, including conventional patient data, clinico-pathological parameters, molecular and genetic data, and data generated by newer omics technologies.<ref>{{cite journal|doi=10.1038/emm.2017.290|pmid=29497170|pmc=5898894|title=Whither systems medicine?|journal=Experimental & Molecular Medicine|volume=50|issue=3|pages=e453|year=2018|last1=Apweiler|first1=Rolf|display-authors=et al}}</ref>

=== Quantitative genetics ===

Quantitative genetics draws on [[population genetics]] and [[statistical genetics]] to link variation in [[genotype]] with variation in [[phenotype]]. In other words, the aim is to discover the genetic basis of a measurable (quantitative) trait that is under polygenic control. A genome region that is responsible for a continuous trait is called a [[quantitative trait locus]] (QTL). The study of QTLs became feasible through the use of [[molecular marker]]s and the measurement of traits in populations, but mapping them requires a population obtained from an experimental cross, such as an F2 or [[recombinant inbred strain]]s/lines (RILs). To scan a genome for QTL regions, a [[gene map]] based on linkage has to be built. Some of the best-known QTL mapping algorithms are Interval Mapping, Composite Interval Mapping, and Multiple Interval Mapping.<ref>{{cite journal|doi=10.1007/s10709-004-2705-0|pmid=15881678|title=QTL mapping and the genetic basis of adaptation: Recent developments|journal=Genetica|volume=123|issue=1–2|pages=25–37|year=2005|last1=Zeng|first1=Zhao-Bang|s2cid=1094152}}</ref>

However, QTL mapping resolution is limited by the amount of recombination assayed, which is a problem for species in which it is difficult to obtain large numbers of offspring. Furthermore, allele diversity is restricted to that of individuals derived from contrasting parents, which limits the study of allele diversity when a panel of individuals representing a natural population is available.<ref>{{cite journal|doi=10.1186/1746-4811-9-29|pmid=23876160|pmc=3750305|title=The advantages and limitations of trait analysis with GWAS: A review|journal=Plant Methods|volume=9|pages=29|year=2013|last1=Korte|first1=Arthur|last2=Farlow|first2=Ashley |issue=1 |doi-access=free |bibcode=2013PlMet...9...29K }}</ref> For this reason, the [[genome-wide association study]] (GWAS) was proposed to identify QTLs based on [[linkage disequilibrium]], the non-random association of alleles at different loci, which here links molecular markers to the loci underlying a trait. It was made practical by the development of high-throughput [[SNP genotyping]].<ref>{{cite journal|doi=10.3835/plantgenome2008.02.0089|title=Status and Prospects of Association Mapping in Plants|journal= The Plant Genome|volume=1|pages=5–20|year=2008|last1=Zhu|first1=Chengsong|last2=Gore|first2=Michael|last3=Buckler|first3=Edward S|last4=Yu|first4=Jianming|doi-access=free}}</ref>
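The idea of a marker-trait association scan can be illustrated with a minimal sketch. The example below is not taken from the cited sources: it simulates hypothetical genotypes (allele dosages 0, 1 or 2) and a trait influenced by one arbitrarily chosen causal marker, then regresses the trait on each marker in turn and ranks markers by the absolute t statistic of the slope. The names <code>marker_t_statistic</code>, <code>causal</code> and the simulation settings are assumptions made for illustration. Real association studies additionally account for population structure and relatedness, which this sketch ignores.

```python
import random


def marker_t_statistic(dosage, trait):
    """t statistic for the slope of trait regressed on allele dosage (0/1/2)."""
    n = len(trait)
    mean_x = sum(dosage) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        return 0.0  # monomorphic marker: no variation to test
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    slope = sxy / sxx
    # Residual sum of squares of the simple linear regression.
    rss = sum((y - mean_y - slope * (x - mean_x)) ** 2
              for x, y in zip(dosage, trait))
    standard_error = (rss / (n - 2) / sxx) ** 0.5
    return slope / standard_error


# Hypothetical simulated panel: 200 individuals, 20 markers, marker 7 causal.
rng = random.Random(42)
n_individuals, n_markers, causal = 200, 20, 7
genotypes = [[rng.choice([0, 1, 2]) for _ in range(n_markers)]
             for _ in range(n_individuals)]
trait = [1.5 * row[causal] + rng.gauss(0.0, 1.0) for row in genotypes]

# Scan every marker and keep the one with the strongest association.
scores = [abs(marker_t_statistic([row[j] for row in genotypes], trait))
          for j in range(n_markers)]
top_marker = max(range(n_markers), key=lambda j: scores[j])
print("marker with the strongest association:", top_marker)
```

In practice each test statistic would be converted to a p-value and corrected for the number of markers tested.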

In [[Animal breeding|animal]] and [[plant breeding]], the use of markers, mainly molecular ones, in [[Selective breeding|selection]] contributed to the development of [[marker-assisted selection]]. While QTL mapping is limited by its resolution, GWAS lacks sufficient power to detect rare variants of small effect that are also influenced by the environment. The concept of genomic selection (GS) therefore arose in order to use all molecular markers in selection and to allow the performance of selection candidates to be predicted. The proposal is to genotype and phenotype a training population and to develop a model that can obtain the genomic estimated breeding values (GEBVs) of individuals belonging to a genotyped but not phenotyped population, called the testing population.<ref>{{cite journal|doi=10.1016/j.tplants.2017.08.011|pmid=28965742|title=Genomic Selection in Plant Breeding: Methods, Models, and Perspectives|journal=Trends in Plant Science|volume=22|issue=11|pages=961–975|year=2017|last1=Crossa|first1=José|last2=Pérez-Rodríguez|first2=Paulino|last3=Cuevas|first3=Jaime|last4=Montesinos-López|first4=Osval|last5=Jarquín|first5=Diego|last6=De Los Campos|first6=Gustavo|last7=Burgueño|first7=Juan|last8=González-Camacho|first8=Juan M|last9=Pérez-Elizalde|first9=Sergio|last10=Beyene|first10=Yoseph|last11=Dreisigacker|first11=Susanne|last12=Singh|first12=Ravi|last13=Zhang|first13=Xuecai|last14=Gowda|first14=Manje|last15=Roorkiwal|first15=Manish|last16=Rutkoski|first16=Jessica|last17=Varshney|first17=Rajeev K|bibcode=2017TPS....22..961C |url=http://oar.icrisat.org/10280/1/Genomic%20Selection%20in%20Plant%20Breeding%20Methods%2C%20Models%2C%20and%20Perspectives.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://oar.icrisat.org/10280/1/Genomic%20Selection%20in%20Plant%20Breeding%20Methods%2C%20Models%2C%20and%20Perspectives.pdf |archive-date=2022-10-09 |url-status=live}}</ref> This kind of study can also include a validation population, following the concept of [[cross-validation (statistics)|cross-validation]], in which the phenotypes actually measured in this population are compared with the predicted phenotypes in order to check the accuracy of the model.
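The training and validation workflow described above can be sketched as follows. This is an illustrative example under stated assumptions, not the method of the cited review: marker effects are estimated jointly by ridge regression (one of several models used for genomic prediction), the data are simulated, and the names <code>ridge_fit</code>, <code>solve</code>, <code>pearson</code> and the penalty value <code>lam=1.0</code> are hypothetical choices. Predictive accuracy is taken as the correlation between predicted and observed phenotypes in the validation population.

```python
import random
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Marker effects solving (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(xtx, xty)


def pearson(u, v):
    """Pearson correlation between two equally long sequences."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


# Hypothetical simulated data: 10 markers with random true effects.
rng = random.Random(1)
n_markers = 10
true_effects = [rng.gauss(0.0, 1.0) for _ in range(n_markers)]


def simulate(n):
    geno = [[float(rng.choice([0, 1, 2])) for _ in range(n_markers)]
            for _ in range(n)]
    pheno = [sum(g * e for g, e in zip(row, true_effects)) + rng.gauss(0.0, 0.5)
             for row in geno]
    return geno, pheno


X_train, y_train = simulate(80)   # genotyped and phenotyped
X_valid, y_valid = simulate(20)   # phenotypes held back to check the model

# Centre markers and phenotypes using training-population means only.
col_means = [statistics.mean(col) for col in zip(*X_train)]
y_mean = statistics.mean(y_train)
X_centred = [[x - m for x, m in zip(row, col_means)] for row in X_train]
beta = ridge_fit(X_centred, [y - y_mean for y in y_train], lam=1.0)

# Genomic estimated breeding values (GEBVs) for the validation individuals.
gebv = [y_mean + sum((x - m) * b for x, m, b in zip(row, col_means, beta))
        for row in X_valid]
accuracy = pearson(gebv, y_valid)
print("predictive accuracy (correlation):", round(accuracy, 3))
```

Centring with the training means only keeps information from the validation population out of the fitted model, which is the point of holding that population back.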

In summary, applications of quantitative genetics include:
* Agriculture, where it has been used to improve crops ([[plant breeding]]) and [[livestock]] ([[animal breeding]]).
* Biomedical research, where it can assist in finding candidate [[gene]] [[allele]]s that can cause or influence predisposition to diseases in [[human genetics]].

=== Expression data ===

Studies of differential gene expression from [[RNA-Seq]] data, as with [[Real-time polymerase chain reaction|RT-qPCR]] and [[microarrays]], require a comparison of conditions. The goal is to identify genes whose abundance changes significantly between conditions. Experiments must therefore be designed appropriately, with replicates for each condition or treatment, and with randomization and blocking when necessary. In RNA-Seq, expression is quantified from mapped reads that are summarized over some genetic unit, such as the [[exon]]s that are part of a gene sequence. Whereas [[microarray]] results can be approximated by a normal distribution, RNA-Seq count data are better described by other distributions. The first distribution used was the [[Poisson distribution|Poisson]], but it underestimates the sampling error, leading to false positives. Currently, biological variation is accounted for by methods that estimate the dispersion parameter of a [[negative binomial distribution]]. [[Generalized linear model]]s are used to test for statistical significance and, because the number of genes is high, correction for multiple testing has to be considered.<ref>{{cite journal| doi =10.1186/gb-2010-11-12-220| pmid =21176179| pmc =3046478| title =From RNA-seq reads to differential expression results| journal =Genome Biology| volume =11| issue =12| pages =220| year =2010| last1 =Oshlack| first1 =Alicia| last2 =Robinson| first2 =Mark D| last3 =Young| first3 =Matthew D| doi-access =free}}</ref> Other examples of analyses of [[genomics]] data come from microarray or [[proteomics]] experiments,<ref>{{cite book|title=Statistical Analysis of Gene Expression Microarray Data|author1=Helen Causton |author2=John Quackenbush |author3=Alvis Brazma |publisher=Wiley-Blackwell|year=2003}}</ref><ref>{{cite book|title=Microarray Gene Expression Data Analysis: A Beginner's Guide|author=Terry Speed|publisher=Chapman & Hall/CRC|year=2003}}</ref> often concerning diseases or disease stages.<ref>{{cite book|title=Medical Biostatistics for Complex Diseases|author1=Frank Emmert-Streib |author2=Matthias Dehmer |publisher=Wiley-Blackwell|year=2010|isbn= 978-3-527-32585-6}}</ref>
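Two of the steps mentioned above can be illustrated with a short sketch. It is an assumption-laden example rather than the procedure of any particular RNA-Seq package: the dispersion is estimated by a simple method of moments under the negative binomial relation variance = mean + dispersion × mean², and multiple testing is handled with the Benjamini–Hochberg false discovery rate procedure, one common choice among several. The function names, read counts and p-values are hypothetical.

```python
import statistics


def nb_dispersion(counts):
    """Method-of-moments dispersion, assuming variance = mean + alpha * mean**2."""
    mean = statistics.mean(counts)
    if mean == 0:
        return 0.0
    variance = statistics.variance(counts)  # sample variance across replicates
    # Under a Poisson model variance equals the mean, so alpha would be 0.
    return max(0.0, (variance - mean) / mean ** 2)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical read counts for one gene across four biological replicates.
alpha = nb_dispersion([10, 30, 20, 40])
print("estimated dispersion:", round(alpha, 3))

# Hypothetical per-gene p-values from a differential expression test.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
significant = [i for i, qi in enumerate(q) if qi <= 0.05]
print("genes significant at a 5% false discovery rate:", significant)
```

With these made-up p-values, the third and fourth genes fall below 0.05 before adjustment but not after it, which shows why the correction matters when many genes are tested at once.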

=== Other studies ===
* [[Ecology]], [[ecological forecasting]]
* Biological [[sequence analysis]]<ref>{{cite book|title=Statistical Methods in Bioinformatics: An Introduction|author1=Warren J. Ewens |author2=Gregory R. Grant |publisher=Springer|year=2004}}</ref>
* [[Systems biology]], for gene network inference or pathway analysis<ref>{{cite book|title=Applied Statistics for Network Biology: Methods in Systems Biology|author1=Matthias Dehmer |author2=Frank Emmert-Streib |author3=Armin Graber |author4=Armindo Salvador |publisher=Wiley-Blackwell|year=2011|isbn= 978-3-527-32750-8}}</ref>
* [[Clinical research]] and pharmaceutical development
* [[Population dynamics]], especially with regard to [[fisheries science]]
* [[Phylogenetics]] and [[evolution]]
* [[Pharmacodynamics]]
* [[Pharmacokinetics]]
* [[Neuroimaging]]