== Models and assumptions ==
{{Main|Statistical model|Statistical assumptions}}

Any statistical inference requires some assumptions. A '''statistical model''' is a set of assumptions concerning the generation of the observed data and similar data. Descriptions of statistical models usually emphasize the role of population quantities of interest, about which we wish to draw inference.<ref name="Cox20062">Cox (2006), page 2.</ref> [[Descriptive statistic]]s are typically used as a preliminary step before more formal inferences are drawn.<ref>{{cite book |last=Evans |first=Michael |url=https://books.google.com/books?id=hkWK8kFzXWIC |title=Probability and Statistics: The Science of Uncertainty |publisher=Freeman and Company |year=2004 |isbn=9780716747420 |page=267 |display-authors=etal}}</ref>

=== Degree of models/assumptions ===
Statisticians distinguish between three levels of modeling assumptions:
* '''[[Parametric model|Fully parametric]]''': The probability distributions describing the data-generation process are assumed to be fully described by a family of probability distributions involving only a finite number of unknown parameters.<ref name="Cox20062" /> For example, one may assume that the distribution of population values is truly Normal, with unknown mean and variance, and that datasets are generated by [[Simple random sample|'simple' random sampling]]. The family of [[Generalized linear model#Model components|generalized linear models]] is a widely used and flexible class of parametric models.
* '''[[Nonparametric statistics#Non-parametric models|Non-parametric]]''': The assumptions made about the process generating the data are much weaker than in parametric statistics and may be minimal.<ref>van der Vaart, A.W. (1998). ''Asymptotic Statistics''. Cambridge University Press. {{isbn|0-521-78450-6}} (page 341)</ref> For example, every continuous probability distribution has a median, which may be estimated using the sample median or the [[Hodges–Lehmann estimator|Hodges–Lehmann–Sen estimator]], which has good properties when the data arise from simple random sampling (both estimators are sketched after this list).
* '''[[Semiparametric model|Semi-parametric]]''': This term typically implies assumptions 'in between' the fully parametric and non-parametric approaches. For example, one may assume that a population distribution has a finite mean. Furthermore, one may assume that the mean response level in the population depends in a truly linear manner on some covariate (a parametric assumption) but not make any parametric assumption describing the variance around that mean (i.e., about the presence or possible form of any [[heteroscedasticity]]). More generally, semi-parametric models can often be separated into 'structural' and 'random variation' components, one treated parametrically and the other non-parametrically. The well-known [[Cox model]] is a set of semi-parametric assumptions.{{citation needed|date=November 2023}}
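For concreteness, the following Python sketch contrasts a fully parametric location estimate with the two non-parametric estimators named above, on simulated data. The log-normal population, sample size, and seed are illustrative assumptions, not taken from the cited sources.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sample: a skewed (log-normal) population, chosen so that
# a "truly Normal" parametric assumption would be wrong.
x = rng.lognormal(mean=0.0, sigma=1.0, size=200)

# Fully parametric route: assume Normality and report the fitted mean.
mean_hat = x.mean()

# Non-parametric route 1: the sample median.
median_hat = np.median(x)

# Non-parametric route 2: the Hodges-Lehmann estimator, the median of
# all pairwise (Walsh) averages (x_i + x_j) / 2 with i <= j.
i, j = np.triu_indices(len(x))
hl_hat = np.median((x[i] + x[j]) / 2.0)

print(mean_hat, median_hat, hl_hat)
</syntaxhighlight>

On skewed data such as these, the three estimates differ noticeably, illustrating that the choice of modeling assumptions changes the population quantity being estimated.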
=== Importance of valid models/assumptions ===
{{See also|Statistical model validation}}
[[File:Normality_Histogram.png|thumb|A histogram used to assess the assumption of normality: approximate normality is suggested by the even spread of the data beneath the bell curve.]]

Whatever level of assumption is made, correctly calibrated inference in general requires these assumptions to be correct, i.e. that the data-generating mechanisms really have been correctly specified.

Incorrect assumptions of [[Simple random sample|'simple' random sampling]] can invalidate statistical inference.<ref>Kruskal (1988).</ref> More complex semi- and fully parametric assumptions are also cause for concern. For example, incorrectly assuming the Cox model can in some cases lead to faulty conclusions.<ref>[[David A. Freedman|Freedman, D.A.]] (2008). "Survival analysis: An epidemiological hazard?". ''The American Statistician'', 62: 110–119. (Reprinted as Chapter 11 (pages 169–192) of Freedman (2010).)</ref> Incorrect assumptions of Normality in the population also invalidate some forms of regression-based inference.<ref>Berk, R. (2003). ''Regression Analysis: A Constructive Critique (Advanced Quantitative Techniques in the Social Sciences, v. 11)''. Sage Publications. {{ISBN|0-7619-2904-5}}</ref> The use of '''any''' parametric model is viewed skeptically by most experts in sampling human populations: "most sampling statisticians, when they deal with confidence intervals at all, limit themselves to statements about [estimators] based on very large samples, where the central limit theorem ensures that these [estimators] will have distributions that are nearly normal."<ref name="Brewer2">{{cite book |last=Brewer |first=Ken |title=Combined Survey Sampling Inference: Weighing of Basu's Elephants |publisher=Hodder Arnold |year=2002 |isbn=978-0340692295 |page=6}}</ref> In particular, a normal distribution "would be a totally unrealistic and catastrophically unwise assumption to make if we were dealing with any kind of economic population."<ref name="Brewer2" /> Here, the central limit theorem states that the distribution of the sample mean "for very large samples" is approximately normally distributed, provided the distribution is not heavy-tailed.

==== Approximate distributions ====
{{Main|Statistical distance|Asymptotic theory (statistics)|Approximation theory}}

Given the difficulty of specifying exact distributions of sample statistics, many methods have been developed for approximating them. With finite samples, [[Approximation theory|approximation results]] measure how closely a limiting distribution approaches the statistic's [[sampling distribution]]: for example, with 10,000 independent samples the [[normal distribution]] approximates (to two digits of accuracy) the distribution of the [[sample mean]] for many population distributions, by the [[Berry–Esseen theorem]].<ref name="JHJ2">Jörgen Hoffman-Jörgensen, ''Probability With a View Towards Statistics'', Volume I, page 399. {{full citation needed|date=November 2012}}</ref> Yet for many practical purposes, the normal approximation provides a good approximation to the sample mean's distribution when there are 10 (or more) independent samples, according to simulation studies and statisticians' experience.<ref name="JHJ2" />
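A minimal Monte Carlo sketch of this kind of assessment follows; the exponential(1) population, replication count, and seed are illustrative assumptions, not part of the cited sources.

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def ks_error(n, reps=2_000):
    """Kolmogorov-Smirnov distance between the standardized sample mean
    (for samples of size n from an exponential(1) population, which has
    mean 1 and standard deviation 1) and the standard normal."""
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    z = (means - 1.0) * np.sqrt(n)  # standardize: sd of the mean is 1/sqrt(n)
    return stats.kstest(z, "norm").statistic

for n in (10, 100, 1_000):
    print(n, round(ks_error(n), 3))
# The distance shrinks roughly like 1/sqrt(n), consistent with the
# Berry-Esseen rate for populations with a finite third moment.
</syntaxhighlight>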
Following Kolmogorov's work in the 1950s, advanced statistics uses [[approximation theory]] and [[functional analysis]] to quantify the error of approximation. In this approach, the [[metric geometry]] of [[probability distribution]]s is studied; approximation error is quantified with, for example, the [[Kullback–Leibler divergence]], the [[Bregman divergence]], and the [[Hellinger distance]].<ref>Le Cam (1986). {{page needed|date=June 2011}}</ref><ref>Torgerson, Erik (1991). ''Comparison of Statistical Experiments'', volume 36 of Encyclopedia of Mathematics. Cambridge University Press. {{full citation needed|date=November 2012}}</ref><ref>{{cite book |author1=Liese, Friedrich |author2=Miescke, Klaus-J. |title=Statistical Decision Theory: Estimation, Testing, and Selection |publisher=Springer |year=2008 |isbn=978-0-387-73193-3 |name-list-style=amp}}</ref>

With indefinitely large samples, [[Asymptotic theory (statistics)|limiting results]] like the [[central limit theorem]] describe the sample statistic's limiting distribution, if one exists. Limiting results are not statements about finite samples, and indeed are irrelevant to finite samples.<ref>Kolmogorov (1963, p. 369): "The frequency concept, <!-- comma missing in original --> based on the notion of limiting frequency as the number of trials increases to infinity, does not contribute anything to substantiate the applicability of the results of probability theory to real practical problems where we have always to deal with a finite number of trials".</ref><ref>"Indeed, limit theorems 'as <math>n</math> tends to infinity' are logically devoid of content about what happens at any particular <math>n</math>. All they can do is suggest certain approaches whose performance must then be checked on the case at hand." — Le Cam (1986), page xiv.</ref><ref>Pfanzagl (1994): "The crucial drawback of asymptotic theory: What we expect from asymptotic theory are results which hold approximately . . . . What asymptotic theory has to offer are limit theorems." (page ix) "What counts for applications are approximations, not limits." (page 188)</ref> However, the asymptotic theory of limiting distributions is often invoked for work with finite samples. For example, limiting results are often invoked to justify the [[generalized method of moments]] and the use of [[generalized estimating equation]]s, which are popular in [[econometrics]] and [[biostatistics]]. The magnitude of the difference between the limiting distribution and the true distribution (formally, the 'error' of the approximation) can be assessed using simulation.<ref>Pfanzagl (1994): "By taking a limit theorem as being approximately true for large sample sizes, we commit an error the size of which is unknown. [. . .] Realistic information about the remaining errors may be obtained by simulations." (page ix)</ref> The heuristic application of limiting results to finite samples is common practice in many applications, especially with low-dimensional [[Statistical model|models]] with [[Logarithmically concave function|log-concave]] [[Likelihood function|likelihoods]] (such as one-parameter [[exponential families]]).

=== Randomization-based models ===
{{Main|Randomization}}
{{See also|Random sample|Random assignment}}

For a given dataset that was produced by a randomization design, the randomization distribution of a statistic (under the null hypothesis) is defined by evaluating the test statistic for all of the plans that could have been generated by the randomization design.
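The following Python sketch makes this definition concrete for a small completely randomized experiment; the outcomes and the design (4 of 8 units treated) are made-up illustrative numbers.

<syntaxhighlight lang="python">
import numpy as np
from itertools import combinations

# Hypothetical completely randomized design: 4 of 8 units treated;
# outcomes are invented for illustration.
y = np.array([12.1, 9.8, 11.4, 10.2, 13.0, 9.5, 12.7, 10.9])
treated = {0, 2, 4, 6}  # the assignment actually realized

def diff_in_means(assignment):
    mask = np.array([i in assignment for i in range(len(y))])
    return y[mask].mean() - y[~mask].mean()

observed = diff_in_means(treated)

# The randomization distribution: the test statistic evaluated under
# every plan the design could have produced (C(8, 4) = 70 assignments).
dist = np.array([diff_in_means(set(c))
                 for c in combinations(range(len(y)), 4)])

# Randomization p-value under the sharp null of no treatment effect.
p_value = np.mean(np.abs(dist) >= abs(observed))
print(observed, p_value)
</syntaxhighlight>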
In frequentist inference, randomization allows inferences to be based on the randomization distribution rather than on a subjective model; this is especially important in survey sampling and the design of experiments.<ref>[[Jerzy Neyman|Neyman, J.]] (1934). "On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection". ''[[Journal of the Royal Statistical Society]]'', 97 (4), 557–625. {{JSTOR|2342192}}</ref><ref name="Hinkelmann and Kempthorne2">Hinkelmann and Kempthorne (2008). {{page needed|date=June 2011}}</ref> Statistical inference from randomized studies is also more straightforward than in many other situations.<ref>ASA Guidelines for the first course in statistics for non-statisticians (available at the ASA website).</ref><ref>[[David A. Freedman]] et al., ''Statistics''.</ref><ref>Moore et al. (2015).</ref>

In [[Bayesian inference]], randomization is also important: in [[survey sampling]], use of [[sampling without replacement]] ensures the [[exchangeability]] of the sample with the population; in randomized experiments, randomization warrants a [[missing at random]] assumption for [[covariate]] information.<ref>[[Andrew Gelman|Gelman, A.]] et al. (2013). ''Bayesian Data Analysis''. [[Chapman & Hall]].</ref>

Objective randomization allows properly inductive procedures.<ref>Peirce (1877–1878).</ref><ref>Peirce (1883).</ref>{{sfn|Freedman|Pisani|Purves|1978}}<ref>[[David A. Freedman]], ''Statistical Models''.</ref><ref>[[C. R. Rao|Rao, C.R.]] (1997). ''Statistics and Truth: Putting Chance to Work''. World Scientific. {{isbn|981-02-3111-3}}</ref> Many statisticians prefer randomization-based analysis of data that was generated by well-defined randomization procedures.<ref>Peirce; Freedman; Moore et al. (2015).{{Citation needed|date=March 2010}}</ref> (However, it is true that in fields of science with developed theoretical knowledge and experimental control, randomized experiments may increase the costs of experimentation without improving the quality of inferences.<ref>Box, G.E.P. and Friends (2006). ''Improving Almost Anything: Ideas and Essays, Revised Edition''. Wiley. {{isbn|978-0-471-72755-2}}</ref><ref>Cox (2006), p. 196.</ref>) Similarly, results from [[randomized experiment]]s are recommended by leading statistical authorities as allowing inferences with greater reliability than do observational studies of the same phenomena.<ref>ASA Guidelines for the first course in statistics for non-statisticians (available at the ASA website); David A. Freedman et al., ''Statistics''; Moore et al. (2015).</ref> However, a good observational study may be better than a bad randomized experiment.

The statistical analysis of a randomized experiment may be based on the randomization scheme stated in the experimental protocol and does not need a subjective model.<ref>Neyman, Jerzy (1923 [1990]). "On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9". ''Statistical Science'', 5 (4): 465–472. Trans. [[Dorota Dabrowska|Dorota M. Dabrowska]] and Terence P. Speed.</ref><ref>Hinkelmann & Kempthorne (2008). {{page needed|date=June 2011}}</ref> However, some hypotheses cannot be tested using objective statistical models that accurately describe randomized experiments or random samples. In some cases, such randomized studies are uneconomical or unethical.

==== Model-based analysis of randomized experiments ====
It is standard practice to refer to a statistical model, e.g., a linear or logistic model, when analyzing data from randomized experiments.<ref name="Dinov Palanimalai Khare Christou 2018">{{cite journal |last1=Dinov |first1=Ivo |last2=Palanimalai |first2=Selvam |last3=Khare |first3=Ashwini |last4=Christou |first4=Nicolas |date=2018 |title=Randomization-based statistical inference: A resampling and simulation infrastructure |journal=Teaching Statistics |volume=40 |issue=2 |pages=64–73 |doi=10.1111/test.12156 |pmc=6155997 |pmid=30270947}}</ref> However, the randomization scheme guides the choice of a statistical model, and it is not possible to choose an appropriate model without knowing the randomization scheme.<ref name="Hinkelmann and Kempthorne2" /> Seriously misleading results can be obtained by analyzing data from randomized experiments while ignoring the experimental protocol; common mistakes include forgetting the blocking used in an experiment and confusing repeated measurements on the same experimental unit with independent replicates of the treatment applied to different experimental units.<ref>Hinkelmann and Kempthorne (2008), Chapter 6.</ref> A sketch of a block-respecting randomization analysis follows.
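The sketch below analyzes a hypothetical randomized block design by permuting treatment labels only within blocks, as the protocol dictates; the block structure and all outcome values are invented for illustration.

<syntaxhighlight lang="python">
import numpy as np
from itertools import product

# Hypothetical randomized block design: 4 blocks, each containing one
# treated and one control unit; outcomes invented for illustration.
blocks = [(11.2, 10.1), (14.8, 13.9), (9.3, 8.8), (12.5, 11.6)]  # (treated, control)

observed = np.mean([t - c for t, c in blocks])

# Respecting the design: within each block, the treatment label could
# have fallen on either unit, giving 2**4 = 16 equally likely plans.
dist = np.array([np.mean([s * (t - c) for s, (t, c) in zip(signs, blocks)])
                 for signs in product((1, -1), repeat=len(blocks))])

p_value = np.mean(np.abs(dist) >= abs(observed))
print(observed, p_value)
# Pooling all eight outcomes and permuting labels freely would use the
# wrong randomization distribution, because it ignores the blocking.
</syntaxhighlight>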
==== Model-free randomization inference ====
Model-free techniques provide a complement to model-based methods, which employ reductionist strategies of reality-simplification. The former combine, evolve, ensemble, and train algorithms that dynamically adapt to the contextual affinities of a process and learn the intrinsic characteristics of the observations.<ref name="Dinov Palanimalai Khare Christou 2018" /><ref name="Tang model-based Model-Free 2019">{{cite journal |last1=Tang |first1=Ming |last2=Gao |first2=Chao |last3=Goutman |first3=Stephen |last4=Kalinin |first4=Alexandr |last5=Mukherjee |first5=Bhramar |last6=Guan |first6=Yuanfang |last7=Dinov |first7=Ivo |date=2019 |title=Model-Based and Model-Free Techniques for Amyotrophic Lateral Sclerosis Diagnostic Prediction and Patient Clustering |journal=Neuroinformatics |volume=17 |issue=3 |pages=407–421 |doi=10.1007/s12021-018-9406-9 |pmid=30460455 |pmc=6527505}}</ref>

For example, model-free simple linear regression is based on either:
* a ''random design'', where the pairs of observations <math>(X_1,Y_1), (X_2,Y_2), \cdots , (X_n,Y_n)</math> are independent and identically distributed (iid), or
* a ''deterministic design'', where the variables <math>X_1, X_2, \cdots, X_n</math> are deterministic, but the corresponding response variables <math>Y_1,Y_2, \cdots, Y_n</math> are random and independent with a common conditional distribution, i.e., <math>P\left (Y_j \leq y \mid X_j = x\right ) = D_x(y)</math>, which is independent of the index <math>j</math>.

In either case, model-free randomization inference for features of the common conditional distribution <math>D_x(\cdot)</math> relies on some regularity conditions, e.g. functional smoothness.
For instance, the population feature ''conditional mean'', <math>\mu(x)=E(Y \mid X = x)</math>, can be consistently estimated via local averaging or local polynomial fitting, under the assumption that <math>\mu(x)</math> is smooth. Also, relying on asymptotic normality or resampling, we can construct confidence intervals for the population feature, in this case the conditional mean <math>\mu(x)</math>.<ref name="Politis Model-Free Inference 2019">{{cite journal |last1=Politis |first1=D.N. |date=2019 |title=Model-free inference in statistics: how and why |journal=IMS Bulletin |volume=48 |url=http://bulletin.imstat.org/2015/11/model-free-inference-in-statistics-how-and-why/}}</ref>
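A minimal sketch of local averaging follows; the sinusoidal true mean, noise level, bandwidth, and Gaussian kernel are illustrative assumptions rather than requirements of the method.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical random-design data: iid pairs (X_i, Y_i) whose conditional
# mean mu(x) = E(Y | X = x) is smooth but otherwise unspecified.
n = 500
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

def local_average(x0, x, y, h=0.05):
    """Kernel-weighted local average (Nadaraya-Watson) estimate of mu(x0),
    assuming only smoothness of mu."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)  # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

print(local_average(0.25, x, y))  # should be near sin(pi / 2) = 1.0
</syntaxhighlight>

A bootstrap over the observation pairs, or an asymptotic normal approximation for the local average, could then be used to attach a confidence interval to the estimate, in the spirit described above.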