===Randomization-based models===
{{Main|Randomization}}
{{See also|Random sample|Random assignment}}

For a given dataset that was produced by a randomization design, the randomization distribution of a statistic (under the null hypothesis) is defined by evaluating the test statistic for all of the plans that could have been generated by the randomization design. In frequentist inference, the randomization allows inferences to be based on the randomization distribution rather than a subjective model, and this is important especially in survey sampling and design of experiments.<ref>[[Jerzy Neyman|Neyman, J.]] (1934) "On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection", ''[[Journal of the Royal Statistical Society]]'', 97 (4), 557–625 {{JSTOR|2342192}}</ref><ref name="Hinkelmann and Kempthorne2">Hinkelmann and Kempthorne (2008) {{page needed|date=June 2011}}</ref> Statistical inference from randomized studies is also more straightforward than in many other situations.<ref>ASA Guidelines for the first course in statistics for non-statisticians. (available at the ASA website)</ref><ref>[[David A. Freedman]] et al.'s ''Statistics''.</ref><ref>Moore et al. (2015).</ref> In [[Bayesian inference]], randomization is also of importance: in [[survey sampling]], use of [[sampling without replacement]] ensures the [[exchangeability]] of the sample with the population; in randomized experiments, randomization warrants a [[missing at random]] assumption for [[covariate]] information.<ref>[[Andrew Gelman|Gelman A.]] et al. (2013). ''Bayesian Data Analysis'' ([[Chapman & Hall]]).</ref>

Objective randomization allows properly inductive procedures.<ref>Peirce (1877–1878)</ref><ref>Peirce (1883)</ref>{{sfn|Freedman|Pisani|Purves|1978}}<ref>[[David A. Freedman]] ''Statistical Models''.</ref><ref>[[C. R. Rao|Rao, C.R.]] (1997) ''Statistics and Truth: Putting Chance to Work'', World Scientific. {{isbn|981-02-3111-3}}</ref> Many statisticians prefer randomization-based analysis of data that was generated by well-defined randomization procedures.<ref>Peirce; Freedman; Moore et al. (2015).{{Citation needed|date=March 2010}}</ref> (However, it is true that in fields of science with developed theoretical knowledge and experimental control, randomized experiments may increase the costs of experimentation without improving the quality of inferences.<ref>Box, G.E.P. and Friends (2006) ''Improving Almost Anything: Ideas and Essays, Revised Edition'', Wiley. {{isbn|978-0-471-72755-2}}</ref><ref>Cox (2006), p. 196.</ref>) Similarly, results from [[randomized experiment]]s are recommended by leading statistical authorities as allowing inferences with greater reliability than do observational studies of the same phenomena.<ref>ASA Guidelines for the first course in statistics for non-statisticians. (available at the ASA website); [[David A. Freedman]] et al.'s ''Statistics''; Moore et al. (2015).</ref> However, a good observational study may be better than a bad randomized experiment.

The statistical analysis of a randomized experiment may be based on the randomization scheme stated in the experimental protocol and does not need a subjective model.<ref>Neyman, Jerzy. 1923 [1990]. "On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Section 9." ''Statistical Science'' 5 (4): 465–472. Trans. [[Dorota Dabrowska|Dorota M. Dabrowska]] and Terence P. Speed.</ref><ref>Hinkelmann & Kempthorne (2008) {{page needed|date=June 2011}}</ref>

However, at any time some hypotheses cannot be tested using objective statistical models that accurately describe randomized experiments or random samples. In some cases, such randomized studies are uneconomical or unethical.
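The logic of randomization-based analysis can be illustrated with a small randomization (permutation) test. The sketch below is purely illustrative and not taken from the cited sources: the data, the group sizes, and the choice of the difference in means as the test statistic are hypothetical. The randomization distribution of the statistic is obtained by re-evaluating it over every treatment assignment that the completely randomized design could have produced.

<syntaxhighlight lang="python">
# Illustrative randomization (permutation) test with hypothetical data.
# The randomization distribution of the mean difference is built by
# evaluating the statistic for every treatment assignment the design
# could have generated; the p-value is the share of assignments giving
# a difference at least as extreme as the one observed.
import itertools
import numpy as np

# Hypothetical responses from a completely randomized two-group experiment.
treated = np.array([23.1, 25.4, 27.9, 24.6, 26.3])
control = np.array([21.0, 22.7, 20.5, 23.8, 22.1])
observed = treated.mean() - control.mean()

pooled = np.concatenate([treated, control])
n_treated = len(treated)

# Enumerate all possible assignments (feasible here; larger experiments
# would instead sample assignments at random).
stats = []
for idx in itertools.combinations(range(len(pooled)), n_treated):
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx)] = True
    stats.append(pooled[mask].mean() - pooled[~mask].mean())
stats = np.array(stats)

# Two-sided p-value from the randomization distribution.
p_value = np.mean(np.abs(stats) >= abs(observed))
print(f"observed difference = {observed:.2f}, randomization p-value = {p_value:.3f}")
</syntaxhighlight>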
==== Model-based analysis of randomized experiments ====
It is standard practice to refer to a statistical model, e.g., a linear or logistic model, when analyzing data from randomized experiments.<ref name="Dinov Palanimalai Khare Christou 20182">{{cite journal |last1=Dinov |first1=Ivo |last2=Palanimalai |first2=Selvam |last3=Khare |first3=Ashwini |last4=Christou |first4=Nicolas |date=2018 |title=Randomization-based statistical inference: A resampling and simulation infrastructure |journal=Teaching Statistics |volume=40 |issue=2 |pages=64–73 |doi=10.1111/test.12156 |pmc=6155997 |pmid=30270947}}</ref> However, the randomization scheme guides the choice of a statistical model, and it is not possible to choose an appropriate model without knowing the randomization scheme.<ref name="Hinkelmann and Kempthorne2" /> Seriously misleading results can be obtained by analyzing data from randomized experiments while ignoring the experimental protocol; common mistakes include forgetting the blocking used in an experiment and confusing repeated measurements on the same experimental unit with independent replicates of the treatment applied to different experimental units.<ref>Hinkelmann and Kempthorne (2008) Chapter 6.</ref>
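As a purely illustrative sketch of this point (the data, effect sizes, and the use of the <code>statsmodels</code> package are assumptions for the example, not part of the cited sources), the analysis below fits a model that includes the block factor of a randomized complete block design and contrasts it with a model that ignores the blocking:

<syntaxhighlight lang="python">
# Hypothetical randomized complete block design: 6 blocks, one treated
# and one control unit per block, with large block-to-block variation.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
blocks = np.repeat(np.arange(6), 2)           # block labels 0..5, two units each
treatment = np.tile([0, 1], 6)                # one treated unit per block
block_effect = rng.normal(0, 3, 6)[blocks]    # block-to-block variation
y = 10 + 2 * treatment + block_effect + rng.normal(0, 1, 12)

df = pd.DataFrame({"y": y, "treatment": treatment, "block": blocks})

# Analysis matching the blocked randomization scheme.
with_blocks = smf.ols("y ~ treatment + C(block)", data=df).fit()
# Analysis that forgets the blocking used in the experiment.
without_blocks = smf.ols("y ~ treatment", data=df).fit()

print("with blocks:   ", with_blocks.params["treatment"], with_blocks.bse["treatment"])
print("without blocks:", without_blocks.params["treatment"], without_blocks.bse["treatment"])
</syntaxhighlight>

With substantial block-to-block variation, the model that omits the block factor typically reports a much larger standard error for the treatment effect, illustrating why the randomization scheme should guide the choice of model.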
==== Model-free randomization inference ====
Model-free techniques provide a complement to model-based methods, which employ reductionist strategies of reality-simplification. The former combine, evolve, ensemble and train algorithms dynamically adapting to the contextual affinities of a process and learning the intrinsic characteristics of the observations.<ref name="Dinov Palanimalai Khare Christou 2018">{{cite journal |last1=Dinov |first1=Ivo |last2=Palanimalai |first2=Selvam |last3=Khare |first3=Ashwini |last4=Christou |first4=Nicolas |date=2018 |title=Randomization-based statistical inference: A resampling and simulation infrastructure |journal=Teaching Statistics |volume=40 |issue=2 |pages=64–73 |doi=10.1111/test.12156 |pmid=30270947 |pmc=6155997}}</ref><ref name="Tang model-based Model-Free 2019">{{cite journal |last1=Tang |first1=Ming |last2=Gao |first2=Chao |last3=Goutman |first3=Stephen |last4=Kalinin |first4=Alexandr |last5=Mukherjee |first5=Bhramar |last6=Guan |first6=Yuanfang |last7=Dinov |first7=Ivo |date=2019 |title=Model-Based and Model-Free Techniques for Amyotrophic Lateral Sclerosis Diagnostic Prediction and Patient Clustering |journal=Neuroinformatics |volume=17 |issue=3 |pages=407–421 |doi=10.1007/s12021-018-9406-9 |pmid=30460455 |pmc=6527505}}</ref>

For example, model-free simple linear regression is based either on:
* a ''random design'', where the pairs of observations <math>(X_1,Y_1), (X_2,Y_2), \cdots , (X_n,Y_n)</math> are independent and identically distributed (iid), or
* a ''deterministic design'', where the variables <math>X_1, X_2, \cdots, X_n</math> are deterministic, but the corresponding response variables <math>Y_1,Y_2, \cdots, Y_n</math> are random and independent with a common conditional distribution, i.e., <math>P\left (Y_j \leq y | X_j =x\right ) = D_x(y)</math>, which is independent of the index <math>j</math>.

In either case, model-free randomization inference for features of the common conditional distribution <math>D_x(\cdot)</math> relies on some regularity conditions, e.g., functional smoothness. For instance, the population feature ''conditional mean'', <math>\mu(x)=E(Y | X = x)</math>, can be consistently estimated via local averaging or local polynomial fitting, under the assumption that <math>\mu(x)</math> is smooth. Also, relying on asymptotic normality or resampling, we can construct confidence intervals for the population feature, in this case the ''conditional mean'' <math>\mu(x)</math>.<ref name="Politis Model-Free Inference 2019">{{cite journal |last1=Politis |first1=D.N. |date=2019 |title=Model-free inference in statistics: how and why |journal=IMS Bulletin |volume=48 |url=http://bulletin.imstat.org/2015/11/model-free-inference-in-statistics-how-and-why/}}</ref>
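As a minimal illustrative sketch of these ideas (hypothetical data, an assumed Gaussian kernel and bandwidth, and a basic pairs bootstrap; none of these choices is prescribed by the cited source), the conditional mean <math>\mu(x)</math> can be estimated by local averaging and given a resampling-based confidence interval as follows:

<syntaxhighlight lang="python">
# Local-averaging (Nadaraya-Watson) estimate of mu(x) = E(Y | X = x)
# with a bootstrap confidence interval from resampling the (X, Y) pairs.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical iid pairs from a "random design".
n = 200
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

def local_average(x0, x, y, h=0.1):
    """Kernel-weighted average of y near x0 (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

x0 = 0.5
estimate = local_average(x0, x, y)

# Pairs bootstrap for a resampling-based 95% confidence interval for mu(x0).
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boot.append(local_average(x0, x[idx], y[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])

print(f"mu({x0}) estimate = {estimate:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
</syntaxhighlight>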