==Choice of the null hypothesis==
The choice of the null hypothesis is associated with sparse and inconsistent advice. Fisher mentioned few constraints on the choice and stated that many null hypotheses should be considered and that many tests are possible for each. The variety of applications and the diversity of goals suggest that the choice can be complicated. In many applications the formulation of the test is traditional. A familiarity with the range of tests available may suggest a particular null hypothesis and test. Formulating the null hypothesis is not automated (though the calculations of significance testing usually are). [[David Cox (statistician)|David Cox]] said, "How [the] translation from subject-matter problem to statistical model is done is often the most critical part of an analysis".<ref>{{cite book |last=Cox |first=D. R. |year=2006 |title=Principles of Statistical Inference |publisher=Cambridge University Press |isbn=978-0-521-68567-2 |page=197 }}</ref>

A statistical significance test is intended to test a hypothesis. If the hypothesis summarizes a set of data, there is no value in testing the hypothesis on that same set of data. Example: if a study of last year's weather reports indicates that rain in a region falls primarily on weekends, it is only valid to test that null hypothesis on weather reports from any ''other'' year. [[Testing hypotheses suggested by the data]] is [[circular reasoning]] that proves nothing; it is a special limitation on the choice of the null hypothesis.

A routine procedure is as follows: start from the scientific hypothesis, translate this to a statistical alternative hypothesis, and proceed: "Because H<sub>a</sub> expresses the effect that we wish to find evidence for, we often begin with H<sub>a</sub> and then set up H<sub>0</sub> as the statement that the hoped-for effect is not present."<ref name=moore>{{cite book|last1=Moore|first1=David|last2=McCabe|first2=George|title=Introduction to the Practice of Statistics|url=https://archive.org/details/isbn_9780716749127|url-access=registration|publisher=W.H. Freeman and Co|location=New York|year=2003|page=438|edition=4|isbn=978-0716796572}}</ref> This advice is ''reversed'' for modeling applications, where we hope ''not'' to find evidence against the null.

A complex case example is as follows:<ref name=jones96>{{cite journal | last = Jones | first = B |author2=P Jarvis |author3=J A Lewis |author4=A F Ebbutt | title = Trials to assess equivalence: the importance of rigorous methods | journal = BMJ | volume = 313 | issue = 7048 | pages = 36–39 | date = 6 July 1996 | doi=10.1136/bmj.313.7048.36| pmid = 8664772 | pmc = 2351444 }} It is suggested that the default position (the null hypothesis) should be that the treatments are ''not'' equivalent. Conclusions should be made on the basis of [[confidence intervals]] rather than significance.</ref> The gold standard in clinical research is the [[Randomized controlled trial|randomized]] [[Placebo-controlled study|placebo-controlled]] [[Blinded experiment#Double-blind trials|double-blind]] clinical trial. But testing a new drug against a (medically ineffective) placebo may be unethical for a serious illness. Testing a new drug against an older medically effective drug raises fundamental philosophical issues regarding the goal of the test and the motivation of the experimenters. The standard "no difference" null hypothesis may reward the pharmaceutical company for gathering inadequate data.
"Difference" is a better null hypothesis in this case, but statistical significance is not an adequate criterion for reaching a nuanced conclusion which requires a good numeric estimate of the drug's effectiveness. A "minor" or "simple" proposed change in the null hypothesis ((new vs old) rather than (new vs placebo)) can have a dramatic effect on the utility of a test for complex non-statistical reasons. ===Directionality=== {{main|One- and two-tailed tests}} The choice of null hypothesis (''H''<sub>0</sub>) and consideration of directionality (see "[[one-tailed test]]") is critical. ====Tailedness of the null-hypothesis test==== Consider the question of whether a tossed coin is fair (i.e. that on average it lands heads up 50% of the time) and an experiment where you toss the coin 5 times. A possible result of the experiment that we consider here is 5 heads. Let outcomes be considered unlikely with respect to an assumed distribution if their probability is lower than a significance threshold of 0.05. A potential null hypothesis implying a one-tailed test is "this coin is not biased toward heads". Beware that, in this context, the term "one-tailed" does ''not'' refer to the outcome of a single coin toss (i.e., whether or not the coin comes up "tails" instead of "heads"); the term "[[One- and two-tailed tests|one-tailed]]" refers to a specific way of testing the null hypothesis in which the critical region (also known as "[[Statistical hypothesis testing#Definition of terms|region of rejection]]") ends up in on only one side of the probability distribution. Indeed, with a fair coin the probability of this experiment outcome is 1/2<sup>5</sup> = 0.031, which would be even lower if the coin were biased in favour of tails. Therefore, the observations are not likely enough for the null hypothesis to hold, and the test refutes it. Since the coin is ostensibly neither fair nor biased toward tails, the conclusion of the experiment is that the coin is biased towards heads. Alternatively, a null hypothesis implying a two-tailed test is "this coin is fair". This one null hypothesis could be examined by looking out for either too many tails or too many heads in the experiments. The outcomes that would tend to refute this null hypothesis are those with a large number of heads or a large number of tails, and our experiment with 5 heads would seem to belong to this class. However, the probability of 5 tosses of the same kind, irrespective of whether these are head or tails, is twice as much as that of the 5-head occurrence singly considered. Hence, under this two-tailed null hypothesis, the observation receives a [[P-value|probability value]] of 0.063. Hence again, with the same significance threshold used for the one-tailed test (0.05), the same outcome is not statistically significant. Therefore, the two-tailed null hypothesis will be preserved in this case, not supporting the conclusion reached with the single-tailed null hypothesis, that the coin is biased towards heads. This example illustrates that the conclusion reached from a statistical test may depend on the precise formulation of the null and alternative hypotheses. ====Discussion==== Fisher said, "the null hypothesis must be exact, that is free of vagueness and ambiguity, because it must supply the basis of the 'problem of distribution,' of which the test of significance is the solution", implying a more restrictive domain for ''H''<sub>0</sub>.<ref>{{cite book |last=Fisher |first=R. A. 
====Discussion====
Fisher said, "the null hypothesis must be exact, that is free of vagueness and ambiguity, because it must supply the basis of the 'problem of distribution,' of which the test of significance is the solution", implying a more restrictive domain for ''H''<sub>0</sub>.<ref>{{cite book |last=Fisher |first=R. A. |year=1966 |title=The Design of Experiments |edition=8th |publisher=Hafner |location=Edinburgh |title-link=The Design of Experiments }}</ref> According to this view, the null hypothesis must be numerically exact: it must state that a particular quantity or difference is equal to a particular number. In classical science, it is most typically the statement that there is ''no effect'' of a particular treatment; in observations, it is typically that there is ''no difference'' between the value of a particular measured variable and that of a prediction.

Most statisticians believe that it is valid to state direction as a part of the null hypothesis, or as part of a null hypothesis/alternative hypothesis pair.<ref>For example see [http://davidmlane.com/hyperstat/A73079.html Null hypothesis]</ref> However, the results are not a full description of all the results of an experiment, merely a single result tailored to one particular purpose. For example, consider the claim that a new treatment improves on a well-established treatment with population {{nowrap|mean {{=}} 10}} (known from long experience). The one-tailed alternative ''H''<sub>a</sub> is that the new treatment's {{nowrap|mean > 10}}, and ''H''<sub>0</sub> is the statement that it is not. If the sample evidence obtained through ''x''-bar equals −200 and the corresponding t-test statistic equals −50, the conclusion from the test would be that there is no evidence that the new treatment is better than the existing one: it would not report that it is markedly worse, but that is not what this particular test is looking for. To overcome any possible ambiguity in reporting the result of the test of a null hypothesis, it is best to indicate whether the test was two-sided and, if one-sided, to include the direction of the effect being tested. The statistical theory required to deal with the simple cases of directionality dealt with here, and more complicated ones, makes use of the concept of an [[unbiased test]].
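This blind spot of the one-sided test can be demonstrated with a minimal sketch, assuming Python with NumPy and SciPy; the sample values are synthetic, invented purely for illustration:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical sample for the new treatment, with a true mean far below
# the established mean of 10 (values invented for illustration).
sample = rng.normal(loc=-200, scale=30, size=50)

# One-sided test of H0: mean <= 10 against Ha: mean > 10.
res = stats.ttest_1samp(sample, popmean=10, alternative='greater')
print(f"t = {res.statistic:.0f}, one-sided p = {res.pvalue:.3f}")
# t is roughly -50, yet p is near 1: the test reports "no evidence that
# the new treatment is better" and stays silent about it being worse.
</syntaxhighlight>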
The directionality of hypotheses is not always obvious. The explicit null hypothesis of Fisher's [[Lady tasting tea]] example was that the Lady had no such ability, which led to a symmetric probability distribution. The one-tailed nature of the test resulted from the one-tailed alternate hypothesis (a term not used by Fisher). The null hypothesis became implicitly one-tailed. The logical negation of the Lady's one-tailed claim was also one-tailed. (Claim: Ability > 0; Stated null: Ability = 0; Implicit null: Ability ≤ 0.)

Pure arguments over the use of one-tailed tests are complicated by the variety of tests. Some tests (for instance the χ<sup>2</sup> goodness of fit test) are inherently one-tailed. Some probability distributions are asymmetric. The traditional tests of 3 or more groups are two-tailed. Advice concerning the use of one-tailed hypotheses has been inconsistent, and accepted practice varies among fields.<ref>{{cite journal | last1 = Lombardi | first1 = Celia M. | last2 = Hurlbert | first2 = Stuart H. | title = Misprescription and misuse of one-tailed tests | journal = Austral Ecology | volume = 34 | pages = 447–468 | year = 2009 | issue = 4 | doi = 10.1111/j.1442-9993.2009.01946.x| doi-access = free | bibcode = 2009AusEc..34..447L }} Discusses the merits and historical usage of one-tailed tests in biology at length.</ref> The greatest objection to one-tailed hypotheses is their potential subjectivity. A non-significant result can sometimes be converted to a significant result by the use of a one-tailed hypothesis (as in the fair coin test, at the whim of the analyst). The flip side of the argument: one-sided tests are less likely to ignore a real effect. One-tailed tests can suppress the publication of data that differs in sign from predictions. Objectivity was a goal of the developers of statistical tests.

It is a common practice to use a one-tailed hypothesis by default. However, "If you do not have a specific direction firmly in mind in advance, use a two-sided alternative. Moreover, some users of statistics argue that we should ''always'' work with the two-sided alternative."<ref name=moore/><ref>{{cite journal | last1 = Bland | first1 = J Martin | last2 = Altman | first2 = Douglas G | title = One and two sided tests of significance | journal = BMJ | volume = 309 | issue = 6949 | page = 248 | date = 23 July 1994 | doi=10.1136/bmj.309.6949.248| pmid = 8069143 | pmc = 2540725 }} With respect to medical statistics: "In general a one sided test is appropriate when a large difference in one direction would lead to the same action as no difference at all. Expectation of a difference in a particular direction is not adequate justification." "Two sided tests should be used unless there is a very good reason for doing otherwise. If one sided tests are to be used the direction of the test must be specified in advance. One sided tests should never be used simply as a device to make a conventionally non-significant difference significant."</ref>

One alternative to this advice is to use three-outcome tests, which eliminate the issues surrounding directionality of hypotheses by testing twice, once in each direction, and combining the results to produce three possible outcomes.<ref>{{cite journal | last1 = Jones | first1 = Lyle V. | last2 = Tukey | first2 = John W. | s2cid = 14553341 | title = A Sensible Formulation of the Significance Test | journal = Psychological Methods | volume = 5 | number = 4 | pages = 411–414 | year = 2000 | doi = 10.1037/1082-989X.5.4.411 | pmid = 11194204 }} Test results are signed: significant positive effect, significant negative effect or insignificant effect of unknown sign. This is a more nuanced conclusion than that of the two-tailed test. It has the advantages of one-tailed tests without the disadvantages.</ref> Variations on this approach have a history, being suggested perhaps 10 times since 1950.<ref>{{cite journal | last1 = Hurlbert | first1 = S. H. | last2 = Lombardi | first2 = C. M. | title = Final collapse of the Neyman-Pearson decision theoretic framework and rise of the neoFisherian | journal = Ann. Zool. Fennici | volume = 46 | issue = 5 | pages = 311–349 | year = 2009 | issn = 1797-2450 | doi=10.5735/086.046.0501| s2cid = 9688067 }}</ref>
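One way such a three-outcome procedure could be organized is sketched below. This is a hypothetical implementation, assuming Python with NumPy and SciPy and splitting the significance level between two one-sided t-tests; it is not the exact prescription of the cited authors:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

def three_outcome_test(sample, popmean, alpha=0.05):
    """Illustrative three-outcome test: one-sided tests in both
    directions at alpha/2 each, yielding a signed conclusion."""
    p_greater = stats.ttest_1samp(sample, popmean, alternative='greater').pvalue
    p_less = stats.ttest_1samp(sample, popmean, alternative='less').pvalue
    if p_greater < alpha / 2:
        return "significant positive effect"
    if p_less < alpha / 2:
        return "significant negative effect"
    return "insignificant effect of unknown sign"

# Synthetic usage example: a sample whose true mean lies above 0.
rng = np.random.default_rng(1)
print(three_outcome_test(rng.normal(2.0, 1.0, 40), popmean=0))
# -> "significant positive effect"
</syntaxhighlight>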
"The treatment has a beneficial effect" is the more informative result of a one-tailed test. "The treatment has an effect, reducing the average length of hospitalization by 1.5 days" is the most informative report, combining a two-tailed significance test result with a numeric estimate of the relationship between treatment and effect. Explicitly reporting a numeric result eliminates a philosophical advantage of a one-tailed test. An underlying issue is the appropriate form of an experimental science without numeric predictive theories: A model of numeric results is more informative than a model of effect signs (positive, negative or unknown) which is more informative than a model of simple significance (non-zero or unknown); in the absence of numeric theory signs may suffice.