Editing Likelihood principle (section)
===Experimental design arguments on the likelihood principle===

Unrealized events play a role in some common statistical methods. For example, the result of a [[statistical hypothesis testing|significance test]] depends on the [[p-value|{{mvar|p}}-value]], the probability of a result as extreme as or more extreme than the observation, and that probability may depend on the design of the experiment. To the extent that the likelihood principle is accepted, such methods are therefore denied.

Some classical significance tests are not based on the likelihood. The following are a simple and a more complicated example of such tests, using a commonly cited example called ''the [[optional stopping]] problem''.

;Example 1 – simple version:

Suppose I tell you that I tossed a coin 12 times and in the process observed 3 heads. You might make some inference about the probability of heads and whether the coin was fair. Suppose now I tell you that I tossed the coin ''until'' I observed 3 heads, and I tossed it 12 times. Will you now make some different inference?

The likelihood function is the same in both cases: it is proportional to

:<math>p^3 (1-p)^9 ~.</math>

So according to the ''likelihood principle'', in either case the inference should be the same.

;Example 2 – a more elaborate version of the same statistics:

Suppose a number of scientists are assessing the probability of a certain outcome (which we shall call 'success') in experimental trials. Conventional wisdom suggests that if there is no bias towards success or failure then the success probability would be one half. Adam, a scientist, conducted 12 trials, obtained 3 successes and 9 failures, and then left the lab. '''One of those successes was the 12th and last observation.'''

Bill, a colleague in the same lab, continued Adam's work and published Adam's results, along with a significance test. He tested the [[null hypothesis]] that {{mvar|p}}, the success probability, is equal to a half, versus {{math|''p'' < 0.5}}.
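The proportionality claim in Example 1 can be checked directly: the binomial design and the "toss until 3 heads" (negative binomial) design give likelihoods that differ only by a constant factor. A minimal Python sketch (function names are illustrative, not from the article):

```python
from math import comb

def binomial_likelihood(p, n=12, k=3):
    # "Toss 12 times, observe 3 heads": C(12, 3) p^3 (1 - p)^9
    return comb(n, k) * p**k * (1 - p)**(n - k)

def negative_binomial_likelihood(p, n=12, k=3):
    # "Toss until 3 heads, which took 12 tosses": the last toss is a
    # head, so the likelihood is C(11, 2) p^3 (1 - p)^9
    return comb(n - 1, k - 1) * p**k * (1 - p)**(n - k)

# The two likelihoods differ only by the constant C(12,3)/C(11,2) = 4,
# so as functions of p they are proportional; the likelihood principle
# then demands the same inference about p under either design.
for p in (0.1, 0.25, 0.5, 0.9):
    print(p, binomial_likelihood(p) / negative_binomial_likelihood(p))  # ratio is always 4.0
```

Because the ratio of the two likelihoods does not depend on {{mvar|p}}, any inference that uses only the likelihood function cannot distinguish the two designs.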
If we ignore the information that the third success was the 12th and last observation, the probability of the observed result — that out of 12 trials 3 or fewer (i.e. more extreme) were successes — if {{mvar|H}}{{sub|0}} is true, is

:<math>\left[{12 \choose 3}+{12 \choose 2}+{12 \choose 1}+{12 \choose 0}\right]\left({1 \over 2}\right)^{12} ~,</math>

which is {{math|{{sfrac|299|4096}} {{=}} 7.3%}}. Thus the null hypothesis is not rejected at the 5% significance level if we ignore the knowledge that the third success was the 12th result.

However, observe that this first calculation also includes sequences of 12 tosses that end in tails, contrary to the problem statement. If we redo the calculation, we realize that the likelihood according to the null hypothesis must be the probability of a fair coin landing 2 or fewer heads on 11 trials, multiplied by the probability of the fair coin landing a head on the 12th trial:

:<math>\left[{11 \choose 2}+{11 \choose 1}+{11 \choose 0}\right]\left({1 \over 2}\right)^{11}{1 \over 2} ~,</math>

which is {{math|{{sfrac|67|2048}} · {{sfrac|1|2}} {{=}} {{sfrac|67|4096}} {{=}} 1.64%}}. Now the result ''is'' statistically significant at the {{math|5%}} level.

Charlotte, another scientist, reads Bill's paper and writes a letter, saying that it is possible that Adam kept trying until he obtained 3 successes, in which case the probability of needing to conduct 12 or more experiments is given by

:<math>\left[{11 \choose 2}+{11 \choose 1}+{11 \choose 0}\right]\left({1 \over 2}\right)^{11}{1 \over 2} ~,</math>

which is {{math|{{sfrac|134|4096}} · {{sfrac|1|2}} {{=}} {{sfrac|67|4096}} {{=}} 1.64%}}. Again the result ''is'' statistically significant at the {{math|5%}} level. Note that there is no contradiction between the latter two analyses; both computations are correct, and result in the same {{mvar|p}}-value.
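The tail probabilities above can be verified with exact rational arithmetic; a short Python sketch using only the standard library (variable names are illustrative):

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)

# P(3 or fewer successes in 12 trials) under H0: p = 1/2 -- the first
# calculation, which ignores that the 12th trial was a success.
p_ignore = sum(comb(12, k) for k in range(4)) * half**12
print(p_ignore, float(p_ignore))  # 299/4096, about 7.3%

# P(2 or fewer successes in 11 trials) * P(success on trial 12):
# the recalculation that conditions on the last trial being a success.
p_condition = sum(comb(11, k) for k in range(3)) * half**11 * half
print(p_condition, float(p_condition))  # 67/4096, about 1.64%
```

With exact fractions there is no rounding ambiguity: the first value exceeds the 5% threshold and the second falls below it.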
To these scientists, whether a result is significant or not does not depend on the design of the experiment, but only on the likelihood (in the sense of the likelihood function) of the parameter value being {{sfrac|1|2}}.

;Summary of the illustrated issues:

Results of this kind are considered by some as arguments against the likelihood principle; for others they exemplify the value of the likelihood principle and provide an argument against significance tests. Similar themes appear when comparing [[Fisher's exact test]] with [[Pearson's chi-squared test]].
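The closing comparison can be illustrated in the same spirit: Fisher's exact test conditions on the observed margins of a 2×2 table, while Pearson's chi-squared test relies on a large-sample approximation, so the two generally yield different {{mvar|p}}-values for the same data. A self-contained Python sketch (the table entries are illustrative, not from the article):

```python
from math import comb, erfc, sqrt

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    the hypergeometric upper tail, i.e. the probability, with all margins
    fixed, of a count in the (1,1) cell at least as large as a."""
    row1, col1, n = a + b, a + c, a + b + c + d
    hi = min(row1, col1)
    return sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, hi + 1)) / comb(n, row1)

def pearson_chi2_p(a, b, c, d):
    """Pearson chi-squared p-value (1 df, no continuity correction).
    For 1 degree of freedom, P(X^2 > x) = erfc(sqrt(x / 2))."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(stat / 2))

# Illustrative (hypothetical) 2x2 table: 8/2 successes/failures in one
# group, 3/7 in the other. The exact and approximate p-values differ.
print(fisher_one_sided(8, 2, 3, 7))
print(pearson_chi2_p(8, 2, 3, 7))
```

As with the optional-stopping example, the disagreement arises from what the test treats as "other results that could have occurred", not from the likelihood function of the observed table.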