
==Occurrence and applications==

===Application to auditing elections===
[[File:Election Samples.png|thumb|Samples used for election audits and resulting chance of missing a problem]]

[[Election audits]] typically test a sample of machine-counted precincts to see whether recounts by hand or machine match the original counts. Mismatches result in either a report or a larger recount. The sampling rates are usually defined by law rather than by statistical design, which raises a question: for a legally defined sample size {{mvar|n}}, what is the probability of missing a problem, such as a hack or bug, that is present in {{mvar|K}} precincts? This is the probability that {{math|''k'' {{=}} 0}}. Bugs are often obscure, and a hacker can minimize detection by affecting only a few precincts, which can still change the outcome of a close election, so a plausible scenario is for {{mvar|K}} to be on the order of 5% of {{mvar|N}}. Audits typically cover 1% to 10% of precincts (often 3%),<ref name=newyorkaudit>{{cite SSRN |first1=Amanda |last1=Glazer |first2=Jacob |last2=Spertus |date=2020-02-10 |df=dmy-all |title=Start spreading the news: New York's post-election audit has major flaws |ssrn=3536011}}</ref><ref name=vvstates>{{cite web |title=State audit laws |date=2017-02-10 |df=dmy-all |website=Verified Voting |lang=en-US |url=https://www.verifiedvoting.org/state-audit-laws/ |access-date=2018-04-02 |archive-date=2020-01-04 |archive-url=https://web.archive.org/web/20200104201852/https://www.verifiedvoting.org/state-audit-laws/ }}</ref><ref name=ncsl>{{cite web |title=Post-election audits |publisher=National Conference of State Legislatures |website=ncsl.org |lang=en-US |url=http://www.ncsl.org/research/elections-and-campaigns/post-election-audits635926066.aspx#state |access-date=2018-04-02 |df=dmy-all}}</ref> so they have a high chance of missing a problem. For example, if a problem is present in 5 of 100 precincts, a 3% sample has an 86% probability that {{nobr| {{math| ''k'' {{=}} 0 }} }}, so the problem would not be noticed, and only a 14% probability of the problem appearing in the sample (positive {{mvar| k }}):
: <math>
\begin{align}
\operatorname{\boldsymbol\mathcal P}\{\ X = 0\ \} & = \frac{\ \left[\ \binom{K}{0} \binom{ N\ -\ K}{ n\ -\ 0 }\ \right]\ }{\left[\ \binom{N}{n}\ \right]} =  \frac{\ \left[\ \binom{N\ -\ K}{n}\ \right]}{\ \left[\ \binom{N}{n}\ \right]\ } = \frac{\ \left[\ \frac{\ (N\ -\ K)!\ }{n!(N\ -\ K\ -\ n)!}\ \right]\ }{\left[\ \frac{N!}{n!(N\ -\ n)!}\ \right]} = \frac{\ \left[\ \frac{(N\ -\ K)!}{(N\ -\ K\ -\ n)!}\ \right]\ }{\left[\ \frac{N!}{(N\ -\ n)!}\ \right]} \\[8pt]
& =  \frac{\ \left[\ \binom{100-5}{3}\ \right]\ }{\ \left[\ \binom{100}{3}\ \right]\ } = \frac{\ \left[\ \frac{(100-5)!}{(100-5-3)!}\ \right]\ }{\left[\ \frac{100!}{(100-3)!}\ \right]} = \frac{\ \left[\ \frac{95!}{92!}\ \right]\ }{\ \left[\ \frac{100!}{97!}\ \right]\ } = \frac{\ 95\times94\times93\ }{100\times99\times98} \approx 86\%
\end{align}
</math>

The sample would need 45 precincts in order to have probability under 5% that ''k''&nbsp;=&nbsp;0 in the sample, and thus have probability over 95% of finding the problem:
: <math>\operatorname{\boldsymbol\mathcal P}\{\ X = 0\ \} =  \frac{\ \left[\ \binom{100-5}{45}\ \right]\ }{\left[\  \binom{100}{45}\ \right]} = \frac{\ \left[\ \frac{95!}{50!}\ \right]\ }{\left[\ \frac{100!}{55!}\ \right]} = \frac{\ 95\times 94\times \cdots \times 51\ }{\ 100\times 99\times \cdots \times 56\ } = \frac{\ 55\times 54\times 53\times 52\times 51\ }{\ 100\times 99\times 98\times 97\times 96\ } \approx 4.6\% ~.</math>
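
The two audit figures above can be checked numerically. The following is a minimal Python sketch, not part of any cited source; the function names <code>prob_miss</code> and <code>min_sample_size</code> are illustrative choices, and the 5% threshold is the one used in the example.

```python
from math import comb


def prob_miss(N, K, n):
    """P(X = 0): none of the K affected precincts falls in a sample of n out of N."""
    return comb(N - K, n) / comb(N, n)


def min_sample_size(N, K, max_miss=0.05):
    """Smallest n for which the chance of missing the problem is below max_miss.

    Returns None if no sample size achieves this (for example when K == 0).
    """
    for n in range(N + 1):
        if prob_miss(N, K, n) < max_miss:
            return n
    return None


# 5 affected precincts out of 100, audited with a 3-precinct sample
print(f"{prob_miss(100, 5, 3):.3f}")   # 0.856
# smallest sample with under a 5% chance of missing the problem
print(min_sample_size(100, 5))         # 45
```

With a sample of 44 precincts the miss probability is still slightly above 5%, which is why 45 is the smallest sample size meeting the threshold in this scenario.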

=== Application to Texas hold'em poker ===
In [[hold'em]] poker, players make the best hand they can by combining the two cards in their hand with the 5 community cards eventually turned up on the table. The deck has 52 cards, 13 of each suit.
For this example, assume a player has 2 clubs in the hand and there are 3 cards showing on the table, 2 of which are also clubs. The player would like to know the probability that the next 2 cards to be shown include a club, which would complete the [[Flush (poker)|flush]].<br />
(Note that the probability calculated in this example assumes no information is known about the cards in the other players' hands; however, experienced poker players may consider how the other players place their bets (check, call, raise, or fold) in considering the probability for each scenario. Strictly speaking, the approach to calculating success probabilities outlined here is accurate in a scenario where there is just one player at the table; in a multiplayer game this probability might be adjusted somewhat based on the betting play of the opponents.)

There are 4 clubs showing, so <math>13-4=9</math> clubs are still unseen. There are 5 cards showing (2 in the hand and 3 on the table), so <math>52-5=47</math> cards are still unseen.

The probability that exactly one of the next two cards turned is a club can be calculated using the hypergeometric distribution with <math>k=1, n=2, K=9</math> and <math>N=47</math> (about 31.64%).

The probability that both of the next two cards turned are clubs can be calculated using the hypergeometric distribution with <math>k=2, n=2, K=9</math> and <math>N=47</math> (about 3.33%).

The probability that neither of the next two cards turned is a club can be calculated using the hypergeometric distribution with <math>k=0, n=2, K=9</math> and <math>N=47</math> (about 65.03%).

The flush is completed if at least one of the two cards is a club, which has probability <math>1 - 0.6503 \approx 0.3497</math>, or about 34.97%.
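
The three cases can be reproduced with a short Python sketch. It is illustrative only; the helper name <code>hypergeom_pmf</code> and its argument order (matching <math>f(k;N,K,n)</math>) are assumptions made for this example.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k successes in n draws without replacement
    from a population of N items that contains K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# 47 unseen cards, 9 of them clubs, 2 cards still to be turned
N, K, n = 47, 9, 2
probs = {k: hypergeom_pmf(k, N, K, n) for k in range(n + 1)}

for k, p in probs.items():
    print(f"{k} clubs: {p:.4f}")

# the flush is completed by one or two clubs
print(f"at least one club: {1 - probs[0]:.4f}")
```

The three probabilities sum to 1, and the last line gives the chance of completing the flush (about 35%) as the complement of drawing no clubs.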

=== Application to Keno ===
The hypergeometric distribution is indispensable for calculating [[Keno]] odds. In Keno, 20 balls are randomly drawn from a collection of 80 numbered balls in a container, rather like [[Bingo (American version)|American Bingo]]. Prior to each draw, a player selects a certain number of ''spots'' by marking a paper form supplied for this purpose. For example, a player might ''play a 6-spot'' by marking 6 numbers, each from a range of 1 through 80 inclusive. Then (after all players have taken their forms to a cashier and been given a duplicate of their marked form, and paid their wager) 20 balls are drawn. Some of the balls drawn may match some or all of the balls selected by the player. Generally speaking, the more ''hits'' (balls drawn that match player numbers selected) the greater the payoff.

For example, if a customer bets ("plays") $1 for a 6-spot (not an uncommon example) and hits 4 out of the 6, the casino would pay out $4. Payouts can vary from one casino to the next, but $4 is a typical value here. The probability of this event is:
:<math> P(X=4) = f(4;80,6,20) = {{{6 \choose 4} {{80-6} \choose {20-4}}}\over {80 \choose 20}} \approx 0.02853791</math>

Similarly, the chance of hitting 5 spots out of 6 selected is
:<math> P(X=5) = {{{6 \choose 5} {{74} \choose {15}}} \over {80 \choose 20}} \approx 0.003095639</math>
while a typical payout might be $88. The payout for hitting all 6 would be around $1500 (probability ≈ 0.000128985 or 7752-to-1). The only other nonzero payout might be $1 for hitting 3 numbers (i.e., the bet is returned), which has a probability near 0.129819548.

Summing the products of each payout and its probability gives an expected return of 0.70986492, or roughly 71%, for a 6-spot, which corresponds to a house advantage of 29%. Other spots-played have a similar expected return. This very poor return (for the player) is usually explained by the large overhead (floor space, equipment, personnel) required for the game.
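
The expected return can be checked with a short Python sketch. The payout table below uses the example values quoted in this section ($1, $4, $88 and $1500 for 3, 4, 5 and 6 hits); actual payouts vary between casinos, and the names <code>hypergeom_pmf</code> and <code>payouts</code> are illustrative.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k hits when n balls are drawn from N
    and the player has marked K of them."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# 80 balls, 6 spots marked, 20 balls drawn
N, K, n = 80, 6, 20

# Example payouts per $1 bet, as quoted in the text (hits -> payout);
# these are assumed values, not a fixed rule of the game.
payouts = {3: 1, 4: 4, 5: 88, 6: 1500}

expected_return = sum(pay * hypergeom_pmf(k, N, K, n) for k, pay in payouts.items())

print(f"P(4 hits) = {hypergeom_pmf(4, N, K, n):.8f}")
print(f"expected return per $1 bet = {expected_return:.4f}")
print(f"house advantage = {1 - expected_return:.1%}")
```

Changing the entries of <code>payouts</code> shows how sensitive the house advantage is to the top prize, since the 6-hit term contributes roughly 0.19 of the 0.71 expected return.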