===Tail bounds===
Let <math>X \sim \operatorname{Hypergeometric}(N,K,n)</math> and <math>p=K/N</math>. Then for <math>0 < t < K/N</math> we can derive the following bounds:<ref name=":0">{{citation | last = Hoeffding | first = Wassily | journal = [[Journal of the American Statistical Association]] | volume = 58 | number = 301 | pages = 13–30 | title = Probability inequalities for sums of bounded random variables | year = 1963 | doi = 10.2307/2282952 | jstor = 2282952 | url = http://repository.lib.ncsu.edu/bitstream/1840.4/2170/1/ISMS_1962_326.pdf }}.</ref>

: <math>\begin{align}
\Pr[X\le (p - t)n] &\le e^{-n\text{D}(p-t\parallel p)} \le e^{-2t^2n}\\
\Pr[X\ge (p+t)n] &\le e^{-n\text{D}(p+t\parallel p)} \le e^{-2t^2n}\\
\end{align}\!</math>

where

: <math> D(a\parallel b)=a\log\frac{a}{b}+(1-a)\log\frac{1-a}{1-b}</math>

is the [[Kullback–Leibler divergence]], and the second inequality in each line uses the fact that <math>D(a\parallel b) \ge 2(a-b)^2</math>.<ref name="wordpress.com">{{cite web |url=https://ahlenotes.wordpress.com/2015/12/08/hypergeometric_tail/ |title=Another Tail of the Hypergeometric Distribution |date=8 December 2015 |website=wordpress.com |access-date=19 March 2018}}</ref>

'''Note''': To derive these bounds, one starts by observing that <math>X = \sum_{i=1}^n Y_i</math>, where the <math>Y_i</math> are ''dependent'' Bernoulli random variables, each with marginal distribution <math>\operatorname{Bernoulli}(p)</math>. Because most theorems bounding sums of random variables concern ''independent'' sequences, one first constructs a sequence <math>Z_i</math> of ''independent'' random variables with the same marginal distribution and applies the theorems to <math>X' = \sum_{i=1}^{n}Z_i</math>. Hoeffding<ref name=":0" /> then proved that the resulting bounds hold for <math>X</math> as well.

If ''n'' is larger than ''N''/2, it can be useful to apply symmetry to "invert" the bounds, which gives the following:<ref name="wordpress.com" /><ref>{{citation | last = Serfling | first = Robert | journal = [[The Annals of Statistics]] | volume = 2 | issue = 1 | pages = 39–48 | title = Probability inequalities for the sum in sampling without replacement | year = 1974 | doi = 10.1214/aos/1176342611 | doi-access = free }}.</ref>

: <math>\begin{align}
\Pr[X\le (p - t)n] &\le e^{-(N-n)\text{D}(p+\tfrac{tn}{N-n}\parallel p)} \le e^{-2 t^2 n \tfrac{n}{N-n}}\\
\Pr[X\ge (p+t)n] &\le e^{-(N-n)\text{D}(p-\tfrac{tn}{N-n}\parallel p)} \le e^{-2 t^2 n \tfrac{n}{N-n}}\\
\end{align}\!</math>
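The first pair of bounds can be checked numerically. The following is a minimal sketch, not taken from the cited sources: the parameter values <math>N=1000</math>, <math>K=300</math>, <math>n=100</math>, <math>t=0.1</math> and the helper <code>kl()</code> are illustrative assumptions; only SciPy's <code>hypergeom</code> distribution is a real library object.

<syntaxhighlight lang="python">
# Illustrative sketch: compare exact hypergeometric tail probabilities against
# the KL-divergence bound and the weaker exp(-2 t^2 n) bound from the section
# above. Parameter values and the kl() helper are assumptions for this example.
import numpy as np
from scipy.stats import hypergeom

def kl(a, b):
    # Kullback-Leibler divergence D(a || b) between Bernoulli(a) and Bernoulli(b)
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

N, K, n = 1000, 300, 100   # population size, successes in population, draws
p = K / N
t = 0.1                    # must satisfy 0 < t < K/N

# SciPy's convention is hypergeom(M, n, N) = (population, successes, draws),
# so the article's (N, K, n) maps to the positional arguments in that order.
rv = hypergeom(N, K, n)

# Lower tail: Pr[X <= (p - t) n] <= exp(-n D(p - t || p)) <= exp(-2 t^2 n)
lower_exact = rv.cdf(np.floor((p - t) * n))
print(lower_exact, np.exp(-n * kl(p - t, p)), np.exp(-2 * t**2 * n))

# Upper tail: Pr[X >= (p + t) n] <= exp(-n D(p + t || p)) <= exp(-2 t^2 n)
upper_exact = rv.sf(np.ceil((p + t) * n) - 1)   # sf(k) = Pr[X > k]
print(upper_exact, np.exp(-n * kl(p + t, p)), np.exp(-2 * t**2 * n))
</syntaxhighlight>

Each printed line should show the exact tail probability below the KL-divergence bound, which in turn lies below the simpler <math>e^{-2t^2 n}</math> bound, consistent with <math>D(a\parallel b) \ge 2(a-b)^2</math>.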