Zipf–Mandelbrot law
Summary of properties, with <math>H_{N,q,s}</math> the normalizing sum defined below:
- Probability mass function: <math>\frac{1}{H_{N,q,s}} \frac{1}{(k + q)^s}</math>
- Cumulative distribution function: <math>\frac{H_{k,q,s}}{H_{N,q,s}}</math>
- Mean: <math>\frac{H_{N,q,s-1}}{H_{N,q,s}} - q</math>
- Mode: <math>1</math>
- Entropy: <math>\frac{s}{H_{N,q,s}} \sum_{k=1}^N \frac{\ln(k + q)}{(k + q)^s} + \ln(H_{N,q,s})</math>
In probability theory and statistics, the Zipf–Mandelbrot law is a discrete probability distribution. Also known as the Pareto–Zipf law, it is a power-law distribution on ranked data, named after the linguist George Kingsley Zipf, who suggested a simpler distribution called Zipf's law, and the mathematician Benoit Mandelbrot, who subsequently generalized it.
The probability mass function is given by
- <math>f(k; N, q, s) = \frac{1}{H_{N,q,s}} \frac{1}{(k + q)^s},</math>
where <math>H_{N,q,s}</math> is given by
- <math>H_{N,q,s} = \sum_{i=1}^N \frac{1}{(i + q)^s},</math>
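which may be thought of as a generalization of a harmonic number. In the formula, <math>k</math> is the rank of the data, <math>N</math> is the number of ranked elements, and <math>q</math> and <math>s</math> are parameters of the distribution. In the limit as <math>N</math> approaches infinity, the normalizing sum <math>H_{N,q,s}</math> converges for <math>s > 1</math> to the Hurwitz zeta function <math>\zeta(s, q + 1)</math>, under the common convention <math>\zeta(s, a) = \sum_{n=0}^\infty (n + a)^{-s}</math>; the shift by one arises because the sum here starts at <math>i = 1</math>. For finite <math>N</math> and <math>q = 0</math> the Zipf–Mandelbrot law becomes Zipf's law. For infinite <math>N</math> and <math>q = 0</math> it becomes a zeta distribution.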
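The definitions above translate directly into code. The following is a minimal sketch in Python using only the standard library; the function names (`generalized_harmonic`, `zm_pmf`, `zm_cdf`, `zm_mean`, `zm_entropy`) and the parameter values in the demo are illustrative assumptions, not part of any established library. The mean and entropy functions use the closed-form expressions stated for this distribution.

```python
import math


def generalized_harmonic(n, q, s):
    """H_{n,q,s} = sum_{i=1}^{n} 1 / (i + q)^s (an empty sum, 0, when n = 0)."""
    return sum((i + q) ** -s for i in range(1, n + 1))


def zm_pmf(k, N, q, s):
    """Probability of rank k; zero outside 1 <= k <= N."""
    if not 1 <= k <= N:
        return 0.0
    return (k + q) ** -s / generalized_harmonic(N, q, s)


def zm_cdf(k, N, q, s):
    """P(rank <= k) = H_{k,q,s} / H_{N,q,s}, with k clamped to [0, N]."""
    k = max(0, min(k, N))
    return generalized_harmonic(k, q, s) / generalized_harmonic(N, q, s)


def zm_mean(N, q, s):
    """Mean rank: H_{N,q,s-1} / H_{N,q,s} - q."""
    return generalized_harmonic(N, q, s - 1) / generalized_harmonic(N, q, s) - q


def zm_entropy(N, q, s):
    """Entropy in nats: (s / H) * sum ln(k+q) / (k+q)^s + ln(H)."""
    H = generalized_harmonic(N, q, s)
    weighted_logs = sum(math.log(k + q) * (k + q) ** -s for k in range(1, N + 1))
    return s / H * weighted_logs + math.log(H)


if __name__ == "__main__":
    N, q, s = 50, 2.7, 1.1  # hypothetical parameter values for illustration
    probs = [zm_pmf(k, N, q, s) for k in range(1, N + 1)]
    print("total probability:", sum(probs))
    print("closed-form mean:", zm_mean(N, q, s))
    print("direct mean:     ", sum(k * p for k, p in zip(range(1, N + 1), probs)))
    print("entropy (nats):  ", zm_entropy(N, q, s))
```

Setting `q = 0` in `zm_pmf` gives the Zipf's law probabilities for a finite `N`. The sketch recomputes the normalizing sum on every call for clarity; for large `N` one would normally compute it once and reuse it.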
Applications
The distribution of words ranked by their frequency in a random text corpus is approximated by a power-law distribution, known as Zipf's law.
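As a rough illustration of how such a rank-frequency relationship can be examined, the sketch below ranks the words of a text by frequency and estimates the slope of log-count against log-rank by ordinary least squares. The function names, the tokenizing regular expression, and the sample sentence are assumptions made for this example. A least-squares slope on log-log data is only a quick diagnostic, not a rigorous estimator of a power-law exponent.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return [(rank, word, count), ...] ordered from most to least frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


def loglog_slope(ranks, counts):
    """Least-squares slope of log(count) regressed on log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical toy input; a real corpus would be far larger.
    for row in rank_frequency("the cat and the dog and the bird"):
        print(row)

    # Idealized counts proportional to 1 / rank give a slope close to -1.
    ranks = list(range(1, 101))
    ideal_counts = [1000.0 / k for k in ranks]
    print("slope:", loglog_slope(ranks, ideal_counts))
```

A slope near -1 on real data would correspond to the exponent "close to one" commonly reported for word frequencies; a toy sentence like the one above is far too small to show this.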
If one plots the frequency rank of words contained in a moderately sized corpus of text data versus the number of occurrences or actual frequencies, one obtains a power-law distribution, with exponent close to one (but see Powers, 1998 and Gelbukh & Sidorov, 2001). Zipf's law implicitly assumes a fixed vocabulary size, because with s = 1 the normalizing sum is the harmonic series, which does not converge as the vocabulary grows without bound; the Zipf–Mandelbrot generalization with s > 1 does converge. Furthermore, there is evidence that the closed class of functional words that define a language obeys a Zipf–Mandelbrot distribution with different parameters from the open classes of contentive words that vary by topic, field and register.<ref>Template:Cite conference</ref>
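The convergence point can be checked numerically. The sketch below, with an assumed helper name `partial_sum`, compares partial sums of the normalizing series for s = 1 and s = 2 at q = 0. For s = 1 the sums keep growing roughly like the natural logarithm of N, while for s = 2 they approach π²/6.

```python
import math


def partial_sum(N, q, s):
    """Partial sum of 1 / (i + q)^s for i = 1..N."""
    return sum((i + q) ** -s for i in range(1, N + 1))


if __name__ == "__main__":
    for N in (10, 1_000, 100_000):
        harmonic = partial_sum(N, 0, 1)   # s = 1: grows without bound
        convergent = partial_sum(N, 0, 2)  # s = 2: approaches pi^2 / 6
        print(N, harmonic, convergent)
    print("pi^2 / 6 =", math.pi ** 2 / 6)
```

With s = 1 a normalizing constant exists only for a finite number of ranks, whereas with s > 1 the distribution can be normalized even over an unbounded set of ranks.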
In ecological field studies, the relative abundance distribution (i.e. the graph of the number of species observed as a function of their abundance) is often found to conform to a Zipf–Mandelbrot law.<ref>Template:Cite journal</ref>
In music, many metrics used to measure "pleasing" music conform to Zipf–Mandelbrot distributions.<ref>Template:Cite journal</ref>
Notes
References
- Template:Cite book Reprinted as
- Template:Cite conference
- Template:Cite book
- Van Droogenbroeck F. J. (2019). "An essential rephrasing of the Zipf–Mandelbrot law to solve authorship attribution applications by Gaussian statistics".