==Occurrence==
The Yule–Simon distribution arose originally as the limiting distribution of a particular model studied by Udny Yule in 1925 to analyze the growth in the number of species per genus in some higher taxa of biotic organisms.<ref name=YulePhilTrans>{{cite journal | last = Yule | first = G. U. | title = A Mathematical Theory of Evolution, based on the Conclusions of Dr. J. C. Willis, F.R.S. | journal = [[Philosophical Transactions of the Royal Society B]] | volume = 213 | issue = 402–410 | pages = 21–87 | year = 1925 | doi = 10.1098/rstb.1925.0002 | doi-access = free }}</ref> The Yule model makes use of two related Yule processes, where a Yule process is defined as a continuous-time [[birth process]] that starts with one or more individuals. Yule proved that, as time goes to infinity, the limit distribution of the number of species in a genus selected uniformly at random has a specific form and exhibits power-law behavior in its tail. Thirty years later, the Nobel laureate Herbert A. Simon proposed a time-discrete preferential attachment model to describe the appearance of new words in a large piece of text. The limit distribution of the number of occurrences of each word, when the number of words diverges, coincides with that of the number of species belonging to the randomly chosen genus in the Yule model, '''for a specific choice of the parameters'''. This fact explains the name ''Yule–Simon distribution'' commonly given to that limit distribution.

In the context of random graphs, the [[Barabási–Albert model]] also exhibits an asymptotic degree distribution that equals the Yule–Simon distribution for a specific choice of the parameters, and it still presents power-law characteristics for more general choices of the parameters. The same holds for other [[preferential attachment]] random graph models.<ref name=Pachn2015RandomGA>{{cite journal | title = Random Graphs Associated to Some Discrete and Continuous Time Preferential Attachment Models | last1 = Pachon | first1 = Angelica | last2 = Polito | first2 = Federico | last3 = Sacerdote | first3 = Laura | journal = [[Journal of Statistical Physics]] | year = 2015 | volume = 162 | issue = 6 | pages = 1608–1638 | doi = 10.1007/s10955-016-1462-7 | arxiv = 1503.06150 | s2cid = 119168040 }}</ref> The preferential attachment process can also be studied as an [[urn problem|urn process]] in which balls are added to a growing number of urns, each ball being allocated to an urn with probability linear in the number of balls the urn already contains.

The distribution also arises as a [[compound distribution]], in which the parameter of a [[geometric distribution]] is treated as a function of a random variable having an [[exponential distribution]].{{citation needed|date=July 2012}} Specifically, assume that <math>W</math> follows an exponential distribution with [[scale parameter|scale]] <math>1/\rho</math> or rate <math>\rho</math>:
:<math>W \sim \operatorname{Exponential}(\rho),</math>
with density
:<math>h(w;\rho) = \rho \exp(-\rho w).</math>
Then a Yule–Simon distributed variable ''K'' has the following geometric distribution conditional on ''W'':
:<math>K \sim \operatorname{Geometric}(\exp(-W)).</math>
The pmf of a geometric distribution is
:<math>g(k; p) = p (1-p)^{k-1}</math>
for <math>k\in\{1,2,\dotsc\}</math>. The Yule–Simon pmf is then the following exponential-geometric compound distribution:
:<math>f(k;\rho) = \int_0^\infty g(k;\exp(-w)) h(w;\rho)\,dw.</math>
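The compounding above translates directly into a two-step sampler. The following sketch is illustrative only (the parameter value, sample size, and the use of NumPy/SciPy are assumptions, not part of the article): it draws <math>W</math> from an exponential distribution with rate <math>\rho</math>, then <math>K</math> from a geometric distribution with success probability <math>\exp(-W)</math>, and compares the empirical frequencies with a numerical evaluation of the compound integral.

<syntaxhighlight lang="python">
# Minimal sketch of the exponential-geometric compounding described above.
# Assumptions: rho = 2.0 and the sample size are arbitrary illustrative choices.
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(0)
rho = 2.0
n_samples = 200_000

# Step 1: W ~ Exponential(rate rho); NumPy parameterizes by scale = 1/rho.
w = rng.exponential(scale=1.0 / rho, size=n_samples)
# Step 2: K ~ Geometric(p = exp(-W)), with support {1, 2, ...}.
k = rng.geometric(p=np.exp(-w))

def yule_simon_pmf(kk, rho):
    """Numerically evaluate f(k; rho) = integral of g(k; e^-w) h(w; rho) dw."""
    integrand = lambda t: np.exp(-t) * (1 - np.exp(-t)) ** (kk - 1) * rho * np.exp(-rho * t)
    return quad(integrand, 0, np.inf)[0]

# Compare empirical frequencies with the compound pmf for the first few values of k.
for kk in range(1, 6):
    print(kk, round(float(np.mean(k == kk)), 4), round(yule_simon_pmf(kk, rho), 4))
</syntaxhighlight>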
The [[maximum likelihood estimator]] for the parameter <math>\rho</math> given the observations <math>k_1,k_2,k_3,\dots,k_N</math> is the solution to the fixed-point equation
:<math> \rho^{(t+1)} = \frac{N+a-1}{b+\sum_{i=1}^N \sum_{j=1}^{k_i} \frac{1}{\rho^{(t)} + j}}, </math>
where <math>b=0,\ a=1</math> are the rate and shape parameters of the [[gamma distribution]] prior on <math>\rho</math>. This algorithm was derived by Garcia<ref name=JMGGarcia /> by directly optimizing the likelihood. Roberts and Roberts<ref name=RobertsandRoberts>{{cite arXiv | last1 = Roberts | first1 = Lucas | last2 = Roberts | first2 = Denisa | title = An Expectation Maximization Framework for Preferential Attachment Models | eprint = 1710.08511 | year = 2017 | class = stat.CO }}</ref> generalize the algorithm to [[Bayesian probability|Bayesian]] settings with the compound geometric formulation described above. They also use the [[Expectation Maximisation]] (EM) framework to show convergence of the fixed-point algorithm, derive the sub-linearity of its convergence rate, and give two alternative derivations of the standard error of the estimator obtained from the fixed-point equation. The variance of the estimator <math>\hat{\rho}</math> is
:<math> \operatorname{Var}(\hat{\rho}) = \frac{1}{\frac{N}{\hat{\rho}^2} - \sum_{i=1}^N \sum_{j=1}^{k_i} \frac{1}{(\hat{\rho} + j)^2}}, </math>
and the [[standard error]] is <math>\sqrt{\operatorname{Var}(\hat{\rho})/N}</math>.
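The fixed-point update and the variance formula above can be implemented in a few lines. The sketch below is an illustration under stated assumptions, not the reference implementation of Garcia or of Roberts and Roberts: the starting value, convergence tolerance, and example data are arbitrary, and the prior defaults <math>a=1,\ b=0</math> match the flat-prior case given above.

<syntaxhighlight lang="python">
# Minimal sketch of the fixed-point MLE iteration for rho, with the variance
# and standard-error formulas from the text. Data and stopping rule are
# illustrative assumptions.
import numpy as np

def yule_simon_mle(k, a=1.0, b=0.0, rho0=1.0, tol=1e-10, max_iter=10_000):
    k = np.asarray(k)
    N = len(k)
    rho = rho0
    for _ in range(max_iter):
        # Denominator: b + sum over observations i of sum_{j=1}^{k_i} 1 / (rho + j).
        denom = b + sum(np.sum(1.0 / (rho + np.arange(1, ki + 1))) for ki in k)
        rho_new = (N + a - 1) / denom
        if abs(rho_new - rho) < tol:
            rho = rho_new
            break
        rho = rho_new
    # Variance of the estimator and the standard error, as given above.
    info = N / rho**2 - sum(np.sum(1.0 / (rho + np.arange(1, ki + 1)) ** 2) for ki in k)
    var = 1.0 / info
    se = np.sqrt(var / N)
    return rho, se

# Usage with a small, purely illustrative data set of counts.
rho_hat, se = yule_simon_mle([1, 1, 2, 1, 3, 5, 1, 2, 1, 8])
print(rho_hat, se)
</syntaxhighlight>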