Log-normal distribution
== Related distributions ==
* If <math>X \sim \mathcal{N}(\mu, \sigma^2)</math> is a [[normal distribution]], then <math>\exp(X) \sim \operatorname{Lognormal}(\mu, \sigma^2).</math>
* If <math>X \sim \operatorname{Lognormal}(\mu, \sigma^2)</math> is distributed log-normally, then <math>\ln X \sim \mathcal{N}(\mu, \sigma^2)</math> is a normal random variable.
* Let <math>X_j \sim \operatorname{Lognormal}(\mu_j, \sigma_j^2)</math> be independent log-normally distributed variables with possibly varying <math>\sigma</math> and <math>\mu</math> parameters, and <math display="inline">Y = \sum_{j = 1}^n X_j</math>. The distribution of <math>Y</math> has no closed-form expression, but can be reasonably approximated by another log-normal distribution <math>Z</math> at the right tail.<ref name="Asmussen2">{{cite journal | first1 = S. | last1 = Asmussen | first2 = L. | last2 = Rojas-Nandayapa | title = Asymptotics of Sums of Lognormal Random Variables with Gaussian Copula | journal = Statistics and Probability Letters | volume = 78 | issue = 16 | pages = 2709–2714 | year = 2008 | doi = 10.1016/j.spl.2008.03.035 | url = https://hal.archives-ouvertes.fr/hal-00595951/file/PEER_stage2_10.1016%252Fj.spl.2008.03.035.pdf }}</ref> Its probability density function in the neighborhood of 0 has been characterized<ref name = Gao/> and it does not resemble any log-normal distribution. A commonly used approximation due to L. F. Fenton (but previously stated by R. I. Wilkinson and mathematically justified by Marlow<ref name="Marlow">{{cite journal | first = N. A.
| last = Marlow | title = A normal limit theorem for power sums of independent normal random variables | journal = Bell System Technical Journal | volume = 46 | issue = 9 | pages = 2081–2089 | date = Nov 1967 | doi = 10.1002/j.1538-7305.1967.tb04244.x}}</ref>) is obtained by matching the mean and variance of another log-normal distribution: <math display="block">\begin{align} \sigma^2_Z &= \ln\!\left[ \frac{\sum_j e^{2\mu_j+\sigma_j^2} \left(e^{\sigma_j^2} - 1\right)}{{\left(\sum_j e^{\mu_j + \sigma_j^2/2}\right)}^2} + 1\right], \\[1ex] \mu_Z &= \ln\!\left[ \sum_j e^{\mu_j+\sigma_j^2/2} \right] - \frac{\sigma^2_Z}{2}. \end{align}</math> In the case that all <math>X_j</math> have the same variance parameter {{nowrap|<math>\sigma_j = \sigma</math>,}} these formulas simplify to <math display="block">\begin{align} \sigma^2_Z &= \ln\!\left[ \left(e^{\sigma^2} - 1\right) \frac{\sum_j e^{2\mu_j}}{{\left(\sum_j e^{\mu_j}\right)}^2} + 1\right], \\[1ex] \mu_Z &= \ln\!\left[ \sum_j e^{\mu_j} \right] + \frac{\sigma^2}{2} - \frac{\sigma^2_Z}{2}. \end{align}</math> For a more accurate approximation, one can use the [[Monte Carlo method]] to estimate the cumulative distribution function, the pdf and the right tail.<ref name="BotLec2017">{{cite conference | title = Accurate computation of the right tail of the sum of dependent log-normal variates | last1 = Botev | first1 = Z. I. | last2 = L'Ecuyer | first2 = P. | date = 2017 | publisher = IEEE | isbn = 978-1-5386-3428-8 | book-title= 2017 Winter Simulation Conference (WSC), 3rd–6th Dec 2017 | pages = 1880–1890 | location = Las Vegas, NV | doi= 10.1109/WSC.2017.8247924 | arxiv = 1705.03196}} </ref><ref name="AGL2016">{{cite arXiv | last1 = Asmussen | first1 = A. | last2 = Goffard | first2 = P.-O. | last3 = Laub | first3 = P. J.
| date = 2016 | title = Orthonormal polynomial expansions and lognormal sum densities | eprint = 1601.01763v1 | class = math.PR }} </ref> The sum of correlated log-normally distributed random variables can also be approximated by a log-normal distribution{{citation needed | date = February 2021}} <math display="block">\begin{align} S_+ &= \operatorname{E}\left[\sum_i X_i \right] = \sum_i \operatorname{E}[X_i] = \sum_i e^{\mu_i + \sigma_i^2/2} \\[2ex] \sigma^2_{Z} &= \frac{1}{S_+^2} \, \sum_{i,j} \operatorname{cor}_{ij} \sigma_i \sigma_j \operatorname{E}[X_i] \operatorname{E}[X_j] \\[1ex] &= \frac{1}{S_+^2} \, \sum_{i,j} \operatorname{cor}_{ij} \sigma_i \sigma_j e^{\mu_i+\sigma_i^2/2} e^{\mu_j+\sigma_j^2/2} \\[2ex] \mu_Z &= \ln S_+ - \sigma_{Z}^2/2 \end{align}</math>
* If <math>X \sim \operatorname{Lognormal}(\mu, \sigma^2)</math> then <math>X+c</math> is said to have a ''Three-parameter log-normal'' distribution with support {{nowrap|<math>x\in (c, +\infty)</math>.}}<ref name="Sangal1970">{{cite journal | first1 = B. | last1 = Sangal | first2 = A. | last2 = Biswas | title = The 3-Parameter Lognormal Distribution Applications in Hydrology | journal = Water Resources Research | volume = 6 | issue = 2 | pages = 505–515 | year = 1970 | doi = 10.1029/WR006i002p00505}}</ref> {{nowrap|<math>\operatorname{E}[X+c] = \operatorname{E}[X] + c</math>,}} {{nowrap|<math>\operatorname{Var}[X+c] = \operatorname{Var}[X]</math>.}}
* The log-normal distribution is a special case of the semi-bounded [[Johnson's SU-distribution]].<ref name="Johnson1949">{{cite journal |author-link = Norman Lloyd Johnson | last = Johnson | first = N. L.
| date = 1949 | title = Systems of Frequency Curves Generated by Methods of Translation | journal=[[Biometrika]] | volume = 36 | issue = 1/2 | pages = 149–176 | jstor = 2332539 | doi = 10.2307/2332539 | pmid = 18132090 }}</ref>
* If <math>X\mid Y \sim \operatorname{Rayleigh}(Y)</math> with <math> Y \sim \operatorname{Lognormal}(\mu, \sigma^2)</math>, then <math> X \sim \operatorname{Suzuki}(\mu, \sigma)</math> ([[Suzuki distribution]]).
* A substitute for the log-normal whose integral can be expressed in terms of more elementary functions<ref>{{Cite journal | last1 = Swamee | first1 = P. K. | title = Near Lognormal Distribution | doi = 10.1061/(ASCE)1084-0699(2002)7:6(441) | journal = Journal of Hydrologic Engineering | volume = 7 | issue = 6 | pages = 441–444 | year = 2002 }}</ref> can be obtained based on the [[logistic distribution]] to get an approximation for the [[Cumulative distribution function|CDF]] <math display="block"> F(x;\mu,\sigma) = \left[\left(\frac{e^\mu}{x}\right)^{\pi/(\sigma \sqrt{3})} + 1\right]^{-1}.</math> This is a [[log-logistic distribution]].

== Statistical inference ==
===Estimation of parameters===
==== Maximum likelihood estimator ====
For determining the [[maximum likelihood]] estimators of the log-normal distribution parameters {{math|''μ''}} and {{math|''σ''}}, we can use the [[normal distribution#Estimation of parameters|same procedure]] as for the [[normal distribution]]. Note that <math display="block">L(\mu, \sigma) = \prod_{i=1}^n \frac 1 {x_i} \varphi_{\mu,\sigma} (\ln x_i),</math> where <math>\varphi</math> is the density function of the normal distribution <math>\mathcal N(\mu,\sigma^2)</math>.
Therefore, the log-likelihood function is <math display="block"> \ell (\mu,\sigma \mid x_1, x_2, \ldots, x_n) = - \sum _i \ln x_i + \ell_N (\mu, \sigma \mid \ln x_1, \ln x_2, \dots, \ln x_n).</math> Since the first term is constant with regard to ''μ'' and ''σ'', both logarithmic likelihood functions, <math>\ell</math> and <math>\ell_N</math>, reach their maximum with the same <math>\mu</math> and <math>\sigma</math>. Hence, the maximum likelihood estimators are identical to those for a normal distribution for the observations <math>\ln x_1, \ln x_2, \dots, \ln x_n</math>, <math display="block">\widehat \mu = \frac {\sum_i \ln x_i}{n}, \qquad \widehat \sigma^2 = \frac {\sum_i {\left( \ln x_i - \widehat \mu \right)}^2} {n}.</math> For finite ''n'', the estimator for <math>\mu</math> is unbiased, but the one for <math>\sigma^2</math> is biased. As for the normal distribution, an unbiased estimator for <math>\sigma^2</math> can be obtained by replacing the denominator ''n'' by ''n'' − 1 in the equation for <math>\widehat\sigma^2</math>. From this, the maximum likelihood estimator of the expectation of ''X'' is:<ref>Shen, Wei-Hsiung. "Estimation of parameters of a lognormal distribution." ''Taiwanese Journal of Mathematics'' 2.2 (1998): 243–250. [https://projecteuclid.org/journalArticle/Download?urlid=10.11650%2Ftwjm%2F1500406934 pdf] </ref> <math> \widehat{\theta}_\text{MLE} = \widehat{\operatorname{E}[X]}_\text{MLE} = e^{\hat \mu + {\hat{\sigma}^2}/{2}} </math>

==== Method of moments ====
When the individual values <math>x_1, x_2, \ldots, x_n</math> are not available, but the sample's mean <math>\bar x</math> and [[standard deviation]] ''s'' are, then the [[Method of moments (statistics)|method of moments]] can be used.
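As a minimal numerical sketch of the moment inversion (the summary statistics below are invented for illustration), one can solve <math>\operatorname{E}[X] = e^{\mu + \sigma^2/2}</math> and <math>\operatorname{Var}[X] = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}</math> for <math>\mu</math> and <math>\sigma^2</math>:

```python
import math

# Hypothetical summary statistics: sample mean and sample standard deviation.
xbar = 3.0
s = 2.0

# Closed-form inversion of the two moment equations.
sigma2 = math.log(1.0 + (s / xbar) ** 2)
mu = math.log(xbar / math.sqrt(1.0 + (s / xbar) ** 2))

# Round-trip check: the implied mean and variance reproduce the inputs.
mean_back = math.exp(mu + sigma2 / 2)
var_back = (math.exp(sigma2) - 1.0) * math.exp(2 * mu + sigma2)
print(mu, sigma2, mean_back, var_back)
```

Because the two moment equations are inverted exactly, the recovered parameters reproduce the input mean and variance up to floating-point rounding.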
The corresponding parameters are determined by the following formulas, obtained from solving the equations for the expectation <math>\operatorname{E}[X]</math> and variance <math>\operatorname{Var}[X]</math> for <math>\mu</math> and <math>\sigma</math>:<ref>Henry (https://math.stackexchange.com/users/6460/henry), Method of moments estimator for lognormal distribution, URL (version: 2022-01-12): https://math.stackexchange.com/q/4355343</ref> <math display="block"> \begin{align} \mu &= \ln \frac{ \bar x} {\sqrt{1 + s^2/\bar x^2} } , \\[1ex] \sigma^2 &= \ln\left(1 + {s^2} / \bar x^2 \right). \end{align}</math>

==== Other estimators ====
Other estimators also exist, such as Finney's [[UMVUE]] estimator,<ref>Finney, D. J. "On the distribution of a variate whose logarithm is normally distributed." ''Supplement to the Journal of the Royal Statistical Society'' 7.2 (1941): 155–161.</ref> the "Approximately Minimum Mean Squared Error Estimator", the "Approximately Unbiased Estimator" and "Minimax Estimator",<ref>Longford, Nicholas T. "Inference with the lognormal distribution." ''Journal of Statistical Planning and Inference'' 139.7 (2009): 2329–2340.</ref> also "A Conditional Mean Squared Error Estimator",<ref>Zellner, Arnold. "Bayesian and non-Bayesian analysis of the log-normal distribution and log-normal regression." ''Journal of the American Statistical Association'' 66.334 (1971): 327–330.</ref> and other variations as well.<ref>Tang, Qi. "Comparison of different methods for estimating log-normal means". MS thesis. East Tennessee State University, 2014. [https://www.proquest.com/docview/1547379248?pq-origsite=gscholar&fromopenview=true link] [https://dc.etsu.edu/cgi/viewcontent.cgi?article=3728&context=etd pdf]</ref><ref>Kwon, Yeil. "An alternative method for estimating lognormal means." ''Communications for Statistical Applications and Methods'' 28.4 (2021): 351–368.
[http://www.csam.or.kr/journal/view.html?doi=10.29220/CSAM.2021.28.4.351 link]</ref>

===Interval estimates===
{{further|Reference range#Log-normal distribution}} The most efficient way to obtain [[interval estimate]]s when analyzing log-normally distributed data consists of applying the well-known methods based on the normal distribution to logarithmically transformed data and then back-transforming the results if appropriate.

====Prediction intervals====
A basic example is given by [[prediction interval]]s: For the normal distribution, the interval <math>[\mu-\sigma,\mu+\sigma]</math> contains approximately two thirds (68%) of the probability (or of a large sample), and <math>[\mu-2\sigma,\mu+2\sigma]</math> contains 95%. Therefore, for a log-normal distribution,
* <math>[\mu^*/\sigma^*,\mu^*\cdot\sigma^*]=[\mu^* {}^\times\!\!/ \sigma^*]</math> contains 2/3 of the probability, and
* <math>[\mu^*/(\sigma^*)^2,\mu^*\cdot(\sigma^*)^2] = [\mu^* {}^\times\!\!/ (\sigma^*)^2]</math> contains 95% of the probability.
Using estimated parameters, approximately the same percentages of the data should be contained in these intervals.

====Confidence interval for ''e<sup>μ</sup>''====
Using this principle, note that a [[confidence interval]] for <math>\mu</math> is <math>[\widehat\mu \pm q \cdot \widehat{\operatorname{se}}]</math>, where <math>\operatorname{se} = \widehat\sigma / \sqrt{n}</math> is the standard error and ''q'' is the 97.5% quantile of a [[Student's t-distribution|t distribution]] with ''n'' − 1 degrees of freedom. Back-transformation leads to a confidence interval for <math>\mu^* = e^\mu</math> (the median): <math display="block">[\widehat\mu^* {}^\times\!\!/ (\operatorname{sem}^*)^q]</math> with <math>\operatorname{sem}^*=(\widehat\sigma^*)^{1/\sqrt{n}}</math>

====Confidence interval for {{math|E(''X'')}}====
The literature discusses several options for calculating the [[confidence interval]] for <math>\operatorname{E}(X)</math> (the mean of the log-normal distribution).
These include [[Bootstrapping (statistics)|bootstrap]] as well as various other methods.<ref name = "Olsson2005">Olsson, Ulf. "Confidence intervals for the mean of a log-normal distribution." ''Journal of Statistics Education'' 13.1 (2005). [https://www.tandfonline.com/doi/pdf/10.1080/10691898.2005.11910638 pdf] [https://jse.amstat.org/v13n1/olsson.html html]</ref><ref>user10525, How do I calculate a confidence interval for the mean of a log-normal data set?, URL (version: 2022-12-18): https://stats.stackexchange.com/q/33395</ref> The Cox Method{{efn|The Cox Method was quoted as "personal communication" in Land, 1971,<ref>Land, C. E. (1971), "Confidence intervals for linear functions of the normal mean and variance," ''Annals of Mathematical Statistics'', 42, 1187–1205.</ref> and was also given in Zhou and Gao (1997)<ref>Zhou, X-H., and Gao, S. (1997), "Confidence intervals for the log-normal mean," ''Statistics in Medicine'', 16, 783–790.</ref> and Olsson (2005).<ref name = "Olsson2005" />{{rp|Section 3.3}}}} proposes plugging in the estimators <math display="block">\widehat \mu = \frac {\sum_i \ln x_i}{n}, \qquad S^2 = \frac {\sum_i \left( \ln x_i - \widehat \mu \right)^2} {n-1}</math> and using them to construct [[Confidence_interval#Approximate_confidence_intervals|approximate confidence intervals]] in the following way: <math>\mathrm{CI}(\operatorname{E}(X)) : \exp\left(\hat \mu + \frac{S^2}{2} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{S^2}{n} + \frac{S^4}{2(n-1)}} \right)</math> {{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=[Proof]}} We know that {{nowrap|<math>\operatorname{E}(X) = e^{\mu + \frac{\sigma^2}{2}}</math>.}} Also, <math>\widehat \mu</math> is normally distributed with parameters: <math>\widehat \mu \sim N\left(\mu, \frac{\sigma^2}{n}\right)</math> <math>S^2</math> has a [[chi-squared distribution]], which is [[Chi-squared_distribution#Related_distributions|approximately]] normally distributed (via [[Central limit
theorem|CLT]]), with [[Variance#Distribution of the sample variance|parameters]]: {{nowrap|<math>S^2 \dot \sim N\left(\sigma^2, \frac{2\sigma^4}{n-1}\right)</math>.}} Hence, {{nowrap|<math>\frac{S^2}{2} \dot \sim N\left(\frac{\sigma^2}{2}, \frac{\sigma^4}{2(n-1)}\right)</math>.}} Since the sample mean and variance are independent, and the sum of normally distributed variables is [[Normal distribution#Operations on two independent normal variables|also normal]], we get that: <math>\widehat \mu + \frac{S^2}{2} \dot \sim N\left(\mu + \frac{\sigma^2}{2}, \frac{\sigma^2}{n} + \frac{\sigma^4}{2(n-1)}\right)</math> Based on the above, standard [[Normal distribution#Confidence intervals|confidence intervals]] for <math>\mu + \frac{\sigma^2}{2}</math> can be constructed (using a [[pivotal quantity]]) as: <math>\hat \mu + \frac{S^2}{2} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{S^2}{n} + \frac{S^4}{2(n-1)} } </math> And since confidence intervals are preserved for monotonic transformations, we get that: <math>\mathrm{CI}\left(\operatorname{E}[X] = e^{\mu + \frac{\sigma^2}{2}}\right): \exp\left(\hat \mu + \frac{S^2}{2} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{S^2}{n} + \frac{S^4}{2(n-1)}} \right)</math> As desired. {{hidden end}} Olsson (2005) proposed a "modified Cox method", obtained by replacing <math>z_{1-\frac{\alpha}{2}}</math> with <math>t_{n-1, 1-\frac{\alpha}{2}}</math>, which seemed to provide better coverage results for small sample sizes.<ref name = "Olsson2005" />{{rp|Section 3.4}}

====Confidence interval for comparing two log-normals====
Comparing two log-normal distributions can often be of interest, for example, for samples from a treatment and a control group (e.g., in an [[A/B testing|A/B test]]). We have samples from two independent log-normal distributions with parameters <math>(\mu_1, \sigma_1^2)</math> and <math>(\mu_2, \sigma_2^2)</math>, with sample sizes <math>n_1</math> and <math>n_2</math> respectively.
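The one-sample Cox interval for <math>\operatorname{E}(X)</math> discussed above can be sketched numerically; this is a minimal illustration on simulated data (the parameter values, sample size, and seed below are arbitrary choices for the example):

```python
import math
import random
from statistics import NormalDist

random.seed(2)

# Simulate a log-normal sample with known (arbitrary) parameters.
mu_true, sigma_true = 1.0, 0.6
data = [random.lognormvariate(mu_true, sigma_true) for _ in range(2000)]

logs = [math.log(x) for x in data]
n = len(logs)
mu_hat = sum(logs) / n
s2 = sum((y - mu_hat) ** 2 for y in logs) / (n - 1)  # S^2: unbiased log-scale variance

# Cox method: normal-theory interval for mu + sigma^2/2, then exponentiate.
z = NormalDist().inv_cdf(0.975)                      # z_{1 - alpha/2} for alpha = 0.05
center = mu_hat + s2 / 2
half = z * math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))
ci_mean = (math.exp(center - half), math.exp(center + half))

true_mean = math.exp(mu_true + sigma_true ** 2 / 2)
print(ci_mean, true_mean)
```

The modified Cox method would simply replace the ''z'' quantile by the corresponding ''t'' quantile with ''n'' − 1 degrees of freedom, which matters mainly for small samples.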
Comparing the medians of the two can easily be done by taking the log of each sample, constructing straightforward confidence intervals, and transforming back to the exponential scale: <math display="block">\mathrm{CI}(e^{\mu_1-\mu_2}): \exp\left(\hat \mu_1 - \hat \mu_2 \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} } \right)</math> Such CIs are often used in epidemiology for calculating the CI for [[relative-risk|relative risk]] and [[odds-ratio|odds ratio]].<ref>[https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717-Module8-CategoricalData/PH717-Module8-CategoricalData5.html?fbclid=IwY2xjawFeH3JleHRuA2FlbQIxMAABHbmxa15uyyzJuzEwh9PIUr_m2Jsc9NGiPuS6IwfA36Ca5r1wV1EoPEz3MQ_aem_03PRd_jlRfbsnr6xCPkZmw Confidence Intervals for Risk Ratios and Odds Ratios]</ref> The way it is done there is that we have two approximately normal distributions (e.g., p<sub>1</sub> and p<sub>2</sub>, for RR), and we wish to calculate their ratio.{{efn|The issue is that we don't know how to do it directly, so we take their logs, and then use the [[delta method]] to say that the log of each is itself (approximately) normal. This trick allows us to pretend that their exp was log-normal, and use that approximation to build the CI. Notice that in the RR case, the median and the mean in the base distribution (i.e., before taking the log) are actually identical (since they are originally normal, and not log-normal). For example, <math>\hat p_1 \dot \sim N(p_1, p_1(1-p_1)/n)</math> and <math>\ln \hat{p}_1 \dot \sim N(\ln p_1, (1-p_1)/(p_1 n))</math> Hence, building a CI based on the log and then back-transforming will give us <math>CI(p_1): e^{\ln \hat{p}_1 \pm z_{1-\frac{\alpha}{2}} \sqrt{(1 - \hat{p}_1)/(\hat{p}_1 n)}}</math>. So while we expect the CI to be for the median, in this case, it's actually also for the mean in the original distribution. I.e., if the original <math>\hat p_1</math> was log-normal, we'd expect that <math>\operatorname{E}[\hat p_1] = e^{\ln p_1 + \tfrac{1}{2} (1 - p_1)/(p_1 n)}</math>.
But in practice, we know that <math>\operatorname{E}[\hat p_1] = e^{\ln p_1} = p_1</math>. Hence, the approximation enters at the second step (the delta method), but the CI is actually for the expectation (not just the median). This is because we start from a base distribution that is normal, and then apply another approximation, after taking the log, again to a normal. This means that a large part of the approximation in the CI comes from the delta method. }} However, the ratio of the expectations (means) of the two samples might also be of interest, though it requires more work to develop. The ratio of their means is: <math display="block">\frac{\operatorname{E}(X_1)}{\operatorname{E}(X_2)} = \frac{e^{\mu_1 + \sigma_1^2 / 2}}{e^{\mu_2 + \sigma_2^2 /2}} = e^{(\mu_1 - \mu_2) + \frac{1}{2} \left(\sigma_1^2 - \sigma_2^2\right)}</math> Plugging the estimators into each of these parameters also yields a log-normal distribution, which means that the Cox Method, discussed above, can similarly be used for this case: <math display="block">\mathrm{CI}\left( \frac{\operatorname{E}(X_1)}{\operatorname{E}(X_2)} = \frac{e^{\mu_1 + \sigma_1^2 / 2}}{e^{\mu_2 + \sigma_2^2 / 2}} \right): \exp\left(\left(\hat \mu_1 - \hat \mu_2 + \tfrac{1}{2}S_1^2 - \tfrac{1}{2}S_2^2\right) \pm z_{1-\frac{\alpha}{2}} \sqrt{ \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \frac{S_1^4}{2(n_1-1)} + \frac{S_2^4}{2(n_2-1)} } \right)</math> {{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=[Proof]}} To construct a confidence interval for this ratio, we first note that <math>\hat \mu_1 - \hat \mu_2</math> follows a normal distribution, and that both <math>S_1^2</math> and <math>S_2^2</math> have [[chi-squared distribution]]s, which are [[Chi-squared distribution#Related distributions|approximately]] normally distributed (via [[Central limit theorem|CLT]], with the relevant [[Variance#Distribution of the sample variance|parameters]]).
This means that <math display="block">\left(\hat \mu_1 - \hat \mu_2 + \frac{1}{2}S_1^2 - \frac{1}{2}S_2^2\right) \dot \sim N\left((\mu_1 - \mu_2) + \frac{1}{2}(\sigma_1^2 - \sigma_2^2), \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} + \frac{\sigma_1^4}{2(n_1-1)} + \frac{\sigma_2^4}{2(n_2-1)} \right)</math> Based on the above, standard [[Normal distribution#Confidence intervals|confidence intervals]] can be constructed (using a [[pivotal quantity]]) as: <math>\left(\hat \mu_1 - \hat \mu_2 + \frac{1}{2}S_1^2 - \frac{1}{2}S_2^2\right) \pm z_{1-\frac{\alpha}{2}} \sqrt{ \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \frac{S_1^4}{2(n_1-1)} + \frac{S_2^4}{2(n_2-1)} } </math> And since confidence intervals are preserved for monotonic transformations, we get that: <math>\mathrm{CI}\left( \frac{\operatorname{E}(X_1)}{\operatorname{E}(X_2)} = \frac{e^{\mu_1 + \frac{\sigma_1^2}{2}}}{e^{\mu_2 + \frac{\sigma_2^2}{2}}} \right):e^{\left(\left(\hat \mu_1 - \hat \mu_2 + \frac{1}{2}S_1^2 - \frac{1}{2}S_2^2\right) \pm z_{1-\frac{\alpha}{2}} \sqrt{ \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \frac{S_1^4}{2(n_1-1)} + \frac{S_2^4}{2(n_2-1)} } \right)}</math> As desired. {{hidden end}} It is worth noting that naively using the [[Maximum likelihood estimation|MLE]] of the ratio of the two expectations as a [[ratio estimator]] leads to a [[Consistency (statistics)|consistent]], yet biased, point estimate (using the fact that the estimator of the ratio is a log-normal distribution):{{efn|The formula can be found by treating the estimated means and variances as approximately normal, which implies that the ratio estimator is itself approximately log-normal, enabling us to quickly get its expectation.
The bias can be partially reduced by dividing out the estimated bias factor: <math display="block">\begin{align} \widehat{\left[ \frac{\operatorname{E}(X_1)}{\operatorname{E}(X_2)} \right]} &= \left[ \frac{\widehat{\operatorname{E}}(X_1)}{\widehat{\operatorname{E}}(X_2)} \right] \exp\left[-\frac{1}{2}\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \frac{S_1^4}{2(n_1-1)} + \frac{S_2^4}{2(n_2-1)} \right)\right] \\ &\approx \left[e^{(\widehat \mu_1 - \widehat \mu_2) + \frac{1}{2}\left(S_1^2 - S_2^2\right)}\right] \exp\left[-\frac{1}{2}\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \frac{S_1^4}{2(n_1-1)} + \frac{S_2^4}{2(n_2-1)}\right)\right] \end{align} </math>}}{{citation needed|date=December 2024}} <math display="block">\begin{align} \operatorname{E}\left[ \frac{\widehat{\operatorname{E}}(X_1)}{\widehat{\operatorname{E}}(X_2)} \right] &= \operatorname{E}\left[\exp\left(\left(\widehat \mu_1 - \widehat \mu_2\right) + \tfrac{1}{2} \left(S_1^2 - S_2^2\right)\right)\right] \\ &\approx \exp\left[{(\mu_1 - \mu_2) + \frac{1}{2}(\sigma_1^2 - \sigma_2^2) + \frac{1}{2}\left( \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} + \frac{\sigma_1^4}{2(n_1-1)} + \frac{\sigma_2^4}{2(n_2-1)} \right) }\right] \end{align} </math>

=== Extremal principle of entropy to fix the free parameter ''σ'' ===
In applications, <math>\sigma</math> is a parameter to be determined.
For growing processes balanced by production and dissipation, the use of an extremal principle of Shannon entropy shows that<ref name="bai">{{cite journal | last1 = Wu | first1 = Ziniu | last2 = Li | first2 = Juan | last3 = Bai | first3 = Chenyuan | title = Scaling Relations of Lognormal Type Growth Process with an Extremal Principle of Entropy | journal = Entropy | volume = 19 | issue = 56 | year = 2017 | pages = 1–14 | doi = 10.3390/e19020056 | bibcode = 2017Entrp..19...56W | doi-access = free}}</ref> <math display="block">\sigma = \frac{1}{\sqrt{6}} </math> This value can then be used to give some scaling relation between the inflexion point and maximum point of the log-normal distribution.<ref name = bai/> This relationship is determined by the base of the natural logarithm, <math>e = 2.718\ldots</math>, and exhibits some geometrical similarity to the minimal surface energy principle. These scaling relations are useful for predicting a number of growth processes (epidemic spreading, droplet splashing, population growth, swirling rate of the bathtub vortex, distribution of language characters, velocity profile of turbulence, etc.).
For example, the log-normal function with such <math>\sigma</math> fits well with the size of secondarily produced droplets during droplet impact<ref name="wu"/> and the spreading of an epidemic disease.<ref name="Wang">{{cite journal | last1 = Wang | first1 = WenBin | last2 = Wu | first2 = ZiNiu | last3 = Wang | first3 = ChunFeng | last4 = Hu | first4 = RuiFeng | title = Modelling the spreading rate of controlled communicable epidemics through an entropy-based thermodynamic model | journal = Science China Physics, Mechanics and Astronomy | volume = 56 | issue = 11 | year = 2013 | pages = 2143–2150 | issn = 1674-7348 | doi = 10.1007/s11433-013-5321-0 | pmid = 32288765 | pmc = 7111546 | arxiv = 1304.5603 | bibcode = 2013SCPMA..56.2143W}}</ref> The value <math display="inline">\sigma = 1 \big/ \sqrt{6}</math> is used to provide a probabilistic solution for the Drake equation.<ref name="Bloetscher">{{cite journal | last1 = Bloetscher | first1 = Frederick | title = Using predictive Bayesian Monte Carlo- Markov Chain methods to provide a probabilistic solution for the Drake equation | journal = Acta Astronautica | volume = 155 | year = 2019 | pages = 118–130 | doi = 10.1016/j.actaastro.2018.11.033 | bibcode = 2019AcAau.155..118B | s2cid = 117598888}}</ref>
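The scaling relation between the maximum and the inflexion points for <math>\sigma = 1/\sqrt{6}</math> can be checked numerically. The following sketch (with <math>\mu = 0</math>, an arbitrary choice for illustration) locates the mode and the inflexion points of the density by grid search:

```python
import math

MU = 0.0                      # arbitrary location parameter for the sketch
SIGMA = 1.0 / math.sqrt(6.0)  # the entropy-extremal value from the text

def pdf(x):
    """Log-normal density with parameters MU, SIGMA."""
    z = (math.log(x) - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (x * SIGMA * math.sqrt(2.0 * math.pi))

# Grid over a range that safely brackets the mode and both inflexion points.
xs = [0.2 + 1.8 * i / 20000 for i in range(20001)]

# Mode: argmax of the density; the closed form is exp(MU - SIGMA^2).
mode_num = max(xs, key=pdf)
mode_exact = math.exp(MU - SIGMA ** 2)

# Inflexion points: sign changes of a central-difference second derivative.
h = 1e-4
def d2(x):
    return (pdf(x + h) - 2.0 * pdf(x) + pdf(x - h)) / h ** 2

inflections = []
prev = d2(xs[0])
for x in xs[1:]:
    cur = d2(x)
    if prev * cur < 0.0:
        inflections.append(x)
    prev = cur

print(mode_num, mode_exact, inflections)
```

For this particular <math>\sigma</math> (where <math>\sigma^2 = 1/6</math>), the grid search reproduces the mode <math>e^{\mu - \sigma^2}</math> and finds the two inflexion points near <math>e^{\mu - 4\sigma^2}</math> and <math>e^{\mu + \sigma^2}</math>, so the ratios between these characteristic points are powers of <math>e</math>, in line with the remark above about the base of the natural logarithm.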