===Interval estimates===
{{further|Reference range#Log-normal distribution}}
The most efficient way to obtain [[interval estimate]]s when analyzing log-normally distributed data is to apply the well-known methods based on the normal distribution to the logarithmically transformed data and then to back-transform the results if appropriate.

====Prediction intervals====
A basic example is given by [[prediction interval]]s: for the normal distribution, the interval <math>[\mu-\sigma,\mu+\sigma]</math> contains approximately two thirds (68%) of the probability (or of a large sample), and <math>[\mu-2\sigma,\mu+2\sigma]</math> contains approximately 95%. Therefore, for a log-normal distribution,
* <math>[\mu^*/\sigma^*,\mu^*\cdot\sigma^*]=[\mu^* {}^\times\!\!/ \sigma^*]</math> contains 2/3 of the probability, and
* <math>[\mu^*/(\sigma^*)^2,\mu^*\cdot(\sigma^*)^2] = [\mu^* {}^\times\!\!/ (\sigma^*)^2]</math> contains 95% of the probability.
When estimated parameters are used, approximately the same percentages of the data should be contained in these intervals.

====Confidence interval for ''e<sup>μ</sup>''====
Using the same principle, note that a [[confidence interval]] for <math>\mu</math> is <math>[\widehat\mu \pm q \cdot \widehat\mathop{se}]</math>, where <math>\mathop{se} = \widehat\sigma / \sqrt{n}</math> is the standard error and ''q'' is the 97.5% quantile of a [[Student's t-distribution|t distribution]] with ''n'' − 1 degrees of freedom. Back-transformation leads to a confidence interval for <math>\mu^* = e^\mu</math> (the median):
<math display="block">[\widehat\mu^* {}^\times\!\!/ (\operatorname{sem}^*)^q]</math>
with <math>\operatorname{sem}^*=(\widehat\sigma^*)^{1/\sqrt{n}}</math>.
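As an illustration only, the following Python sketch (using NumPy and SciPy; the simulated sample and all variable names are assumptions made for this example, not part of any standard implementation) computes the estimated prediction intervals and the back-transformed confidence interval for the median:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# Illustrative data: 50 positive observations drawn from a log-normal distribution
rng = np.random.default_rng(0)
x = rng.lognormal(mean=1.0, sigma=0.5, size=50)

log_x = np.log(x)
n = len(x)
mu_hat = log_x.mean()                        # estimate of mu
sigma_hat = log_x.std(ddof=1)                # estimate of sigma
mu_star = np.exp(mu_hat)                     # mu*: estimated median (geometric mean)
sigma_star = np.exp(sigma_hat)               # sigma*: estimated multiplicative standard deviation

# Prediction intervals: divide/multiply the median by powers of sigma*
pi_two_thirds = (mu_star / sigma_star, mu_star * sigma_star)    # ~2/3 of the probability
pi_95 = (mu_star / sigma_star**2, mu_star * sigma_star**2)      # ~95% of the probability

# Back-transformed confidence interval for the median e^mu
q = stats.t.ppf(0.975, df=n - 1)             # 97.5% quantile of the t distribution, n-1 d.f.
sem_star = np.exp(sigma_hat / np.sqrt(n))    # sem* = (sigma*)^(1/sqrt(n))
ci_median = (mu_star / sem_star**q, mu_star * sem_star**q)
</syntaxhighlight>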
(1997), "Confidence intervals for the log-normal mean," ''Statistics in Medicine'', 16, 783β790.</ref> and Olsson 2005<ref name = "Olsson2005" />{{rp|Section 3.3}}}} proposes to plug-in the estimators <math display="block">\widehat \mu = \frac {\sum_i \ln x_i}{n}, \qquad S^2 = \frac {\sum_i \left( \ln x_i - \widehat \mu \right)^2} {n-1}</math> and use them to construct [[Confidence_interval#Approximate_confidence_intervals|approximate confidence intervals]] in the following way: <math>\mathrm{CI}(\operatorname{E}(X)) : \exp\left(\hat \mu + \frac{S^2}{2} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{S^2}{n} + \frac{S^4}{2(n-1)}} \right)</math> {{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=[Proof]}} We know that {{nowrap|<math>\operatorname{E}(X) = e^{\mu + \frac{\sigma^2}{2}}</math>.}} Also, <math>\widehat \mu</math> is a normal distribution with parameters: <math>\widehat \mu \sim N\left(\mu, \frac{\sigma^2}{n}\right)</math> <math>S^2</math> has a [[chi-squared distribution]], which is [[Chi-squared_distribution#Related_distributions|approximately]] normally distributed (via [[Central limit theorem|CLT]]), with [[Variance#Distribution of the sample variance|parameters]]: {{nowrap|<math>S^2 \dot \sim N\left(\sigma^2, \frac{2\sigma^4}{n-1}\right)</math>.}} Hence, {{nowrap|<math>\frac{S^2}{2} \dot \sim N\left(\frac{\sigma^2}{2}, \frac{\sigma^4}{2(n-1)}\right)</math>.}} Since the sample mean and variance are independent, and the sum of normally distributed variables is [[Normal distribution#Operations on two independent normal variables|also normal]], we get that: <math>\widehat \mu + \frac{S^2}{2} \dot \sim N\left(\mu + \frac{\sigma^2}{2}, \frac{\sigma^2}{n} + \frac{\sigma^4}{2(n-1)}\right)</math> Based on the above, standard [[Normal distribution#Confidence intervals|confidence intervals]] for <math>\mu + \frac{\sigma^2}{2}</math> can be constructed (using a [[Pivotal quantity]]) as: <math>\hat \mu + \frac{S^2}{2} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{S^2}{n} + \frac{S^4}{2(n-1)} } </math> And since confidence intervals are preserved for monotonic transformations, we get that: <math>\mathrm{CI}\left(\operatorname{E}[X] = e^{\mu + \frac{\sigma^2}{2}}\right): \exp\left(\hat \mu + \frac{S^2}{2} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{S^2}{n} + \frac{S^4}{2(n-1)}} \right)</math> As desired. {{hidden end}} Olsson 2005, proposed a "modified Cox method" by replacing <math>z_{1-\frac{\alpha}{2}}</math> with <math>t_{n-1, 1-\frac{\alpha}{2}}</math>, which seemed to provide better coverage results for small sample sizes.<ref name = "Olsson2005" />{{rp|Section 3.4}} ====Confidence interval for comparing two log normals==== Comparing two log-normal distributions can often be of interest, for example, from a treatment and control group (e.g., in an [[A/B testing|A/B test]]). We have samples from two independent log-normal distributions with parameters <math>(\mu_1, \sigma_1^2)</math> and <math>(\mu_2, \sigma_2^2)</math>, with sample sizes <math>n_1</math> and <math>n_2</math> respectively. Comparing the medians of the two can easily be done by taking the log from each and then constructing straightforward confidence intervals and transforming it back to the exponential scale. 
<math display="block">\mathrm{CI}(e^{\mu_1-\mu_2}): \exp\left(\hat \mu_1 - \hat \mu_2 \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{S_1^2}{n} + \frac{S_2^2}{n} } \right)</math> These CI are what's often used in epidemiology for calculation the CI for [[relative-risk]] and [[odds-ratio]].<ref>[https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717-Module8-CategoricalData/PH717-Module8-CategoricalData5.html?fbclid=IwY2xjawFeH3JleHRuA2FlbQIxMAABHbmxa15uyyzJuzEwh9PIUr_m2Jsc9NGiPuS6IwfA36Ca5r1wV1EoPEz3MQ_aem_03PRd_jlRfbsnr6xCPkZmw Confidence Intervals for Risk Ratios and Odds Ratios]</ref> The way it is done there is that we have two approximately Normal distributions (e.g., p<sub>1</sub> and p<sub>2</sub>, for RR), and we wish to calculate their ratio.{{efn|The issue is that we don't know how to do it directly, so we take their logs, and then use the [[delta method]] to say that their logs is itself (approximately) normal. This trick allows us to pretend that their exp was log normal, and use that approximation to build the CI. Notice that in the RR case, the median and the mean in the base distribution (i.e., before taking the log), is actually identical (since they are originally normal, and not log normal). For example, <math>\hat p_1 \dot \sim N(p_1, p_1(1-p1)/n)</math> and <math>\ln \hat{p}_1 \dot \sim N(\ln p_1, (1-p1)/(p_1 n))</math> Hence, building a CI based on the log and than back-transform will give us <math>CI(p_1): e^{\ln \hat{p}_1 \pm (1 - \hat{p}_1)/(\hat{p}_1 n))}</math>. So while we expect the CI to be for the median, in this case, it's actually also for the mean in the original distribution. i.e., if the original <math>\hat p_1</math> was log-normal, we'd expect that <math>\operatorname{E}[\hat p_1] = e^{\ln p_1 + \tfrac{1}{2} (1 - p1)/(p_1 n)}</math>. But in practice, we KNOW that <math>\operatorname{E}[\hat p_1] = e^{\ln p_1} = p_1</math>. Hence, the approximation we have is in the second step (of the delta method), but the CI are actually for the expectation (not just the median). This is because we are starting from a base distribution that is normal, and then using another approximation after the log again to normal. This means that a big approximation part of the CI is from the delta method. }} However, the ratio of the expectations (means) of the two samples might also be of interest, while requiring more work to develop. 
However, the ratio of the expectations (means) of the two samples may also be of interest; it requires somewhat more work to develop. The ratio of their means is
<math display="block">\frac{\operatorname{E}(X_1)}{\operatorname{E}(X_2)} = \frac{e^{\mu_1 + \sigma_1^2 / 2}}{e^{\mu_2 + \sigma_2^2 /2}} = e^{(\mu_1 - \mu_2) + \frac{1}{2} \left(\sigma_1^2 - \sigma_2^2\right)}</math>
Plugging the estimators into each of these parameters yields a quantity that is approximately log-normally distributed, so the Cox method discussed above can be used in the same way:
<math display="block">\mathrm{CI}\left( \frac{\operatorname{E}(X_1)}{\operatorname{E}(X_2)} = \frac{e^{\mu_1 + \sigma_1^2 / 2}}{e^{\mu_2 + \sigma_2^2 / 2}} \right): \exp\left(\left(\hat \mu_1 - \hat \mu_2 + \tfrac{1}{2}S_1^2 - \tfrac{1}{2}S_2^2\right) \pm z_{1-\frac{\alpha}{2}} \sqrt{ \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \frac{S_1^4}{2(n_1-1)} + \frac{S_2^4}{2(n_2-1)} } \right)</math>

{{hidden begin|style=width:100%|ta1=center|border=1px #aaa solid|title=[Proof]}}
To construct a confidence interval for this ratio, first note that <math>\hat \mu_1 - \hat \mu_2</math> follows a normal distribution, and that both <math>S_1^2</math> and <math>S_2^2</math> are proportional to [[chi-squared distribution|chi-squared]] random variables, which are [[Chi-squared distribution#Related distributions|approximately]] normally distributed (via the [[Central limit theorem|CLT]], with the relevant [[Variance#Distribution of the sample variance|parameters]]). This means that
<math display="block">\left(\hat \mu_1 - \hat \mu_2 + \frac{1}{2}S_1^2 - \frac{1}{2}S_2^2\right) \dot \sim N\left((\mu_1 - \mu_2) + \frac{1}{2}(\sigma_1^2 - \sigma_2^2), \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} + \frac{\sigma_1^4}{2(n_1-1)} + \frac{\sigma_2^4}{2(n_2-1)} \right)</math>
Based on the above, standard [[Normal distribution#Confidence intervals|confidence intervals]] can be constructed (using a [[pivotal quantity]]) as
<math display="block">\left(\hat \mu_1 - \hat \mu_2 + \frac{1}{2}S_1^2 - \frac{1}{2}S_2^2\right) \pm z_{1-\frac{\alpha}{2}} \sqrt{ \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \frac{S_1^4}{2(n_1-1)} + \frac{S_2^4}{2(n_2-1)} } </math>
And since confidence intervals are preserved under monotonic transformations, we get
<math display="block">\mathrm{CI}\left( \frac{\operatorname{E}(X_1)}{\operatorname{E}(X_2)} = \frac{e^{\mu_1 + \frac{\sigma_1^2}{2}}}{e^{\mu_2 + \frac{\sigma_2^2}{2}}} \right): \exp\left(\left(\hat \mu_1 - \hat \mu_2 + \frac{1}{2}S_1^2 - \frac{1}{2}S_2^2\right) \pm z_{1-\frac{\alpha}{2}} \sqrt{ \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \frac{S_1^4}{2(n_1-1)} + \frac{S_2^4}{2(n_2-1)} } \right)</math>
as desired.
{{hidden end}}
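A corresponding Python sketch for the ratio of the means, again using illustrative simulated samples and names that are not part of any standard implementation:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# Illustrative independent log-normal samples
rng = np.random.default_rng(0)
x1 = rng.lognormal(mean=1.1, sigma=0.5, size=40)
x2 = rng.lognormal(mean=1.0, sigma=0.6, size=60)

log1, log2 = np.log(x1), np.log(x2)
n1, n2 = len(x1), len(x2)
s1_sq, s2_sq = log1.var(ddof=1), log2.var(ddof=1)           # S_1^2 and S_2^2

center = (log1.mean() - log2.mean()) + (s1_sq - s2_sq) / 2  # log of the plug-in ratio of means
var_hat = (s1_sq / n1 + s2_sq / n2
           + s1_sq**2 / (2 * (n1 - 1)) + s2_sq**2 / (2 * (n2 - 1)))

z = stats.norm.ppf(0.975)
ci_mean_ratio = np.exp([center - z * np.sqrt(var_hat),
                        center + z * np.sqrt(var_hat)])     # CI for E(X1)/E(X2)
</syntaxhighlight>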
It is worth noting that naively using the [[Maximum likelihood estimation|MLE]] of the ratio of the two expectations as a [[ratio estimator]] leads to a [[Consistency (statistics)|consistent]], yet biased, point estimate (here we use the fact that the ratio estimator is approximately log-normally distributed):{{efn|The formula can be found by treating the estimated means and variances as approximately normal, which implies that the ratio estimator is itself approximately log-normal, so its expectation can be obtained quickly. The bias can be partially reduced by using
<math display="block">\begin{align} \widehat{\left[ \frac{\operatorname{E}(X_1)}{\operatorname{E}(X_2)} \right]} &= \frac{\widehat{\operatorname{E}}(X_1)}{\widehat{\operatorname{E}}(X_2)} \exp\left(-\frac{1}{2} \widehat{\left( \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} + \frac{\sigma_1^4}{2(n_1-1)} + \frac{\sigma_2^4}{2(n_2-1)} \right)} \right) \\ &\approx e^{(\widehat \mu_1 - \widehat \mu_2) + \frac{1}{2}\left(S_1^2 - S_2^2\right)} \exp\left(-\frac{1}{2}\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} + \frac{S_1^4}{2(n_1-1)} + \frac{S_2^4}{2(n_2-1)} \right)\right) \end{align}</math>}}{{citation needed|date=December 2024}}
<math display="block">\begin{align} \operatorname{E}\left[ \frac{\widehat{\operatorname{E}}(X_1)}{\widehat{\operatorname{E}}(X_2)} \right] &= \operatorname{E}\left[\exp\left(\left(\widehat \mu_1 - \widehat \mu_2\right) + \tfrac{1}{2} \left(S_1^2 - S_2^2\right)\right)\right] \\ &\approx \exp\left[(\mu_1 - \mu_2) + \frac{1}{2}\left(\sigma_1^2 - \sigma_2^2\right) + \frac{1}{2}\left( \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} + \frac{\sigma_1^4}{2(n_1-1)} + \frac{\sigma_2^4}{2(n_2-1)} \right) \right] \end{align}</math>
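Continuing the same illustrative two-sample setup, the following sketch computes the plug-in (naive) point estimate of the ratio of means together with an approximate multiplicative bias correction suggested by the expectation above; the correction factor is an assumption based on that approximation, not an established estimator:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative independent log-normal samples (as in the sketches above)
rng = np.random.default_rng(0)
x1 = rng.lognormal(mean=1.1, sigma=0.5, size=40)
x2 = rng.lognormal(mean=1.0, sigma=0.6, size=60)

log1, log2 = np.log(x1), np.log(x2)
n1, n2 = len(x1), len(x2)
s1_sq, s2_sq = log1.var(ddof=1), log2.var(ddof=1)

# Plug-in (naive) estimator of E(X1)/E(X2): consistent but biased upward
naive = np.exp((log1.mean() - log2.mean()) + (s1_sq - s2_sq) / 2)

# Approximate bias factor exp(V/2), with V the variance term appearing in the CI above
v_hat = (s1_sq / n1 + s2_sq / n2
         + s1_sq**2 / (2 * (n1 - 1)) + s2_sq**2 / (2 * (n2 - 1)))
corrected = naive * np.exp(-v_hat / 2)       # partially bias-corrected point estimate
</syntaxhighlight>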