
===Establishment methods===
Methods for establishing reference ranges can be based on assuming a [[normal distribution]] or a [[log-normal distribution]], or directly from percentages of interest, as detailed respectively in the following sections. When establishing reference ranges from bilateral organs (e.g., vision or hearing), both results from the same individual can be used, although intra-subject correlation must be taken into account.<ref>{{cite journal |last1=Davis |first1=C.Q. |last2=Hamilton |first2=R. |title=Reference ranges for clinical electrophysiology of vision |journal=Doc Ophthalmol |date=2021 |volume=143 |issue=2 |pages=155–170 |doi=10.1007/s10633-021-09831-1|pmc=8494724 |pmid=33880667 |doi-access=free }}</ref>

====Normal distribution====
{{further|68–95–99.7 rule}}
[[File:Standard deviation diagram.svg|thumb|350px|When assuming a normal distribution, the reference range is obtained by measuring the values in a reference group and taking two standard deviations either side of the mean. This encompasses ~95% of the total population.]]

The 95% interval is often estimated by assuming a [[normal distribution]] of the measured parameter, in which case it can be defined as the interval limited by 1.96<ref name=MedicalStatistics>Page 48 in: {{cite book |author1=Sterne, Jonathan |author2=Kirkwood, Betty R. |title=Essential medical statistics |publisher=Blackwell Science |location=Oxford |year=2003 |isbn=978-0-86542-871-3 |url-access=registration |url=https://archive.org/details/essentialmedical00kirk }}</ref> (often rounded up to 2) population [[standard deviation]]s on either side of the population mean (also called the [[expected value]]).
However, in the real world, neither the population mean nor the population standard deviation is known. Both need to be estimated from a sample, whose size can be designated ''n''. The population standard deviation is estimated by the sample standard deviation and the population mean is estimated by the sample mean (also called mean or [[arithmetic mean]]). To account for these estimations, the 95% [[prediction interval]] (95% PI) is calculated as:

: {{math|1= 95% PI = mean ± ''t''{{sub|0.975,''n''&minus;1}}·{{sqrt|(''n''+1)/''n''}}·sd}},

where <math>t_{0.975,n-1}</math> is the 97.5% quantile of a [[Student's t-distribution]] with ''n''&minus;1 [[Degrees of freedom (statistics)|degrees of freedom]].

When the sample size is large (''n''≥30) <math>t_{0.975,n-1}\simeq 2.</math>

This method is often acceptably accurate if the standard deviation, as compared to the mean, is not very large. A more accurate method is to perform the calculations on logarithmized values, as described in a separate section later.

The following example of this (''not'' logarithmized) method is based on values of [[fasting plasma glucose]] taken from a reference group of 12 subjects:<ref name=Keevil1998>[http://www.clinchem.org/cgi/content-nw/full/44/7/1535/T1 Table 1. Subject characteristics] in: {{Cite journal | last1 = Keevil | first1 = B. G. | last2 = Kilpatrick | first2 = E. S. | last3 = Nichols | first3 = S. P. | last4 = Maylor | first4 = P. W. | title = Biological variation of cystatin C: Implications for the assessment of glomerular filtration rate | journal = Clinical Chemistry | volume = 44 | issue = 7 | pages = 1535–1539 | year = 1998 | doi = 10.1093/clinchem/44.7.1535 | pmid = 9665434| doi-access = free }}</ref>

{|class="wikitable"
|-
! !! [[Fasting plasma glucose]]<br> (FPG) <br>in mmol/L !! Deviation from<br> mean ''m'' !! Squared deviation<br>from mean ''m''
|-
| Subject 1 || 5.5 || 0.17 || 0.029
|-
| Subject 2 || 5.2 || -0.13 || 0.017
|-
| Subject 3 || 5.2 || -0.13 || 0.017
|-
| Subject 4 || 5.8 || 0.47 || 0.221
|-
| Subject 5 || 5.6 || 0.27 || 0.073
|-
| Subject 6 || 4.6 || -0.73 || 0.533
|-
| Subject 7 || 5.6 || 0.27 || 0.073
|-
| Subject 8 || 5.9 || 0.57 || 0.325
|-
| Subject 9 || 4.7 || -0.63 || 0.397
|-
| Subject 10 || 5 || -0.33 || 0.109
|-
| Subject 11 || 5.7 || 0.37 || 0.137
|-
| Subject 12 || 5.2 || -0.13 || 0.017
|-
| || '''Mean = 5.33''' (''m'') <br> ''n''=12 || Mean = 0.00 || Sum/(''n''&minus;1) = 1.95/11 =0.18 <br> <math>
    \sqrt{0.18 } = 0.42 </math><br>= '''standard deviation (s.d.)'''
|}

As can be obtained from, for example, a [[Student's t-distribution#Table of selected values|table of selected values of Student's t-distribution]], the 97.5th percentile with (12&minus;1) degrees of freedom corresponds to 
<math>t_{0.975,11} = 2.20</math>
 
Subsequently, the lower and upper limits of the standard reference range are calculated as:
:<math> Lower~limit = m - t_{0.975,11} \times\sqrt{\frac{n+1}{n}}\times s.d. = 5.33 - 2.20\times\sqrt{\frac{13}{12}} \times 0.42 = 4.4</math>

:<math> Upper~limit = m + t_{0.975,11} \times\sqrt{\frac{n+1}{n}}\times s.d. = 5.33 + 2.20\times\sqrt{\frac{13}{12}} \times 0.42 = 6.3.</math>

Thus, the standard reference range for this example is estimated to be 4.4 to 6.3&nbsp;mmol/L.
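
The calculation above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name <code>prediction_interval</code> is hypothetical, and the ''t'' quantile of 2.20 is passed in from the table lookup above instead of being computed.

```python
import math
import statistics

# Fasting plasma glucose values (mmol/L) of the 12 subjects in the table above.
FPG = [5.5, 5.2, 5.2, 5.8, 5.6, 4.6, 5.6, 5.9, 4.7, 5.0, 5.7, 5.2]


def prediction_interval(values, t_quantile):
    """Return (lower, upper) of mean +/- t * sqrt((n+1)/n) * sd.

    t_quantile is the 97.5% quantile of Student's t-distribution with
    n-1 degrees of freedom; it must be looked up by the caller.
    """
    n = len(values)
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n-1)
    half_width = t_quantile * math.sqrt((n + 1) / n) * sd
    return m - half_width, m + half_width


lower, upper = prediction_interval(FPG, t_quantile=2.20)
print(f"{lower:.1f} to {upper:.1f} mmol/L")
```

With the sample above, the rounded result agrees with the hand calculation (4.4 to 6.3&nbsp;mmol/L).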

=====Confidence interval of limit=====
The 90% ''confidence interval of a standard reference range limit'' as estimated assuming a normal distribution can be calculated by:<ref>[https://books.google.com/books?id=p7XwAwAAQBAJ&pg=PA65 Page 65] in: {{cite book|title=Tietz Fundamentals of Clinical Chemistry and Molecular Diagnostics|author=Carl A. Burtis, David E. Bruns|edition=7|publisher=Elsevier Health Sciences|year=2014|isbn=9780323292061}}</ref>

: Lower limit of the confidence interval = percentile limit - 2.81 × {{frac|''SD''|{{sqrt|''n''}}}}

: Upper limit of the confidence interval = percentile limit + 2.81 × {{frac|''SD''|{{sqrt|''n''}}}},

where SD is the standard deviation, and n is the number of samples.

Taking the example from the previous section, the number of samples is 12 and the standard deviation is 0.42&nbsp;mmol/L, resulting in:

:''Lower limit of the confidence interval'' of the ''lower limit of the standard reference range'' = 4.4 - 2.81 × {{frac|0.42|{{sqrt|12}}}} ≈ 4.1

:''Upper limit of the confidence interval'' of the ''lower limit of the standard reference range'' = 4.4 + 2.81  × {{frac|0.42|{{sqrt|12}}}} ≈ 4.7

Thus, the lower limit of the reference range can be written as 4.4 (90% CI 4.1–4.7) mmol/L.

Likewise, with similar calculations, the upper limit of the reference range can be written as 6.3 (90% CI 6.0–6.6) mmol/L.
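
As a rough sketch, the same confidence interval can be computed in Python. The function name <code>limit_confidence_interval</code> and its parameter names are assumptions made for this illustration; the factor 2.81 is the one given in the formula above.

```python
import math


def limit_confidence_interval(limit, sd, n, factor=2.81):
    """Return the (lower, upper) 90% confidence interval of a reference limit.

    Implements: limit +/- factor * sd / sqrt(n), with factor 2.81 as above.
    """
    margin = factor * sd / math.sqrt(n)
    return limit - margin, limit + margin


# Limits, standard deviation and sample size from the worked example.
lower_ci = limit_confidence_interval(4.4, sd=0.42, n=12)
upper_ci = limit_confidence_interval(6.3, sd=0.42, n=12)
print(f"lower limit: 4.4 (90% CI {lower_ci[0]:.1f}-{lower_ci[1]:.1f}) mmol/L")
print(f"upper limit: 6.3 (90% CI {upper_ci[0]:.1f}-{upper_ci[1]:.1f}) mmol/L")
```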

These confidence intervals reflect [[random error]], but do not compensate for [[systematic error]], which in this case can arise from, for example, the reference group not having fasted long enough before blood sampling.

As a comparison, actual reference ranges used clinically for fasting plasma glucose are estimated to have a lower limit of approximately 3.8<ref name=firstaid>Last page of {{cite book |author1=Deepak A. Rao |author2=Le, Tao |author3=Bhushan, Vikas |title=First Aid for the USMLE Step 1 2008 (First Aid for the Usmle Step 1) |publisher=McGraw-Hill Medical |year=2007 |isbn=978-0-07-149868-5 |url-access=registration |url=https://archive.org/details/firstaidforusmle00taol }}</ref> to 4.0,<ref name=uppsala>Reference range list from Uppsala University Hospital ("Laborationslista"). Artnr 40284 Sj74a. Issued on April 22, 2008</ref> and an upper limit of approximately 6.0<ref name=uppsala/> to 6.1.<ref name=Medline-GTT>{{MedlinePlusEncyclopedia|003466|Glucose tolerance test}}</ref>

====Log-normal distribution====
[[Image:PDF-log normal distributions.svg|thumb|Some functions of [[log-normal distribution]] (here shown with the measurements non-logarithmized), with the same means - ''μ'' (as calculated after logarithmizing) but different standard deviations - ''σ'' (after logarithmizing)]]
In reality, biological parameters tend to have a [[log-normal distribution]],<ref>{{cite book  | last = Huxley | first = Julian S.  | year = 1932  | title = Problems of relative growth  | publisher = London  | oclc = 476909537  | isbn = 978-0-486-61114-3  }}</ref> rather than a normal (Gaussian) distribution.

One explanation for biological parameters following a log-normal distribution is that a sample having half the value of the mean or median tends to be almost as probable as a sample having twice that value. Also, only a log-normal distribution can account for the fact that almost all biological parameters cannot be [[negative number]]s (at least when measured on [[absolute scale]]s): there is no definite limit to the size of outliers (extreme values) on the high side, but values can never be less than zero, which results in a positive [[skewness]].

As shown in the diagram at right, this phenomenon has a relatively small effect if the standard deviation (as compared to the mean) is relatively small, as it makes the log-normal distribution appear similar to a normal distribution. Thus, for convenience, the normal distribution may be more appropriate to use with small standard deviations, and the log-normal distribution with large standard deviations.

In a log-normal distribution, the [[geometric standard deviation]]s and [[geometric mean]] more accurately estimate the 95% prediction interval than their arithmetic counterparts.

=====Necessity=====
Reference ranges for substances that are usually within relatively narrow limits (coefficient of variation less than 0.213, as detailed below) such as [[electrolytes]] can be estimated by assuming normal distribution, whereas reference ranges for those that vary significantly (coefficient of variation generally over 0.213) such as most [[hormones]]<ref name="pmid19758299">{{cite journal| author=Levitt H, Smith KG, Rosner MH| title=Variability in calcium, phosphorus, and parathyroid hormone in patients on hemodialysis. | journal=Hemodial Int | year= 2009 | volume= 13 | issue= 4 | pages= 518–25 | pmid=19758299 | doi=10.1111/j.1542-4758.2009.00393.x | pmc= | s2cid=24963421 | url=https://pubmed.ncbi.nlm.nih.gov/19758299  }}</ref> are more accurately established by log-normal distribution.

The necessity to establish a reference range by log-normal distribution rather than normal distribution can be regarded as depending on how much difference it would make to ''not'' do so, which can be described as the ratio:

:{{math|1=Difference ratio = {{sfrac| {{mabs| Limit{{sub|log-normal}} - Limit{{sub|normal}} }} | Limit{{sub|log-normal}} }} }}

where:
* ''Limit<sub>log-normal</sub>'' is the (lower or upper) limit as estimated by assuming log-normal distribution
* ''Limit<sub>normal</sub>'' is the (lower or upper) limit as estimated by assuming normal distribution.

[[File:Diagram of coefficient of variation versus deviation in reference ranges erroneously not established by log-normal distribution.png|thumb|350px|Coefficient of variation versus deviation in reference ranges established by assuming normal distribution when there is actually a log-normal distribution.]]

This difference can be put solely in relation to the [[coefficient of variation]], as in the diagram at right, where:

:{{math|1=Coefficient of variation = {{sfrac|s.d.|m}}}}

where:
* ''s.d.'' is the standard deviation
* ''m'' is the arithmetic mean

In practice, it can be regarded as necessary to use the establishment methods of a log-normal distribution if the difference ratio exceeds 0.1, meaning that a (lower or upper) limit estimated from an assumed normal distribution would differ by more than 10% from the corresponding limit as estimated from a (more accurate) log-normal distribution. As seen in the diagram, a difference ratio of 0.1 is reached for the lower limit at a coefficient of variation of 0.213 (or 21.3%), and for the upper limit at a coefficient of variation of 0.413 (41.3%). The lower limit is more affected by an increasing coefficient of variation, and its "critical" coefficient of variation of 0.213 corresponds to a ratio of (upper limit)/(lower limit) of 2.43. As a rule of thumb, if the upper limit is more than 2.4 times the lower limit when estimated by assuming normal distribution, repeating the calculations by log-normal distribution should be considered.

Taking the example from the previous section, the standard deviation (s.d.) is estimated at 0.42 and the arithmetic mean (m) is estimated at 5.33. Thus the coefficient of variation is 0.079. This is less than both 0.213 and 0.413, and thus both the lower and upper limit of fasting blood glucose can most likely be estimated by assuming normal distribution. More specifically, the coefficient of variation of 0.079 corresponds to a difference ratio of 0.01 (1%) for the lower limit and 0.007 (0.7%) for the upper limit.
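
The check described in this section can be sketched in Python. The function and constant names are hypothetical; the thresholds 0.213 and 0.413 and the rule-of-thumb ratio of 2.4 are the values quoted above, and the inputs are those of the worked example.

```python
# Critical coefficients of variation quoted in the text (difference ratio 0.1).
CV_CRITICAL_LOWER = 0.213  # for the lower limit
CV_CRITICAL_UPPER = 0.413  # for the upper limit


def coefficient_of_variation(sd, mean):
    """Standard deviation divided by the arithmetic mean."""
    return sd / mean


def needs_log_normal(sd, mean):
    """Flag, per limit, whether the log-normal method is advisable."""
    cv = coefficient_of_variation(sd, mean)
    return {"lower": cv > CV_CRITICAL_LOWER, "upper": cv > CV_CRITICAL_UPPER}


cv = coefficient_of_variation(0.42, 5.33)
flags = needs_log_normal(0.42, 5.33)

# Rule of thumb: compare the limits obtained under the normal assumption.
limit_ratio = 6.3 / 4.4

print(f"CV = {cv:.3f}")
print(f"log-normal advisable: {flags}")
print(f"upper/lower = {limit_ratio:.2f} (rule-of-thumb threshold 2.4)")
```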

=====From logarithmized sample values=====
A method to estimate the reference range for a parameter with log-normal distribution is to logarithmize all the measurements with an arbitrary [[base of a logarithm|base]] (for example [[e (mathematical constant)|''e'']]) and derive the mean and standard deviation of these logarithms. For a 95% prediction interval, the logarithms located 1.96 standard deviations below and above that mean are then determined. Finally, these two logarithms are used as exponents in [[exponentiation]] with the same base as was used in logarithmizing, and the two resultant values are the lower and upper limits of the 95% prediction interval.

The following example of this method is based on the same values of [[fasting plasma glucose]] as used in the previous section, using [[e (mathematical constant)|''e'']] as a [[base of a logarithm|base]]:<ref name=Keevil1998/>

{|class="wikitable"
|-
! !! [[Fasting plasma glucose]]<br> (FPG) <br>in mmol/L !! log<sub>[[e (mathematical constant)|''e'']]</sub>(FPG) !! log<sub>e</sub>(FPG) deviation from<br> mean ''μ''<sub>log</sub> !! Squared deviation<br>from mean
|-
| Subject 1 || 5.5 || 1.70 || 0.029 || 0.000841
|-
| Subject 2 || 5.2 || 1.65 || -0.021 || 0.000441
|-
| Subject 3 || 5.2 || 1.65 || -0.021 || 0.000441
|-
| Subject 4 || 5.8 || 1.76 || 0.089 || 0.007921
|-
| Subject 5 || 5.6 || 1.72 || 0.049 || 0.002401
|-
| Subject 6 || 4.6 || 1.53 || -0.141 || 0.019881
|-
| Subject 7 || 5.6 || 1.72 || 0.049 || 0.002401
|-
| Subject 8 || 5.9 || 1.77 || 0.099 || 0.009801
|-
| Subject 9 || 4.7 || 1.55 || -0.121 || 0.014641
|-
| Subject 10 || 5.0 || 1.61 || -0.061 || 0.003721
|-
| Subject 11 || 5.7 || 1.74 || 0.069 || 0.004761
|-
| Subject 12 || 5.2 || 1.65 || -0.021 || 0.000441
|-
| || '''Mean: 5.33''' <br> (''m'') || '''Mean: 1.67'''<br> (''μ''<sub>log</sub>) ||  || Sum/(n-1) : 0.068/11 = 0.0062 <br> <math>
    \sqrt{0.0062} = 0.079</math><br>= '''standard deviation of log<sub>e</sub>(FPG)'''<br> (''σ''<sub>log</sub>)
|}

Subsequently, the still logarithmized lower limit of the reference range is calculated as:

: <math>\begin{align} \ln (\text{lower limit}) &= \mu_{\log} - t_{0.975,n-1} \times\sqrt{\frac{n+1}{n}} \times \sigma_{\log}\\ 
&= 1.67 - 2.20\times\sqrt{\frac{13}{12}} \times 0.079 = 1.49, \end{align}</math>

and the upper limit of the reference range as:

: <math>\begin{align} \ln (\text{upper limit}) &= \mu_{\log} + t_{0.975,n-1} \times\sqrt{\frac{n+1}{n}} \times \sigma_{\log}\\ 
&= 1.67 + 2.20\times\sqrt{\frac{13}{12}} \times 0.079 = 1.85 \end{align}</math>

Conversion back to non-logarithmized values is subsequently performed as:

: <math> \text{Lower limit} = e^{\ln (\text{lower limit})} = e^{1.49} = 4.4</math>

: <math> \text{Upper limit} = e^{\ln (\text{upper limit})} = e^{1.85} = 6.4</math>

Thus, the standard reference range for this example is estimated to be 4.4 to 6.4&nbsp;mmol/L.
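
The steps of this example can be sketched in Python as follows. The function name <code>log_normal_prediction_interval</code> is hypothetical, and the ''t'' quantile of 2.20 is passed in from the lookup used earlier. Because the code keeps full precision instead of rounding the logarithms to two decimals as the table does, intermediate values may differ slightly in the third decimal, while the rounded limits agree.

```python
import math
import statistics

# Fasting plasma glucose values (mmol/L) of the 12 subjects in the table above.
FPG = [5.5, 5.2, 5.2, 5.8, 5.6, 4.6, 5.6, 5.9, 4.7, 5.0, 5.7, 5.2]


def log_normal_prediction_interval(values, t_quantile):
    """Return (lower, upper) computed on natural logarithms, then exponentiated."""
    logs = [math.log(v) for v in values]       # logarithmize with base e
    n = len(logs)
    mu_log = statistics.mean(logs)             # mean of the logarithms
    sigma_log = statistics.stdev(logs)         # sample s.d. of the logarithms
    half_width = t_quantile * math.sqrt((n + 1) / n) * sigma_log
    # Convert the logarithmized limits back with the same base.
    return math.exp(mu_log - half_width), math.exp(mu_log + half_width)


lower, upper = log_normal_prediction_interval(FPG, t_quantile=2.20)
print(f"{lower:.1f} to {upper:.1f} mmol/L")
```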

=====From arithmetic mean and variance=====
An alternative method of establishing a reference range with the assumption of log-normal distribution is to use the arithmetic mean and standard deviation. This is somewhat more tedious to perform, but may be useful in cases where a study presents only the arithmetic mean and standard deviation, while leaving out the source data. If the original assumption of normal distribution is less appropriate than the log-normal one, the arithmetic mean and standard deviation may then be the only available parameters from which to determine the reference range.

By assuming that the [[expected value]] can represent the arithmetic mean in this case, the parameters ''μ<sub>log</sub>'' and ''σ<sub>log</sub>'' can be estimated from the arithmetic mean (''m'') and standard deviation (''s.d.'') as:
: <math> \mu_{\log} = \ln(m) - \frac12 \ln\!\left(1 + \!\left(\frac\text{s.d.}{m}\right)^2 \right) </math>

: <math> \sigma_{\log} = \sqrt{\ln\!\left(1 + \!\left(\frac\text{s.d.}{m}\right)^2 \right)} </math>

Following the example reference group from the previous section:

: <math> \mu_{\log} = \ln(5.33) - \frac12 \ln\!\left(1 + \!\left(\frac{0.42}{5.33}\right)^2 \right) = 1.67</math>

: <math> \sigma_{\log} = \sqrt{\ln\!\left(1 + \!\left(\frac{0.42}{5.33}\right)^2 \right)} = 0.079 </math>

Subsequently, the logarithmized, and later non-logarithmized, lower and upper limits are calculated just as in the method from logarithmized sample values.
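
A minimal Python sketch of this conversion is given below. The function name <code>log_parameters_from_arithmetic</code> is an assumption made for illustration; the formulas are the two given above, applied to the example values ''m''&nbsp;=&nbsp;5.33 and ''s.d.''&nbsp;=&nbsp;0.42.

```python
import math


def log_parameters_from_arithmetic(mean, sd):
    """Estimate (mu_log, sigma_log) from the arithmetic mean and s.d."""
    variance_term = math.log(1 + (sd / mean) ** 2)  # ln(1 + (s.d./m)^2)
    mu_log = math.log(mean) - 0.5 * variance_term
    sigma_log = math.sqrt(variance_term)
    return mu_log, sigma_log


mu_log, sigma_log = log_parameters_from_arithmetic(5.33, 0.42)
print(f"mu_log = {mu_log:.2f}, sigma_log = {sigma_log:.3f}")
# prints: mu_log = 1.67, sigma_log = 0.079
```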

====Directly from percentages of interest====
Reference ranges can also be established directly from the 2.5th and 97.5th percentile of the measurements in the reference group. For example, if the reference group consists of 200 people, then counting from the measurement with the lowest value to the highest, the lower limit of the reference range would correspond to the 5th measurement and the upper limit to the 195th measurement.

This method can be used even when measurement values do not appear to conform conveniently to any form of normal distribution or other function.

However, the reference range limits as estimated in this way have higher variance, and therefore less reliability, than those estimated by assuming a normal or log-normal distribution (when such is applicable), because the latter ones acquire [[statistical power]] from the measurements of the whole reference group rather than just the measurements at the 2.5th and 97.5th percentiles. Still, this variance decreases with increasing size of the reference group, and therefore, this method may be optimal where a large reference group can easily be gathered, and the distribution mode of the measurements is uncertain.
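
The rank-based procedure in the 200-person example can be sketched in Python. This is only one of several percentile conventions (others interpolate between neighbouring measurements); the function name <code>percentile_reference_range</code> is hypothetical, and the input below is placeholder data in which each value equals its own rank, not real measurements.

```python
def percentile_reference_range(values, lower_permille=25, upper_permille=975):
    """Return the measurements at the 2.5th and 97.5th percentile ranks.

    Ranks are 1-based positions in the sorted list, rounded up to a whole
    measurement (a simple convention assumed for this sketch).
    """
    ordered = sorted(values)
    n = len(ordered)
    # Ceiling division on integers avoids floating-point rounding of the rank.
    lower_rank = max(1, -(-n * lower_permille // 1000))
    upper_rank = min(n, -(-n * upper_permille // 1000))
    return ordered[lower_rank - 1], ordered[upper_rank - 1]


# Placeholder data: 200 values, deliberately unsorted, each equal to its rank.
measurements = list(range(200, 0, -1))
lower, upper = percentile_reference_range(measurements)
print(lower, upper)  # the 5th and 195th measurement
```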

====Bimodal distribution====
[[Image:bimodal.png|thumb|300px|[[Bimodal distribution]]]]
In the case of a [[bimodal distribution]] (seen at right), it is useful to find out why this is the case. Two reference ranges can be established for the two different groups of people, making it possible to assume a normal distribution for each group. This bimodal pattern is commonly seen in tests that differ between men and women, such as [[prostate specific antigen]].