==Probabilistic interpretation==
{{further|Logistic regression}}

When the capacity <math>L = 1</math>, the value of the logistic function lies in the interval {{tmath|(0, 1)}} and can be interpreted as a probability {{mvar|p}}.{{efn|This can be extended to the [[extended real number line]] by setting <math>f(-\infty) = 0</math> and <math>f(+\infty) = 1</math>, matching the limit values.}} In more detail, {{mvar|p}} can be interpreted as the probability of one of two alternatives (the parameter of a [[Bernoulli distribution]]);{{efn|In fact, the logistic function is the inverse mapping to the [[natural parameter]] of the Bernoulli distribution, namely the [[logit function]], and in this sense it is the "natural parametrization" of a binary probability.}} the two alternatives are complementary, so the probability of the other alternative is <math>q = 1 - p</math> and <math>p + q = 1</math>. The two alternatives are coded as 1 and 0, corresponding to the limiting values as <math>x \to +\infty</math> and <math>x \to -\infty</math>, respectively.

In this interpretation the input {{mvar|x}} is the [[log-odds]] for the first alternative (relative to the second), measured in "logistic units" (or [[logit]]s), and {{tmath|e^x}} is the [[odds]] for the first alternative (relative to the second). Recall that given odds of <math>O = O:1</math> in favor ({{tmath|O}} for, {{math|1}} against), the probability is the ratio of "for" to "for plus against", <math>O/(O+1)</math>. Hence <math>e^x/(e^x + 1) = 1/(1 + e^{-x}) = p</math> is the probability of the first alternative. Conversely, {{mvar|x}} is the log-odds ''against'' the second alternative, {{tmath|-x}} is the log-odds ''for'' the second alternative, <math>e^{-x}</math> is the odds for the second alternative, and <math>e^{-x}/(e^{-x} + 1) = 1/(1 + e^x) = q</math> is the probability of the second alternative.
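
As a hedged numerical illustration of these relationships, the following Python sketch converts a log-odds value into odds and then into the two probabilities. The function names <code>logistic</code> and <code>logit</code> and the sample odds of 3:1 are choices made for this example only.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function (L = 1): maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of the logistic function: maps a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


# Illustrative input (an assumption of this example): odds of 3:1 for the first alternative.
x = math.log(3.0)                      # log-odds for the first alternative
odds_first = math.exp(x)               # odds for the first alternative, e^x
p = odds_first / (odds_first + 1.0)    # probability of the first alternative
odds_second = math.exp(-x)             # odds for the second alternative, e^(-x)
q = odds_second / (odds_second + 1.0)  # probability of the second alternative

print(p, q, p + q)
```

With odds of 3:1 the two probabilities are 3/4 and 1/4, up to floating-point rounding, and <code>logit(p)</code> recovers the original log-odds.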

This can be framed more symmetrically in terms of two inputs, {{tmath|x_0}} and {{tmath|x_1}}, a framing that generalizes naturally to more than two alternatives. Given two real-number inputs, {{tmath|x_0}} and {{tmath|x_1}}, interpreted as logits, their ''difference'' <math>x_1 - x_0</math> is the log-odds for option 1 (the log-odds ''against'' option 0), <math>e^{x_1 - x_0}</math> is the odds for option 1, and <math>e^{x_1 - x_0}/(e^{x_1 - x_0} + 1) = 1/\left(1 + e^{-(x_1 - x_0)}\right) = e^{x_1}/(e^{x_0} + e^{x_1})</math> is the probability of option 1; similarly, <math>e^{x_0}/(e^{x_0} + e^{x_1})</math> is the probability of option 0.
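
The equivalence between the two-input form and the logistic function of the difference can be checked numerically. The sketch below is illustrative only: the helper name <code>two_option_probabilities</code> and the sample logits are assumptions of this example.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function (L = 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def two_option_probabilities(x0: float, x1: float):
    """Return (probability of option 0, probability of option 1) from two logits."""
    e0, e1 = math.exp(x0), math.exp(x1)
    total = e0 + e1
    return e0 / total, e1 / total


x0, x1 = 0.5, 2.0  # arbitrary illustrative logits (hypothetical values)
p0, p1 = two_option_probabilities(x0, x1)

# The symmetric form agrees with the logistic function of the difference.
same_p1 = logistic(x1 - x0)
same_p0 = logistic(x0 - x1)

print(p0, p1, same_p0, same_p1)
```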

This form immediately generalizes to more alternatives as the [[softmax function]], which is a vector-valued function whose {{mvar|i}}-th coordinate is <math display=inline>e^{x_i} / \sum_{j=0}^n e^{x_j}</math>.
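
A minimal sketch of the softmax function in Python follows. It uses the direct formula, which may overflow for large inputs, so it is meant as an illustration rather than a robust implementation. The function names and sample inputs are assumptions of this example.

```python
import math


def softmax(xs):
    """Direct softmax: the i-th output is exp(x_i) divided by the sum of exp(x_j) over all j."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def logistic(x: float) -> float:
    """Standard logistic function (L = 1)."""
    return 1.0 / (1.0 + math.exp(-x))


probs = softmax([0.0, 1.0, 2.0])  # three alternatives (illustrative logits)
two = softmax([0.0, 1.5])         # two alternatives, with x_0 = 0

# With two alternatives and x_0 = 0, the second coordinate is logistic(x_1).
print(probs, sum(probs))
print(two[1], logistic(1.5))
```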

More subtly, the symmetric form emphasizes interpreting the input {{mvar|x}} as <math>x_1 - x_0</math> and thus ''relative'' to some reference point, implicitly to <math>x_0 = 0</math>. Notably, the softmax function is invariant under adding a constant to all the logits <math>x_i</math>: only the differences are meaningful, with <math>x_j - x_i</math> being the log-odds for option {{mvar|j}} against option {{mvar|i}}, whereas the individual logits <math>x_i</math> are not log-odds on their own. Often one of the options is used as a reference ("pivot"), and its value fixed as {{math|0}}, so the other logits are interpreted as log-odds relative to this reference. This is generally done with the first alternative, hence the choice of numbering: <math>x_0 = 0</math>, and then <math>x_i = x_i - x_0</math> is the log-odds for option {{mvar|i}} against option {{math|0}}. Since <math>e^0 = 1</math>, this yields the <math>+1</math> term in many expressions for the logistic function and generalizations.{{efn|For example, the [[softplus]] function (the integral of the logistic function) is a smooth version of <math>\max(0, x)</math>, while the relative form is a smooth form of <math>\max(x_0, x_1)</math>, specifically [[LogSumExp]]. Softplus thus generalizes as (note the 0 and the corresponding 1 for the reference class) <math>\operatorname{LSE_0}^+(x_1, \dots, x_n) := \operatorname{LSE}(0, x_1, \dots, x_n) = \ln(1 + e^{x_1} + \cdots + e^{x_n}).</math>}}
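
The shift invariance, the pivot convention, and the relation between softplus and LogSumExp can be illustrated with a short Python sketch. Subtracting the maximum inside <code>softmax</code> and <code>log_sum_exp</code> is a common numerical-stability device that relies on this same invariance. It is an implementation choice of this example, as are the function names and sample values.

```python
import math


def softmax(xs):
    """Softmax computed after subtracting max(xs); the shift does not change the result."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def log_sum_exp(xs):
    """LogSumExp: ln(sum of exp(x_i)), computed with the maximum factored out."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softplus(x: float) -> float:
    """Softplus: ln(1 + e^x)."""
    return math.log1p(math.exp(x))


logits = [1.0, 2.5, -0.3]                  # illustrative logits (hypothetical values)
shifted = [x + 10.0 for x in logits]       # add a constant to every logit
pivoted = [x - logits[0] for x in logits]  # option 0 as the pivot, so x_0 = 0

p = softmax(logits)
p_shifted = softmax(shifted)
p_pivoted = softmax(pivoted)

# The log of a probability ratio recovers the difference of logits.
log_odds_2_vs_0 = math.log(p[2] / p[0])

# LogSumExp with a reference logit of 0 reduces to softplus for a single input.
lse0 = log_sum_exp([0.0, 1.7])

print(p, p_shifted, p_pivoted)
print(log_odds_2_vs_0, logits[2] - logits[0])
print(lse0, softplus(1.7))
```

All three calls to <code>softmax</code> yield the same probabilities up to rounding, which is why fixing one logit at 0 loses no information.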