==Background==
[[Image:Logistic-curve.svg|thumb|320px|right|Figure 1. The standard logistic function <math>\sigma (t)</math>; <math>\sigma (t) \in (0,1)</math> for all <math>t</math>.]]

===Definition of the logistic function===
An explanation of logistic regression can begin with an explanation of the standard [[logistic function]]. The logistic function is a [[sigmoid function]], which takes any [[Real number|real]] input <math>t</math> and outputs a value between zero and one.<ref name=Hosmer/> For the logit, this is interpreted as taking input [[log-odds]] and having output [[probability]]. The ''standard'' logistic function <math>\sigma:\mathbb R\rightarrow (0,1)</math> is defined as follows:

:<math>\sigma (t) = \frac{e^t}{e^t+1} = \frac{1}{1+e^{-t}}</math>

A graph of the logistic function on the ''t''-interval (−6,6) is shown in Figure 1.

Let us assume that <math>t</math> is a linear function of a single [[dependent and independent variables|explanatory variable]] <math>x</math> (the case where <math>t</math> is a ''linear combination'' of multiple explanatory variables is treated similarly). We can then express <math>t</math> as follows:

:<math>t = \beta_0 + \beta_1 x</math>

And the general logistic function <math>p:\mathbb R \rightarrow (0,1)</math> can now be written as:

:<math>p(x) = \sigma(t)= \frac {1}{1+e^{-(\beta_0 + \beta_1 x)}}</math>

In the logistic model, <math>p(x)</math> is interpreted as the probability of the dependent variable <math>Y</math> equaling a success/case rather than a failure/non-case. It is clear that the [[Dependent and independent variables|response variables]] <math>Y_i</math> are not identically distributed: <math>P(Y_i = 1\mid X)</math> differs from one data point <math>X_i</math> to another, though they are independent given the [[design matrix]] <math>X</math> and shared parameters <math>\beta</math>.<ref name = "Freedman09" />

===Definition of the inverse of the logistic function===
We can now define the [[logit]] (log odds) function as the inverse <math>g = \sigma^{-1}</math> of the standard logistic function. It is easy to see that it satisfies:

:<math>g(p(x)) = \sigma^{-1} (p(x)) = \operatorname{logit} p(x) = \ln \left( \frac{p(x)}{1 - p(x)} \right) = \beta_0 + \beta_1 x ,</math>

and equivalently, after exponentiating both sides, we have the odds:

:<math>\frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x}.</math>

===Interpretation of these terms===
In the above equations, the terms are as follows (a short numerical sketch follows the list):

* <math>g</math> is the logit function. The equation for <math>g(p(x))</math> illustrates that the [[logit]] (i.e., log-odds or natural logarithm of the odds) is equivalent to the linear regression expression.
* <math>\ln</math> denotes the [[natural logarithm]].
* <math>p(x)</math> is the probability that the dependent variable equals a case, given some linear combination of the predictors. The formula for <math>p(x)</math> illustrates that the probability of the dependent variable equaling a case is equal to the value of the logistic function of the linear regression expression. This is important in that it shows that the value of the linear regression expression can vary from negative to positive infinity and yet, after transformation, the resulting expression for the probability <math>p(x)</math> ranges between 0 and 1.
* <math>\beta_0</math> is the [[Y-intercept|intercept]] from the linear regression equation (the value of the criterion when the predictor is equal to zero).
* <math>\beta_1 x</math> is the regression coefficient multiplied by some value of the predictor.
* <math>e</math> is the base of the exponential function and of the natural logarithm.
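As a minimal numerical sketch of these definitions, the logistic function and its logit inverse can be evaluated directly. The coefficient values <math>\beta_0 = -3</math> and <math>\beta_1 = 0.5</math> below, and the helper names <code>sigma</code> and <code>logit</code>, are purely illustrative and are not taken from any fitted model in this article:

<syntaxhighlight lang="python">
import math

def sigma(t):
    """Standard logistic function: maps any real t to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    """Inverse of the standard logistic function: log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

# Illustrative (made-up) coefficients for a single explanatory variable x.
beta0, beta1 = -3.0, 0.5

x = 4.0                # some value of the predictor
t = beta0 + beta1 * x  # linear predictor t = beta0 + beta1 * x = -1.0
p = sigma(t)           # probability p(x) of a case (Y = 1), here about 0.269

print(p)               # 0.2689...
print(logit(p))        # recovers t = beta0 + beta1 * x, here -1.0
</syntaxhighlight>

Applying <code>logit</code> to the computed probability recovers the linear predictor <math>\beta_0 + \beta_1 x</math>, which is exactly the inverse relationship stated above.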
===Definition of the odds===
The odds of the dependent variable equaling a case (given some linear combination <math>x</math> of the predictors) are equivalent to the exponential function of the linear regression expression. This illustrates how the [[logit]] serves as a link function between the probability and the linear regression expression. Given that the logit ranges between negative and positive infinity, it provides an adequate criterion upon which to conduct linear regression, and the logit is easily converted back into the odds.<ref name=Hosmer/>

So we define the odds of the dependent variable equaling a case (given some linear combination <math>x</math> of the predictors) as follows:

:<math>\text{odds} = e^{\beta_0 + \beta_1 x}.</math>

===The odds ratio===
[[File:Odds Ratio-1.jpg|thumb|An outline of how an odds ratio is reported in words, following the test-score example in the "Example" section. In simple terms, if we hypothetically obtain an odds ratio of 2 to 1, we can say: "For every one-unit increase in hours studied, the odds of passing (group 1) versus failing (group 0) are (expectedly) 2 to 1" (Denis, 2019).]]
For a continuous independent variable the odds ratio can be defined as:

:<math> \mathrm{OR} = \frac{\operatorname{odds}(x+1)}{\operatorname{odds}(x)} = \frac{\left(\frac{p(x+1)}{1 - p(x+1)}\right)}{\left(\frac{p(x)}{1 - p(x)}\right)} = \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1}</math>

This exponential relationship provides an interpretation for <math>\beta_1</math>: the odds multiply by <math>e^{\beta_1}</math> for every 1-unit increase in <math>x</math>.<ref>{{cite web|url=https://stats.idre.ucla.edu/stata/faq/how-do-i-interpret-odds-ratios-in-logistic-regression/|title=How to Interpret Odds Ratio in Logistic Regression?|publisher=Institute for Digital Research and Education}}</ref>

For a binary independent variable the odds ratio is defined as <math>\frac{ad}{bc}</math>, where ''a'', ''b'', ''c'' and ''d'' are cells in a 2×2 [[contingency table]].<ref>{{cite book | last = Everitt | first = Brian | title = The Cambridge Dictionary of Statistics | publisher = Cambridge University Press | location = Cambridge, UK New York | year = 1998 | isbn = 978-0-521-59346-5 | url-access = registration | url = https://archive.org/details/cambridgediction00ever_0 }}</ref>

===Multiple explanatory variables===
If there are multiple explanatory variables, the above expression <math>\beta_0+\beta_1x</math> can be revised to <math>\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_mx_m = \beta_0+ \sum_{i=1}^m \beta_ix_i</math>. Then when this is used in the equation relating the log odds of a success to the values of the predictors, the linear regression will be a [[multiple regression]] with ''m'' explanators; the parameters <math>\beta_i</math> for all <math>i = 0, 1, 2, \dots, m</math> are all estimated.

Again, the more traditional equations are:

:<math>\log \frac{p}{1-p} = \beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_mx_m</math>

and

:<math>p = \frac{1}{1+b^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_mx_m )}}</math>

where usually <math>b=e</math>.
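Continuing the illustrative sketch above (with the same made-up coefficients <math>\beta_0 = -3</math>, <math>\beta_1 = 0.5</math>, and hypothetical helper names), the odds ratio for a one-unit increase in <math>x</math> reduces to <math>e^{\beta_1}</math> regardless of <math>x</math>, and the multiple-variable form only changes the linear predictor to a sum:

<syntaxhighlight lang="python">
import math

def odds(x, beta0=-3.0, beta1=0.5):
    """Odds of a case at predictor value x: p(x) / (1 - p(x)) = exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)

# The odds ratio for a one-unit increase in x does not depend on x itself:
x = 4.0
odds_ratio = odds(x + 1) / odds(x)
print(odds_ratio, math.exp(0.5))   # both equal exp(beta1), about 1.6487

# With several explanatory variables, the linear predictor becomes a sum:
def p_multi(xs, beta0, betas):
    """p(x_1..x_m) = 1 / (1 + exp(-(beta0 + sum_i beta_i * x_i))), i.e. b = e."""
    t = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-t))

print(p_multi([1.0, 2.0], beta0=-3.0, betas=[0.5, 1.0]))  # illustrative values only
</syntaxhighlight>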