===As a generalized linear model===
The particular model used by logistic regression, which distinguishes it from standard [[linear regression]] and from other types of [[regression analysis]] used for [[binary-valued]] outcomes, is the way the probability of a particular outcome is linked to the linear predictor function:

:<math>\operatorname{logit}(\operatorname{\mathbb E}[Y_i\mid x_{1,i},\ldots,x_{m,i}]) = \operatorname{logit}(p_i) = \ln \left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_m x_{m,i}</math>

Written using the more compact notation described above, this is:

:<math>\operatorname{logit}(\operatorname{\mathbb E}[Y_i\mid \mathbf{X}_i]) = \operatorname{logit}(p_i)=\ln\left(\frac{p_i}{1-p_i}\right) = \boldsymbol\beta \cdot \mathbf{X}_i</math>

This formulation expresses logistic regression as a type of [[generalized linear model]], which predicts variables with various types of [[probability distribution]]s by fitting a linear predictor function of the above form to some transformation of the expected value of the variable. The intuition for transforming using the logit function (the natural log of the odds) was explained above. The transformation also has the practical effect of converting the probability (which is bounded between 0 and 1) to a variable that ranges over <math>(-\infty,+\infty)</math>, thereby matching the potential range of the linear prediction function on the right side of the equation.

Both the probabilities ''p''<sub>''i''</sub> and the regression coefficients are unobserved, and the means of determining them is not part of the model itself. They are typically determined by some sort of optimization procedure, e.g. [[maximum likelihood estimation]], that finds values that best fit the observed data (i.e. that give the most accurate predictions for the data already observed), usually subject to [[regularization (mathematics)|regularization]] conditions that seek to exclude unlikely values, e.g. extremely large values for any of the regression coefficients. The use of a regularization condition is equivalent to doing [[maximum a posteriori]] (MAP) estimation, an extension of maximum likelihood. (Regularization is most commonly done using [[Ridge regression|a squared regularizing function]], which is equivalent to placing a zero-mean [[Gaussian distribution|Gaussian]] [[prior distribution]] on the coefficients, but other regularizers are also possible.) Whether or not regularization is used, it is usually not possible to find a closed-form solution; instead, an iterative numerical method must be used, such as [[iteratively reweighted least squares]] (IRLS) or, more commonly, a [[quasi-Newton method]] such as the [[L-BFGS|L-BFGS method]].<ref>{{cite conference |url=https://dl.acm.org/citation.cfm?id=1118871 |title=A comparison of algorithms for maximum entropy parameter estimation |last1=Malouf |first1=Robert |date=2002 |book-title=Proceedings of the Sixth Conference on Natural Language Learning (CoNLL-2002) |pages=49–55 |doi=10.3115/1118853.1118871 |doi-access=free}}</ref>

Each parameter estimate ''β''<sub>''j''</sub> is interpreted as the additive effect on the log of the [[odds]] of the outcome for a unit change in the ''j''th explanatory variable. In the case of a dichotomous explanatory variable such as gender, <math>e^\beta</math> is the estimate of the odds of having the outcome for, say, males compared with females, i.e. the [[odds ratio]].
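As an illustration of this fitting procedure (a minimal sketch, not part of the model itself), the following code maximizes the L2-regularized Bernoulli log-likelihood with SciPy's L-BFGS-B optimizer on synthetic data, then reports <math>e^{\beta_j}</math> as odds ratios. The data, the penalty weight <code>lam</code>, and all variable names are illustrative assumptions.

<syntaxhighlight lang="python">
# Sketch: maximum-likelihood fitting of logistic regression with an L2 (ridge)
# penalty, using SciPy's general-purpose L-BFGS-B optimizer. The synthetic data
# and the penalty weight lam are illustrative, not canonical choices.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, X, y, lam):
    """Negative penalized Bernoulli log-likelihood for coefficients beta."""
    z = X @ beta                               # linear predictor beta . X_i
    # log-likelihood: sum_i [ y_i z_i - log(1 + e^{z_i}) ], computed stably
    log_lik = y @ z - np.logaddexp(0.0, z).sum()
    # L2 penalty, equivalent to a zero-mean Gaussian prior on the coefficients
    return -log_lik + lam * (beta @ beta)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # intercept + 2 covariates
true_beta = np.array([-0.5, 1.0, -2.0])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

result = minimize(neg_log_likelihood, x0=np.zeros(3),
                  args=(X, y, 0.1), method="L-BFGS-B")
beta_hat = result.x
print("coefficients:", beta_hat)
print("odds ratios: ", np.exp(beta_hat))  # e^beta_j: multiplicative effect on the odds
</syntaxhighlight>

Here the penalty term plays the role of the MAP prior described above; setting <code>lam</code> to zero recovers unregularized maximum likelihood.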
An equivalent formula uses the inverse of the logit function, which is the [[logistic function]]:

:<math>\operatorname{\mathbb E}[Y_i\mid \mathbf{X}_i] = p_i = \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) = \frac{1}{1+e^{-\boldsymbol\beta \cdot \mathbf{X}_i}}</math>

The formula can also be written as a [[probability distribution]] (specifically, using a [[probability mass function]]), for <math>y \in \{0,1\}</math>:

:<math>\Pr(Y_i=y\mid \mathbf{X}_i) = {p_i}^y(1-p_i)^{1-y} = \left(\frac{e^{\boldsymbol\beta \cdot \mathbf{X}_i}}{1+e^{\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{y} \left(1-\frac{e^{\boldsymbol\beta \cdot \mathbf{X}_i}}{1+e^{\boldsymbol\beta \cdot \mathbf{X}_i}}\right)^{1-y} = \frac{e^{y\,\boldsymbol\beta \cdot \mathbf{X}_i}}{1+e^{\boldsymbol\beta \cdot \mathbf{X}_i}}</math>
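As a small illustration (with hypothetical coefficient and observation values), the sketch below evaluates the inverse-logit mapping and the resulting Bernoulli probability mass function:

<syntaxhighlight lang="python">
# Sketch: the inverse-logit (logistic) mapping and the Bernoulli PMF it induces.
# beta and x are hypothetical values, not fitted estimates.
import numpy as np

def inv_logit(z):
    """Logistic function: maps the linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

beta = np.array([-0.5, 1.0, -2.0])   # hypothetical coefficients
x = np.array([1.0, 0.3, -1.2])       # one observation (leading 1 = intercept)
p = inv_logit(beta @ x)              # p_i = Pr(Y_i = 1 | X_i)

def bernoulli_pmf(y, p):
    """p**y * (1 - p)**(1 - y), for y in {0, 1}."""
    return p**y * (1.0 - p)**(1.0 - y)

print(p, bernoulli_pmf(1, p), bernoulli_pmf(0, p))  # the two PMF values sum to 1
</syntaxhighlight>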