===Fit===

The usual measure of [[goodness of fit]] for a logistic regression uses [[logistic loss]] (or [[log loss]]), the negative [[log-likelihood]]. For a given ''x<sub>k</sub>'' and ''y<sub>k</sub>'', write <math>p_k=p(x_k)</math>. The {{tmath|p_k}} are the probabilities that the corresponding {{tmath|y_k}} will equal one and {{tmath|1-p_k}} are the probabilities that they will be zero (see [[Bernoulli distribution]]). We wish to find the values of {{tmath|\beta_0}} and {{tmath|\beta_1}} which give the "best fit" to the data. In the case of linear regression, the sum of the squared deviations of the fit from the data points (''y<sub>k</sub>''), the [[squared error loss]], is taken as a measure of the goodness of fit, and the best fit is obtained when that function is ''minimized''.

The log loss for the ''k''-th point, {{tmath|\ell_k}}, is:

:<math>\ell_k = \begin{cases} -\ln p_k & \text{ if } y_k = 1, \\ -\ln (1 - p_k) & \text{ if } y_k = 0. \end{cases}</math>

The log loss can be interpreted as the "[[surprisal]]" of the actual outcome {{tmath|y_k}} relative to the prediction {{tmath|p_k}}, and is a measure of [[information content]]. Log loss is always greater than or equal to 0, equals 0 only in the case of a perfect prediction (i.e., when <math>p_k = 1</math> and <math>y_k = 1</math>, or <math>p_k = 0</math> and <math>y_k = 0</math>), and approaches infinity as the prediction gets worse (i.e., when <math>y_k = 1</math> and <math>p_k \to 0</math> or <math>y_k = 0</math> and <math>p_k \to 1</math>), meaning the actual outcome is "more surprising". Since the value of the logistic function is always strictly between zero and one, the log loss is always greater than zero and less than infinity. Unlike in a linear regression, where the model can have zero loss at a point by passing through a data point (and zero loss overall if all points are on a line), in a logistic regression it is not possible to have zero loss at any point, since {{tmath|y_k}} is either 0 or 1 but {{tmath|0 < p_k < 1}}.

These can be combined into a single expression:

:<math>\ell_k = -y_k\ln p_k - (1 - y_k)\ln (1 - p_k).</math>

This expression is more formally known as the [[cross-entropy]] of the predicted distribution <math>\big(p_k, (1-p_k)\big)</math> from the actual distribution <math>\big(y_k, (1-y_k)\big)</math>, as probability distributions on the two-element space of (pass, fail).

The sum of these, the total loss, is the overall negative log-likelihood {{tmath|-\ell}}, and the best fit is obtained for those choices of {{tmath|\beta_0}} and {{tmath|\beta_1}} for which {{tmath|-\ell}} is ''minimized''.

Alternatively, instead of ''minimizing'' the loss, one can ''maximize'' its negative, the (positive) log-likelihood:

:<math>\ell = \sum_{k:y_k=1}\ln(p_k) + \sum_{k:y_k=0}\ln(1-p_k) = \sum_{k=1}^K \left(\,y_k \ln(p_k)+(1-y_k)\ln(1-p_k)\right)</math>

or equivalently maximize the [[likelihood function]] itself, which is the probability that the given data set is produced by a particular logistic function:

:<math>L = \prod_{k:y_k=1}p_k\,\prod_{k:y_k=0}(1-p_k)</math>

This method is known as [[maximum likelihood estimation]].
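
For illustration, the following is a minimal numerical sketch of this fitting procedure, not a prescribed implementation: the data values, variable names, and the use of SciPy's general-purpose <code>minimize</code> routine (rather than a dedicated logistic-regression solver) are all assumptions made for the example. It evaluates the total log loss <math>-\ell</math> for candidate values of {{tmath|\beta_0}} and {{tmath|\beta_1}} and returns the pair that minimizes it, i.e. the maximum likelihood estimate.

<syntaxhighlight lang="python">
import numpy as np
from scipy.optimize import minimize

# Hypothetical data for illustration only: x_k is a predictor (e.g. hours studied),
# y_k in {0, 1} is the observed pass/fail outcome.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
y = np.array([0,   0,   0,   0,   1,   0,   1,   1,   1,   1])

def p(beta, x):
    """Predicted probability p_k = 1 / (1 + exp(-(beta0 + beta1 * x_k)))."""
    beta0, beta1 = beta
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

def neg_log_likelihood(beta, x, y):
    """Total log loss -l: sum over k of -y_k ln(p_k) - (1 - y_k) ln(1 - p_k)."""
    pk = p(beta, x)
    return -np.sum(y * np.log(pk) + (1 - y) * np.log(1 - pk))

# Maximum likelihood estimate: minimize the negative log-likelihood over (beta0, beta1).
result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(x, y))
beta0_hat, beta1_hat = result.x
print(beta0_hat, beta1_hat)
</syntaxhighlight>

Minimizing <math>-\ell</math> and maximizing <math>\ell</math> (or the likelihood <math>L</math>) give the same estimates; the negative form is used above simply because general-purpose optimizers conventionally minimize.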