===Maximum likelihood estimation (MLE)===
The regression coefficients are usually estimated using [[maximum likelihood estimation]].<ref name=Menard/><ref>{{cite journal |first1=Christian |last1=Gourieroux |first2=Alain |last2=Monfort |title=Asymptotic Properties of the Maximum Likelihood Estimator in Dichotomous Logit Models |journal=Journal of Econometrics |volume=17 |issue=1 |year=1981 |pages=83–97 |doi=10.1016/0304-4076(81)90060-9 }}</ref> Unlike linear regression with normally distributed residuals, it is not possible to find a closed-form expression for the coefficient values that maximize the likelihood function, so an iterative process such as [[Newton's method]] must be used instead. This process begins with a tentative solution, revises it slightly to see if it can be improved, and repeats this revision until no further improvement is made, at which point the process is said to have converged.<ref name="Menard" />

In some instances, the model may not reach convergence. Non-convergence of a model indicates that the coefficients are not meaningful because the iterative process was unable to find appropriate solutions. A failure to converge may occur for a number of reasons: having a large ratio of predictors to cases, [[multicollinearity]], [[sparse matrix|sparseness]], or complete [[Separation (statistics)|separation]].

* Having a large ratio of variables to cases results in an overly conservative Wald statistic (discussed below) and can lead to non-convergence. [[Regularization (mathematics)|Regularized]] logistic regression is specifically intended to be used in this situation.
* Multicollinearity refers to unacceptably high correlations between predictors. As multicollinearity increases, coefficients remain unbiased but standard errors increase and the likelihood of model convergence decreases.<ref name=Menard/> To detect multicollinearity amongst the predictors, one can conduct a linear regression analysis with the predictors of interest for the sole purpose of examining its tolerance statistic,<ref name=Menard/> which is used to assess whether multicollinearity is unacceptably high.
* Sparseness in the data refers to having a large proportion of empty cells (cells with zero counts). Zero cell counts are particularly problematic with categorical predictors. With continuous predictors, the model can infer values for the zero cell counts, but this is not the case with categorical predictors. The model will not converge with zero cell counts for categorical predictors because the natural logarithm of zero is undefined, so the final solution to the model cannot be reached. To remedy this problem, researchers may collapse categories in a theoretically meaningful way or add a constant to all cells.<ref name=Menard/>
* Another numerical problem that may lead to a lack of convergence is complete separation, in which the predictors perfectly predict the criterion – all cases are accurately classified, and the likelihood is maximized only in the limit of infinitely large coefficients. In such instances, one should re-examine the data, as complete separation often reflects a problem such as a data-coding error or the inclusion of a predictor that is effectively a restatement of the outcome.<ref name=Hosmer/>
* One can also take semi-parametric or non-parametric approaches, e.g., via local-likelihood or nonparametric quasi-likelihood methods, which avoid assumptions of a parametric form for the index function and are robust to the choice of the link function (e.g., probit or logit).<ref name="sciencedirect.com">{{cite journal| doi=10.1016/j.csda.2016.10.024 | volume=108 | title=Nonparametric estimation of dynamic discrete choice models for time series data | year=2017 | journal=Computational Statistics & Data Analysis | pages=97–120 | last1=Park | first1=Byeong U. | last2=Simar | first2=Léopold | last3=Zelenyuk | first3=Valentin | url=https://espace.library.uq.edu.au/view/UQ:415620/UQ415620_OA.pdf }}</ref>
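The iterative process described above can be sketched in code. The following is a minimal illustration (not from the article) of Newton's method applied to the logistic-regression log-likelihood; the function name, simulated data, and tolerance settings are our own illustrative choices.

```python
import numpy as np

def fit_logistic(X, y, tol=1e-8, max_iter=100):
    """Maximize the logistic log-likelihood by Newton's method (a sketch).

    X : (n, p) design matrix (include a column of ones for an intercept).
    y : (n,) array of 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])                # tentative starting solution
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # predicted probabilities
        w = p * (1.0 - p)                      # per-case weights
        grad = X.T @ (y - p)                   # gradient (score) of the log-likelihood
        info = X.T @ (X * w[:, None])          # information matrix (negative Hessian)
        step = np.linalg.solve(info, grad)     # Newton update direction
        beta += step
        if np.max(np.abs(step)) < tol:         # no further improvement: converged
            return beta
    # Non-convergence, e.g. under complete separation the weights shrink toward
    # zero and the coefficients grow without bound.
    raise RuntimeError("Newton iteration did not converge")

# Illustrative use: one continuous predictor plus an intercept.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))
y = (rng.random(200) < true_p).astype(float)
X = np.column_stack([np.ones_like(x), x])
beta_hat = fit_logistic(X, y)
```

With separable data the same loop fails to converge, which is the numerical signature of the complete-separation problem noted above.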