=== Multinomial logistic regression: Many explanatory variables and many categories ===
{{main|Multinomial logistic regression}}
In the above cases of two categories (binomial logistic regression), the categories were indexed by "0" and "1", and we had two probabilities: the probability that the outcome was in category 1 was given by <math>p(\boldsymbol{x})</math> and the probability that the outcome was in category 0 was given by <math>1-p(\boldsymbol{x})</math>. The sum of these probabilities equals 1, as it must, since "0" and "1" are the only possible categories in this setup.

In general, if we have {{tmath|M+1}} explanatory variables (including ''x<sub>0</sub>'') and {{tmath|N+1}} categories, we will need {{tmath|N+1}} separate probabilities, one for each category, indexed by ''n'', which describe the probability that the categorical outcome ''y'' will be in category ''y=n'', conditional on the vector of covariates '''x'''. The sum of these probabilities over all categories must equal 1. Using the mathematically convenient base ''e'', these probabilities are:

:<math>p_n(\boldsymbol{x}) = \frac{e^{\boldsymbol{\beta}_n\cdot \boldsymbol{x}}}{1+\sum_{u=1}^N e^{\boldsymbol{\beta}_u\cdot \boldsymbol{x}}}</math> for <math>n=1,2,\dots,N</math>

:<math>p_0(\boldsymbol{x}) = 1-\sum_{n=1}^N p_n(\boldsymbol{x})=\frac{1}{1+\sum_{u=1}^N e^{\boldsymbol{\beta}_u\cdot \boldsymbol{x}}}</math>

Each of the probabilities except <math>p_0(\boldsymbol{x})</math> has its own set of regression coefficients <math>\boldsymbol{\beta}_n</math>. It can be seen that, as required, the sum of the <math>p_n(\boldsymbol{x})</math> over all categories ''n'' is 1. The choice of <math>p_0(\boldsymbol{x})</math> as the probability defined in terms of the others is arbitrary; any of the probabilities could have been selected for this role.
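The pivot-based probabilities above can be sketched numerically. This is a minimal illustration, not part of the article: the coefficient vectors and covariates are hypothetical, with category 0 as the pivot, ''N'' = 2 non-pivot categories, and ''M''+1 = 3 explanatory variables.

```python
import numpy as np

# Hypothetical coefficients: one vector beta_n per non-pivot category n = 1..N.
beta = np.array([[0.5, -1.0, 0.2],    # beta_1
                 [-0.3, 0.8, 0.1]])   # beta_2
x = np.array([1.0, 2.0, -1.0])        # covariates; x_0 = 1 acts as the intercept

scores = np.exp(beta @ x)             # e^{beta_n . x} for n = 1..N
denom = 1.0 + scores.sum()            # 1 + sum_u e^{beta_u . x}
p = np.concatenate(([1.0 / denom],    # p_0 = 1 / denom (the pivot category)
                    scores / denom))  # p_n = e^{beta_n . x} / denom

print(p, p.sum())                     # the N+1 probabilities sum to 1
```

Because the pivot probability is defined as one minus the rest, the vector `p` sums to 1 by construction, matching the constraint stated above.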
This special value of ''n'' is termed the "pivot index", and the log-odds (''t<sub>n</sub>'') are expressed in terms of the pivot probability, again as a linear combination of the explanatory variables:

:<math>t_n = \ln\left(\frac{p_n(\boldsymbol{x})}{p_0(\boldsymbol{x})}\right) = \boldsymbol{\beta}_n \cdot \boldsymbol{x}</math>

Note also that for the simple case of <math>N=1</math>, the two-category case is recovered, with <math>p(\boldsymbol{x})=p_1(\boldsymbol{x})</math> and <math>p_0(\boldsymbol{x})=1-p_1(\boldsymbol{x})</math>.

The log-likelihood that a particular set of ''K'' measurements or data points will be generated by the above probabilities can now be calculated. Indexing each measurement by ''k'', let the ''k''-th set of measured explanatory variables be denoted by <math>\boldsymbol{x}_k</math> and the corresponding categorical outcome be denoted by <math>y_k</math>, which can be equal to any integer in [0,N]. The log-likelihood is then:

:<math>\ell = \sum_{k=1}^K \sum_{n=0}^N \Delta(n,y_k)\,\ln(p_n(\boldsymbol{x}_k))</math>

where <math>\Delta(n,y_k)</math> is an [[indicator function]] which equals 1 if ''y<sub>k</sub> = n'' and zero otherwise. In the two-category case above, this indicator function was defined as ''y<sub>k</sub>'' when ''n'' = 1 and ''1 − y<sub>k</sub>'' when ''n'' = 0. This was convenient, but not necessary.<ref>For example, the indicator function in this case could be defined as <math>\Delta(n,y)=1-(y-n)^2</math></ref> Again, the optimum beta coefficients may be found by maximizing the log-likelihood function, generally using numerical methods.
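The log-likelihood sum above can be sketched as follows. The data here are hypothetical; because the indicator <math>\Delta(n,y_k)</math> selects exactly one term per measurement, the double sum reduces to summing the log of the observed category's probability for each ''k''.

```python
import numpy as np

# Hypothetical coefficients and data: pivot is category 0, N = 2, K = 3.
beta = np.array([[0.5, -1.0],          # beta_1
                 [-0.3, 0.8]])         # beta_2
X = np.array([[1.0, 2.0],
              [1.0, -1.0],
              [1.0, 0.5]])             # K rows of covariates, x_0 = 1
y = np.array([0, 2, 1])                # observed categories, integers in [0, N]

def probs(x):
    """p_0, p_1, ..., p_N for one covariate vector, per the formulas above."""
    scores = np.exp(beta @ x)
    denom = 1.0 + scores.sum()
    return np.concatenate(([1.0 / denom], scores / denom))

# ell = sum_k sum_n Delta(n, y_k) ln p_n(x_k): the indicator picks out the
# probability assigned to the category actually observed for measurement k.
ell = sum(np.log(probs(xk)[yk]) for xk, yk in zip(X, y))
print(ell)
```

Maximizing `ell` over the entries of `beta` (e.g. with a generic numerical optimizer) yields the maximum-likelihood coefficient estimates mentioned in the text.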
A possible method of solution is to set the derivatives of the log-likelihood with respect to each beta coefficient equal to zero and solve for the beta coefficients:

:<math>\frac{\partial \ell}{\partial \beta_{nm}} = 0 = \sum_{k=1}^K \Delta(n,y_k)x_{mk} - \sum_{k=1}^K p_n(\boldsymbol{x}_k)x_{mk}</math>

where <math>\beta_{nm}</math> is the ''m''-th coefficient of the <math>\boldsymbol{\beta}_n</math> vector and <math>x_{mk}</math> is the ''m''-th explanatory variable of the ''k''-th measurement. Once the beta coefficients have been estimated from the data, we will be able to estimate the probability that any subsequent set of explanatory variables will result in any of the possible outcome categories.
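The score equations above can be evaluated directly. A minimal sketch, with hypothetical data and an arbitrary starting point (all coefficients zero): each entry of the gradient is the difference between the observed and the predicted sums <math>\sum_k \Delta(n,y_k)x_{mk} - \sum_k p_n(\boldsymbol{x}_k)x_{mk}</math>, and a numerical solver would iterate until every entry is zero.

```python
import numpy as np

# Hypothetical setup: pivot is category 0, N = 2 non-pivot categories,
# M+1 = 2 explanatory variables (x_0 = 1), K = 3 measurements.
beta = np.zeros((2, 2))               # beta_1, beta_2, initialized to zero
X = np.array([[1.0, 2.0],
              [1.0, -1.0],
              [1.0, 0.5]])
y = np.array([0, 2, 1])

def probs(x):
    """p_0, p_1, ..., p_N for one covariate vector, per the formulas above."""
    scores = np.exp(beta @ x)
    denom = 1.0 + scores.sum()
    return np.concatenate(([1.0 / denom], scores / denom))

# d ell / d beta_{nm} = sum_k (Delta(n, y_k) - p_n(x_k)) x_{mk}
grad = np.zeros_like(beta)
for xk, yk in zip(X, y):
    p = probs(xk)
    for n in range(1, beta.shape[0] + 1):   # non-pivot categories n = 1..N
        grad[n - 1] += ((yk == n) - p[n]) * xk

print(grad)   # gradient ascent or Newton's method would drive this to zero
```

At the maximum-likelihood estimate this gradient vanishes, which is exactly the condition stated in the equation above.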