
===Two-way latent-variable model===
Yet another formulation uses two separate latent variables:

: <math>
\begin{align}
Y_i^{0\ast} &= \boldsymbol\beta_0 \cdot \mathbf{X}_i + \varepsilon_0 \, \\
Y_i^{1\ast} &= \boldsymbol\beta_1 \cdot \mathbf{X}_i + \varepsilon_1 \,
\end{align}
</math>

where

: <math>
\begin{align}
\varepsilon_0 & \sim \operatorname{EV}_1(0,1) \\
\varepsilon_1 & \sim \operatorname{EV}_1(0,1)
\end{align}
</math>

and ''EV''<sub>1</sub>(0,1) is the standard type-1 [[extreme value distribution]], with [[probability density function]]

:<math>f(x) = e^{-x} e^{-e^{-x}}</math>

Then

: <math> Y_i = \begin{cases} 1 & \text{if }Y_i^{1\ast} > Y_i^{0\ast}, \\
0 &\text{otherwise.} \end{cases} </math>
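The generative process defined above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names <code>sample_ev1</code> and <code>simulate_outcome</code>, the coefficient values, and the choice of inverse-transform sampling from the cumulative distribution function <math>F(x) = e^{-e^{-x}}</math> are all assumptions made for the example.

```python
import math
import random


def sample_ev1(rng):
    """Draw from the standard type-1 extreme value distribution.

    Uses inverse-transform sampling on the CDF F(x) = exp(-exp(-x)).
    """
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_outcome(beta0, beta1, x, rng):
    """Return Y_i for one observation with explanatory variables x."""
    # Each latent variable is a linear score plus its own EV1 error term.
    y0_star = sum(b * v for b, v in zip(beta0, x)) + sample_ev1(rng)
    y1_star = sum(b * v for b, v in zip(beta1, x)) + sample_ev1(rng)
    return 1 if y1_star > y0_star else 0


# Illustrative values only; none of these numbers come from the article.
beta0 = [0.5, -1.0]
beta1 = [1.0, 0.3]
x = [1.0, 0.4]  # the leading 1.0 plays the role of an intercept term

rng = random.Random(0)
n_draws = 200_000
frequency = sum(simulate_outcome(beta0, beta1, x, rng) for _ in range(n_draws)) / n_draws
print(f"simulated Pr(Y = 1 | x) = {frequency:.4f}")
```

With enough draws, the simulated frequency should approach <math>\operatorname{logit}^{-1}((\boldsymbol\beta_1 - \boldsymbol\beta_0) \cdot \mathbf{X}_i)</math>, which is the equivalence derived later in this section.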

This model has a separate latent variable and a separate set of regression coefficients for each possible outcome of the dependent variable.  The separation makes it easy to extend logistic regression to multi-outcome categorical variables, as in the [[multinomial logit]] model, where it is natural to model each possible outcome with its own set of regression coefficients.  Each latent variable can also be interpreted as the theoretical [[utility]] associated with making the corresponding choice, which motivates logistic regression in terms of [[utility theory]]: a rational actor always picks the option with the greatest associated utility. Economists take this approach when formulating [[discrete choice]] models, because it provides a theoretically strong foundation and facilitates intuitions about the model, which in turn makes it easier to consider various extensions. (See the example below.)

The choice of the type-1 [[extreme value distribution]] seems fairly arbitrary, but it makes the mathematics work out, and it may be possible to justify its use through [[rational choice theory]].

It turns out that this model is equivalent to the previous model, although this seems non-obvious, since there are now two sets of regression coefficients and error variables, and the error variables have a different distribution.  In fact, this model reduces directly to the previous one with the following substitutions:
:<math>\boldsymbol\beta = \boldsymbol\beta_1 - \boldsymbol\beta_0</math>
:<math>\varepsilon = \varepsilon_1 - \varepsilon_0</math>
An intuition for this comes from the fact that, since we choose based on the maximum of two values, only their difference matters, not the exact values, and this effectively removes one [[Degrees of freedom (statistics)|degree of freedom]]. Another critical fact is that the difference of two independent type-1 extreme-value-distributed variables follows a logistic distribution, i.e. <math>\varepsilon = \varepsilon_1 - \varepsilon_0 \sim \operatorname{Logistic}(0,1) .</math> We can demonstrate the equivalence as follows:

:<math>\begin{align}
\Pr(Y_i=1\mid\mathbf{X}_i) = {} & \Pr \left (Y_i^{1\ast} > Y_i^{0\ast}\mid\mathbf{X}_i \right ) & \\[5pt]
= {} & \Pr \left (Y_i^{1\ast} - Y_i^{0\ast} > 0\mid\mathbf{X}_i \right ) & \\[5pt]
= {} & \Pr \left (\boldsymbol\beta_1 \cdot \mathbf{X}_i + \varepsilon_1 - \left (\boldsymbol\beta_0 \cdot \mathbf{X}_i + \varepsilon_0 \right ) > 0 \right ) & \\[5pt]
= {} & \Pr \left ((\boldsymbol\beta_1 \cdot \mathbf{X}_i - \boldsymbol\beta_0 \cdot \mathbf{X}_i) + (\varepsilon_1 - \varepsilon_0) > 0 \right ) & \\[5pt]
= {} & \Pr((\boldsymbol\beta_1 - \boldsymbol\beta_0) \cdot \mathbf{X}_i + (\varepsilon_1 - \varepsilon_0) > 0) & \\[5pt]
= {} & \Pr((\boldsymbol\beta_1 - \boldsymbol\beta_0) \cdot \mathbf{X}_i + \varepsilon > 0) & & \text{(substitute } \varepsilon\text{ as above)} \\[5pt]
= {} & \Pr(\boldsymbol\beta \cdot \mathbf{X}_i + \varepsilon > 0) & & \text{(substitute }\boldsymbol\beta\text{ as above)} \\[5pt]
= {} & \Pr(\varepsilon > -\boldsymbol\beta \cdot \mathbf{X}_i) & & \text{(now, same as above model)}\\[5pt]
= {} & \Pr(\varepsilon < \boldsymbol\beta \cdot \mathbf{X}_i) & & \text{(the logistic distribution is symmetric about 0)} \\[5pt]
= {} & \operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i) \\[5pt]
= {} & p_i
\end{align}</math>
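
The claim used above, that the difference of two standard type-1 extreme-value variables follows a standard logistic distribution, can be verified directly. The following is a sketch that assumes <math>\varepsilon_0</math> and <math>\varepsilon_1</math> are independent (an assumption the model needs but does not state explicitly), writes <math>F(x) = e^{-e^{-x}}</math> for their cumulative distribution function and <math>f(x) = e^{-x} e^{-e^{-x}}</math> for their density, and uses the substitution <math>u = e^{-x}</math>:

:<math>\begin{align}
\Pr(\varepsilon_1 - \varepsilon_0 \le t) = {} & \int_{-\infty}^{\infty} F(t + x)\, f(x)\, dx \\[5pt]
= {} & \int_{-\infty}^{\infty} e^{-e^{-t} e^{-x}}\, e^{-x} e^{-e^{-x}}\, dx \\[5pt]
= {} & \int_0^{\infty} e^{-u \left(1 + e^{-t}\right)}\, du \\[5pt]
= {} & \frac{1}{1 + e^{-t}} \\[5pt]
= {} & \operatorname{logit}^{-1}(t)
\end{align}</math>

This is the cumulative distribution function of <math>\operatorname{Logistic}(0,1)</math>, which is why the final steps of the derivation above yield <math>\operatorname{logit}^{-1}(\boldsymbol\beta \cdot \mathbf{X}_i)</math>.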

====Example====
: {{Original research|example|discuss=Talk:Logistic_regression#Utility_theory_/_Elections_example_is_irrelevant|date=May 2022}} 
As an example, consider a province-level election where the choice is between a right-of-center party, a left-of-center party, and a secessionist party (e.g. the [[Parti Québécois]], which wants [[Quebec]] to secede from [[Canada]]).  We would then use three latent variables, one for each choice.  In accordance with [[utility theory]], we can interpret the latent variables as expressing the [[utility]] that results from making each of the choices.  We can also interpret the regression coefficients as indicating the strength with which the associated factor (i.e. explanatory variable) contributes to the utility, or more precisely, the amount by which a unit change in an explanatory variable changes the utility of a given choice.

A voter might expect that the right-of-center party would lower taxes, especially on rich people.  This would give low-income people no benefit, i.e. no change in utility (since they usually don't pay taxes); it would cause a moderate benefit (i.e. somewhat more money, or a moderate utility increase) for middle-income people; and it would cause significant benefits for high-income people.  On the other hand, the left-of-center party might be expected to raise taxes and offset this with increased welfare and other assistance for the lower and middle classes.  This would cause a significant benefit to low-income people, perhaps a weak benefit to middle-income people, and a significant loss to high-income people.  Finally, the secessionist party would take no direct actions on the economy, but simply secede. A low-income or middle-income voter might expect basically no clear utility gain or loss from this, but a high-income voter might expect negative utility, since they are likely to own companies, which would have a harder time doing business in such an environment and would probably lose money.

These intuitions can be expressed as follows:
{{table alignment}}
{|class="wikitable col2right col3left"
|+Estimated strength of the regression coefficients for different outcomes (party choices) and different values of the explanatory variable (income group)
|-
! !! Center-right !! Center-left !! Secessionist
|-
! High-income
| strong + || strong − || strong −
|-
! Middle-income
| moderate + || weak + || {{CNone|none}}
|-
! Low-income
| {{CNone|none|style=text-align:right;}} || strong + || {{CNone|none}}
|-
|}

This shows that:
# Separate sets of regression coefficients need to exist for each choice.  When phrased in terms of utility, this can be seen very easily. Different choices have different effects on net utility; furthermore, the effects vary in complex ways that depend on the characteristics of each individual, so there need to be separate sets of coefficients for each characteristic, not simply a single extra per-choice characteristic.
# Even though income is a continuous variable, its effect on utility is too complex for it to be treated as a single variable.  Either it needs to be directly split up into ranges, or higher powers of income need to be added so that [[polynomial regression]] on income is effectively done.
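
The table and the two points above can be made concrete with a small Python sketch. It rests on several assumptions that are not part of the example itself: income is split into three ranges, the numeric magnitudes 2.0, 1.0 and 0.5 are arbitrary stand-ins for "strong", "moderate" and "weak", and the names <code>COEFFICIENTS</code> and <code>choice_probabilities</code> are hypothetical. With one type-1 extreme-value error per latent utility, the probability of each choice takes the softmax form of the [[multinomial logit]] model, which is what the function computes.

```python
import math

# Hypothetical utility contributions, one set of coefficients per party.
# The magnitudes are made up for illustration and are not estimates.
COEFFICIENTS = {
    "center-right": {"high": 2.0, "middle": 1.0, "low": 0.0},
    "center-left": {"high": -2.0, "middle": 0.5, "low": 2.0},
    "secessionist": {"high": -2.0, "middle": 0.0, "low": 0.0},
}


def choice_probabilities(income_group):
    """Softmax of the deterministic utilities for one income group."""
    utilities = {party: coefs[income_group] for party, coefs in COEFFICIENTS.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(utilities.values())
    weights = {party: math.exp(u - top) for party, u in utilities.items()}
    total = sum(weights.values())
    return {party: w / total for party, w in weights.items()}


for group in ("high", "middle", "low"):
    probs = choice_probabilities(group)
    print(group, {party: round(p, 3) for party, p in probs.items()})
```

Because each party has its own coefficient for each income range, the same voter characteristic can raise the utility of one choice while lowering that of another, which a single shared coefficient could not express.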