=== Relationships between models ===
In application to classification, the observable ''X'' is frequently a [[continuous variable]], the target ''Y'' is generally a [[discrete variable]] consisting of a finite set of labels, and the conditional probability <math>P(Y\mid X)</math> can also be interpreted as a (non-deterministic) [[target function]] <math>f\colon X \to Y</math>, considering ''X'' as inputs and ''Y'' as outputs.

Given a finite set of labels, the two definitions of "generative model" are closely related. A model of the conditional distribution <math>P(X\mid Y = y)</math> is a model of the distribution of each label, and a model of the joint distribution is equivalent to a model of the distribution of label values <math>P(Y)</math>, together with the distribution of observations given a label, <math>P(X\mid Y)</math>; symbolically, <math>P(X, Y) = P(X\mid Y)P(Y).</math> Thus, while a model of the joint probability distribution is more informative than a model of the distributions of the individual labels (which omits the labels' relative frequencies <math>P(Y)</math>), it is a relatively small step from one to the other, so the two are not always distinguished.

Given a model of the joint distribution <math>P(X, Y)</math>, the distributions of the individual variables can be computed as the [[marginal distribution]]s <math>P(X) = \sum_y P(X, Y = y)</math> and <math>P(Y) = \int_x P(Y, X = x)\,dx</math> (treating ''X'' as continuous, hence integrating over it, and ''Y'' as discrete, hence summing over it), and either conditional distribution can be computed from the definition of [[conditional probability]]: <math>P(X\mid Y) = P(X, Y)/P(Y)</math> and <math>P(Y\mid X) = P(X, Y)/P(X)</math>.

Given a model of one conditional probability, and estimated [[probability distribution]]s for the variables ''X'' and ''Y'', denoted <math>P(X)</math> and <math>P(Y)</math>, one can estimate the opposite conditional probability using [[Bayes' rule]]:

:<math>P(X\mid Y)P(Y) = P(Y\mid X)P(X).</math>

For example, given a generative model for <math>P(X\mid Y)</math>, one can estimate

:<math>P(Y\mid X) = P(X\mid Y)P(Y)/P(X),</math>

and given a discriminative model for <math>P(Y\mid X)</math>, one can estimate

:<math>P(X\mid Y) = P(Y\mid X)P(X)/P(Y).</math>

Note that Bayes' rule (computing one conditional probability in terms of the other) and the definition of conditional probability (computing a conditional probability in terms of the joint distribution) are frequently conflated as well.
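As an illustrative calculation (the numbers here are hypothetical, chosen only to make the arithmetic transparent): suppose ''Y'' takes the values 0 and 1 with <math>P(Y=0)=0.6</math> and <math>P(Y=1)=0.4</math>, and a binary observation ''X'' satisfies <math>P(X=1\mid Y=0)=0.9</math> and <math>P(X=1\mid Y=1)=0.2</math>. Factoring the joint distribution gives <math>P(X=1, Y=0) = 0.9 \times 0.6 = 0.54</math> and <math>P(X=1, Y=1) = 0.2 \times 0.4 = 0.08</math>; marginalizing gives <math>P(X=1) = 0.54 + 0.08 = 0.62</math>; and Bayes' rule then yields <math>P(Y=0\mid X=1) = 0.54/0.62 \approx 0.87</math>.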
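The same relationships can be checked numerically. The following is a minimal sketch (not drawn from any source; the joint distribution is the hypothetical one from the example above) that stores a discrete joint distribution as a NumPy array and recovers the marginals, both conditionals, and the Bayes'-rule identity from it:

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical joint distribution P(X, Y): rows index a binary observation X,
# columns index a binary label Y; the entries sum to 1.
joint = np.array([[0.06, 0.32],   # P(X=0, Y=0), P(X=0, Y=1)
                  [0.54, 0.08]])  # P(X=1, Y=0), P(X=1, Y=1)

# Marginal distributions: sum out the other variable.
p_x = joint.sum(axis=1)  # P(X)
p_y = joint.sum(axis=0)  # P(Y)

# Conditionals from the definition of conditional probability.
p_x_given_y = joint / p_y            # P(X | Y); each column sums to 1
p_y_given_x = joint / p_x[:, None]   # P(Y | X); each row sums to 1

# Bayes' rule recovers the discriminative direction P(Y | X)
# from the generative direction P(X | Y).
bayes = p_x_given_y * p_y / p_x[:, None]
assert np.allclose(bayes, p_y_given_x)

print(p_y_given_x[1, 0])  # P(Y=0 | X=1) = 0.54 / 0.62 ≈ 0.871
</syntaxhighlight>

Dividing the joint table by a marginal is exactly the definition of conditional probability applied elementwise, which is why the Bayes'-rule identity holds without further assumptions.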