== Definition ==
An alternative division defines these symmetrically as:
* a ''generative'' model is a model of the conditional probability of the observable ''X'', given a target ''y'', symbolically, <math>P(X\mid Y = y)</math><ref name="mitchell2015generative">{{harvnb|Mitchell|2015}}: "We can use Bayes rule as the basis for designing learning algorithms (function approximators), as follows: Given that we wish to learn some target function <math>f\colon X \to Y</math>, or equivalently, <math>P(Y\mid X)</math>, we use the training data to learn estimates of <math>P(X\mid Y)</math> and <math>P(Y)</math>. New ''X'' examples can then be classified using these estimated probability distributions, plus Bayes rule. This type of classifier is called a ''generative'' classifier, because we can view the distribution <math>P(X\mid Y)</math> as describing how to generate random instances ''X'' conditioned on the target attribute ''Y''."</ref>
* a ''discriminative'' model is a model of the conditional probability of the target ''Y'', given an observation ''x'', symbolically, <math>P(Y\mid X = x)</math><ref name="mitchell2015discriminative">{{harvnb|Mitchell|2015}}: "Logistic Regression is a function approximation algorithm that uses training data to directly estimate <math>P(Y\mid X)</math>, in contrast to Naive Bayes. In this sense, Logistic Regression is often referred to as a ''discriminative'' classifier because we can view the distribution <math>P(Y\mid X)</math> as directly discriminating the value of the target value ''Y'' for any given instance ''X''."</ref>

Regardless of the precise definition, the terminology is apt because a generative model can be used to "generate" random instances ([[outcome (probability)|outcomes]]), either of an observation and target <math>(x, y)</math> or of an observation ''x'' given a target value ''y'',<ref name="mitchell2015generative"/> while a discriminative model or discriminative classifier (without a model) can be used to "discriminate" the value of the target variable ''Y'', given an observation ''x''.<ref name="mitchell2015discriminative"/> The difference between "[[wikt:discriminate|discriminate]]" (distinguish) and "[[wikt:classify|classify]]" is subtle, and the two terms are not consistently distinguished. (The term "discriminative classifier" becomes a [[pleonasm]] when "discrimination" is equivalent to "classification".)

The term "generative model" is also used to describe models that generate instances of output variables in a way that has no clear relationship to probability distributions over potential samples of input variables. [[Generative adversarial networks]] are examples of this class of generative models, and are judged primarily by the similarity of particular outputs to potential inputs. Such models are not classifiers.

=== Relationships between models ===
In application to classification, the observable ''X'' is frequently a [[continuous variable]], the target ''Y'' is generally a [[discrete variable]] consisting of a finite set of labels, and the conditional probability <math>P(Y\mid X)</math> can also be interpreted as a (non-deterministic) [[target function]] <math>f\colon X \to Y</math>, considering ''X'' as inputs and ''Y'' as outputs.
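The following minimal sketch (not drawn from the cited sources) illustrates the two approaches on the same synthetic one-dimensional data, assuming Gaussian class-conditional densities <math>P(X\mid Y)</math> for the generative classifier and a logistic model fitted by plain gradient ascent for the discriminative one; the data, learning rate, and iteration count are illustrative choices.

<syntaxhighlight lang="python">
import numpy as np

# Toy data: one-dimensional observations x with binary labels y.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1.0, 1.0, 100), rng.normal(1.5, 1.0, 100)])
y = np.array([0] * 100 + [1] * 100)

# Generative approach: estimate P(Y) and P(X | Y) (here, one Gaussian
# per label), then classify via Bayes' rule.
prior = np.array([np.mean(y == 0), np.mean(y == 1)])          # P(Y)
mu = np.array([x[y == 0].mean(), x[y == 1].mean()])
sigma = np.array([x[y == 0].std(), x[y == 1].std()])

def gaussian_pdf(v, m, s):
    return np.exp(-0.5 * ((v - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def generative_posterior(v):
    # P(Y = k | X = v) is proportional to P(X = v | Y = k) P(Y = k).
    joint = gaussian_pdf(v, mu, sigma) * prior
    return joint / joint.sum()

# Discriminative approach: model P(Y | X) directly with logistic
# regression, fitted by gradient ascent on the log-likelihood.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # current P(Y = 1 | X = x)
    w += 0.01 * np.mean((y - p) * x)
    b += 0.01 * np.mean(y - p)

print(generative_posterior(0.5))              # [P(Y=0|x=0.5), P(Y=1|x=0.5)]
print(1.0 / (1.0 + np.exp(-(w * 0.5 + b))))   # discriminative P(Y=1|x=0.5)
</syntaxhighlight>

Both classifiers output an estimate of <math>P(Y\mid X = 0.5)</math>; the generative one arrives at it indirectly via Bayes' rule, while the discriminative one estimates it directly.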
Given a finite set of labels, the two definitions of "generative model" are closely related. A model of the conditional distribution <math>P(X\mid Y = y)</math> is a model of the distribution of ''X'' for each label, and a model of the joint distribution is equivalent to a model of the distribution of label values <math>P(Y)</math> together with the distribution of observations given a label, <math>P(X\mid Y)</math>; symbolically, <math>P(X, Y) = P(X\mid Y)P(Y).</math> Thus, while a model of the joint probability distribution is more informative than a model of the per-label distributions alone (which omits the labels' relative frequencies), it is a relatively small step, hence these are not always distinguished.

Given a model of the joint distribution, <math>P(X, Y)</math>, the distribution of the individual variables can be computed as the [[marginal distribution]]s <math>P(X) = \sum_y P(X, Y = y)</math> and <math>P(Y) = \int_x P(Y, X = x)\,dx</math> (considering ''X'' as continuous, hence integrating over it, and ''Y'' as discrete, hence summing over it), and either conditional distribution can be computed from the definition of [[conditional probability]]: <math>P(X\mid Y) = P(X, Y)/P(Y)</math> and <math>P(Y\mid X) = P(X, Y)/P(X)</math>.

Given a model of one conditional probability, and estimated [[probability distribution]]s for the variables ''X'' and ''Y'', denoted <math>P(X)</math> and <math>P(Y)</math>, one can estimate the opposite conditional probability using [[Bayes' rule]]:
:<math>P(X\mid Y)P(Y) = P(Y\mid X)P(X).</math>
For example, given a generative model for <math>P(X\mid Y)</math>, one can estimate:
:<math>P(Y\mid X) = P(X\mid Y)P(Y)/P(X),</math>
and given a discriminative model for <math>P(Y\mid X)</math>, one can estimate:
:<math>P(X\mid Y) = P(Y\mid X)P(X)/P(Y).</math>
Note that Bayes' rule (computing one conditional probability in terms of the other) and the definition of conditional probability (computing a conditional probability in terms of the joint distribution) are frequently conflated as well.
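As a concrete illustration of these identities (using made-up probabilities, and treating ''X'' as discrete alongside ''Y'' so that both marginals are sums rather than integrals), the marginals, conditionals, and Bayes' rule can all be computed directly from a joint probability table:

<syntaxhighlight lang="python">
import numpy as np

# Joint distribution P(X, Y) over a discrete X (rows) and Y (columns);
# the entries are illustrative and sum to 1.
P_XY = np.array([[0.10, 0.30],
                 [0.25, 0.35]])

# Marginals: sum out the other variable.
P_X = P_XY.sum(axis=1)  # P(X)
P_Y = P_XY.sum(axis=0)  # P(Y)

# Conditionals from the definition of conditional probability.
P_X_given_Y = P_XY / P_Y            # column j holds P(X | Y = j)
P_Y_given_X = P_XY / P_X[:, None]   # row i holds    P(Y | X = i)

# Bayes' rule: recover the discriminative quantity P(Y | X) from the
# generative quantities P(X | Y) and P(Y).
recovered = P_X_given_Y * P_Y / P_X[:, None]
assert np.allclose(recovered, P_Y_given_X)
</syntaxhighlight>

The final assertion checks exactly the identity <math>P(Y\mid X) = P(X\mid Y)P(Y)/P(X)</math> stated above.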