
===Document classification===
Here is a worked example of naive Bayesian classification applied to the [[document classification]] problem.
Consider the problem of classifying documents by their content, for example into [[spamming|spam]] and non-spam [[e-mail]]s. Imagine that documents are drawn from a number of classes of documents, each of which can be modeled as a set of words, where the (independent) probability that the ''i''-th word of a given document occurs in a document from class ''C'' can be written as
<math display="block">p(w_i \mid C)\,</math>

(For this treatment, things are further simplified by assuming that words are randomly distributed in the document: that is, the probability of a word does not depend on the length of the document, on its position relative to other words, or on any other document context.)

Then the probability that a given document ''D'' contains all of the words <math>w_i</math>, given a class ''C'', is
<math display="block">p(D\mid C) = \prod_i p(w_i \mid C)\,</math>
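As a minimal sketch of this product, the following Python snippet multiplies per-word probabilities for a single class. The dictionary <code>word_probs</code>, its numeric values, and the function name <code>document_likelihood</code> are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical values of p(w | C) for one class; the numbers are invented.
word_probs = {"cheap": 0.05, "offer": 0.03, "meeting": 0.001}

def document_likelihood(words, word_probs):
    """Return p(D | C): the product of p(w_i | C) over the words of the document."""
    return math.prod(word_probs[w] for w in words)

# p(D | C) for a two-word document: 0.05 * 0.03
likelihood = document_likelihood(["cheap", "offer"], word_probs)
```

The sketch assumes every word of the document has an entry in the table; a word without one raises a <code>KeyError</code>.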

The question that has to be answered is: "what is the probability that a given document ''D'' belongs to a given class ''C''?" In other words, what is <math>p(C \mid D)\,</math>?

Now [[Conditional probability|by definition]]
<math display="block">p(D\mid C)={p(D\cap C)\over p(C)}</math>
and
<math display="block">p(C \mid D) = {p(D\cap C)\over p(D)}</math>

Solving the first equation for <math>p(D\cap C)</math> and substituting the result into the second gives Bayes' theorem, which states the probability in terms of the [[likelihood]]:
<math display="block">p(C\mid D) = \frac{p(C)\,p(D\mid C)}{p(D)}</math>

Assume for the moment that there are only two mutually exclusive classes, ''S'' and ¬''S'' (e.g. spam and not spam), such that every element (email) is in exactly one of them:
<math display="block">p(D\mid S)=\prod_i p(w_i \mid S)\,</math>
and
<math display="block">p(D\mid\neg S)=\prod_i p(w_i\mid\neg S)\,</math>

Using the Bayesian result above, one can write:
<math display="block">p(S\mid D)={p(S)\over p(D)}\,\prod_i p(w_i \mid S)</math>
<math display="block">p(\neg S\mid D)={p(\neg S)\over p(D)}\,\prod_i p(w_i \mid\neg S)</math>
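The two expressions can be evaluated numerically once p(''D'') is known. The sketch below assumes p(''D'') is obtained from the law of total probability over the two classes, a step the text above does not spell out; the function name <code>posteriors</code> and the input numbers are hypothetical.

```python
def posteriors(prior_spam, likelihood_spam, likelihood_ham):
    """Return (p(S | D), p(not S | D)) for two mutually exclusive, exhaustive classes."""
    prior_ham = 1.0 - prior_spam
    joint_spam = prior_spam * likelihood_spam  # p(S) * p(D | S)
    joint_ham = prior_ham * likelihood_ham     # p(not S) * p(D | not S)
    # Assumption: p(D) is the sum of the two joint terms (law of total probability).
    evidence = joint_spam + joint_ham
    return joint_spam / evidence, joint_ham / evidence

# Invented inputs: p(S) = 0.4, p(D | S) = 0.0015, p(D | not S) = 0.0001
p_spam, p_ham = posteriors(0.4, 0.0015, 0.0001)
```

Because both results share the same denominator, they sum to 1.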

Dividing one by the other gives:
<math display="block">{p(S\mid D)\over p(\neg S\mid D)}={p(S)\,\prod_i p(w_i \mid S)\over p(\neg S)\,\prod_i p(w_i \mid\neg S)}</math>

This can be refactored as:
<math display="block">{p(S\mid D)\over p(\neg S\mid D)}={p(S)\over p(\neg S)}\,\prod_i {p(w_i \mid S)\over p(w_i \mid\neg S)}</math>
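The factored form can be computed without ever evaluating p(''D''). The following sketch multiplies the prior odds by one likelihood ratio per word; the probability tables and the name <code>posterior_odds</code> are hypothetical, and every word is assumed to have a non-zero probability in both classes.

```python
import math

def posterior_odds(words, prior_spam, probs_spam, probs_ham):
    """Return p(S | D) / p(not S | D) as prior odds times per-word likelihood ratios."""
    prior_odds = prior_spam / (1.0 - prior_spam)
    ratios = (probs_spam[w] / probs_ham[w] for w in words)
    return prior_odds * math.prod(ratios)

# Invented per-class word probabilities p(w | S) and p(w | not S).
probs_spam = {"cheap": 0.05, "offer": 0.03, "meeting": 0.001}
probs_ham = {"cheap": 0.005, "offer": 0.01, "meeting": 0.02}

# Equal priors give prior odds of 1; the two word ratios are about 10 and 3.
odds = posterior_odds(["cheap", "offer"], 0.5, probs_spam, probs_ham)
```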

Thus, the probability ratio p(''S'' | ''D'') / p(¬''S'' | ''D'') can be expressed in terms of a series of [[likelihood function|likelihood ratios]].
The actual probability p(''S'' | ''D'') can be recovered from this ratio (or from its logarithm) because p(''S'' | ''D'') + p(¬''S'' | ''D'') = 1: if the ratio equals ''r'', then p(''S'' | ''D'') = ''r'' / (1 + ''r'').

Taking the [[logarithm]] of all these ratios, one obtains:
<math display="block">\ln{p(S\mid D)\over p(\neg S\mid D)}=\ln{p(S)\over p(\neg S)}+\sum_i \ln{p(w_i\mid S)\over p(w_i\mid\neg S)}</math>
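One practical reason to work with the sum rather than the product, not stated in the text above, is that multiplying many small probabilities can underflow in floating-point arithmetic. The sketch below illustrates this with an artificially long document; the tables and the name <code>log_posterior_odds</code> are hypothetical.

```python
import math

def log_posterior_odds(words, prior_spam, probs_spam, probs_ham):
    """Return ln(p(S | D) / p(not S | D)) as a sum of log-likelihood ratios."""
    log_odds = math.log(prior_spam) - math.log(1.0 - prior_spam)
    for w in words:
        log_odds += math.log(probs_spam[w]) - math.log(probs_ham[w])
    return log_odds

# Invented probabilities and an artificially long document of 200 identical words.
probs_spam = {"cheap": 1e-3}
probs_ham = {"cheap": 1e-4}
long_document = ["cheap"] * 200

# The direct product (1e-3)**200 is far below the smallest positive float.
naive_product = math.prod(probs_spam[w] for w in long_document)

# The sum of logs stays in a comfortable numeric range: 200 * ln(10).
log_odds = log_posterior_odds(long_document, 0.5, probs_spam, probs_ham)
```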

(The use of "[[log-likelihood ratio]]s" is a common technique in statistics.
In the case of two mutually exclusive alternatives (such as this example), the conversion of a log-likelihood ratio to a probability takes the form of a [[sigmoid curve]]: see [[logit]] for details.)
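The conversion from log odds back to a probability can be sketched with the logistic function. The function name <code>probability_from_log_odds</code> is hypothetical; the two branches are algebraically equivalent and are used here only to keep <code>math.exp</code> from overflowing for inputs of large magnitude.

```python
import math

def probability_from_log_odds(log_odds):
    """Map ln(p / (1 - p)) back to p with the logistic (sigmoid) function."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    # For negative inputs use the equivalent form e^x / (1 + e^x),
    # so that exp() is never called with a large positive argument.
    z = math.exp(log_odds)
    return z / (1.0 + z)

# Log odds of 0 correspond to even odds, i.e. a probability of one half.
p_even = probability_from_log_odds(0.0)
```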

Finally, the document can be classified as follows. It is spam if <math>p(S\mid D) > p(\neg S\mid D)</math> (i.e., <math>\ln{p(S\mid D) \over p(\neg S\mid D)} > 0</math>); otherwise it is not spam.
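The decision rule can be sketched as a short function that compares the log odds with zero. The name <code>classify</code>, the probability tables, and the labels are hypothetical, and the sketch assumes every word has a non-zero probability in both tables.

```python
import math

def classify(words, prior_spam, probs_spam, probs_ham):
    """Return "spam" if ln(p(S | D) / p(not S | D)) > 0, otherwise "not spam"."""
    log_odds = math.log(prior_spam / (1.0 - prior_spam))
    log_odds += sum(math.log(probs_spam[w] / probs_ham[w]) for w in words)
    return "spam" if log_odds > 0 else "not spam"

# Invented per-class word probabilities p(w | S) and p(w | not S).
probs_spam = {"cheap": 0.05, "offer": 0.03, "meeting": 0.001}
probs_ham = {"cheap": 0.005, "offer": 0.01, "meeting": 0.02}

label_a = classify(["cheap", "offer"], 0.5, probs_spam, probs_ham)
label_b = classify(["meeting"], 0.5, probs_spam, probs_ham)
```

With these invented numbers the first document has positive log odds and the second has negative log odds, so they receive different labels.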