====Other heuristics====
"Neutral" words such as "the", "a", "some", or "is" (in English), or their equivalents in other languages, can be ignored. These are also known as [[stop words]]. More generally, some Bayesian filters simply ignore all words whose spamicity is close to 0.5, because they contribute little to a good decision. The words taken into consideration are those whose spamicity is close to 0.0 (distinctive signs of legitimate messages) or close to 1.0 (distinctive signs of spam). One method, for example, is to keep only the ten words in the examined message that have the greatest [[absolute value]]&nbsp;|0.5&nbsp;−&nbsp;''pI''|.
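
The selection rule described above can be sketched in Python as follows. This is only an illustrative sketch: the function name <code>most_interesting</code>, the alphabetical tie-breaking, and the spamicity values are assumptions made for the example and are not taken from any particular filter.

```python
def most_interesting(spamicities, n=10):
    """Return the n (word, spamicity) pairs farthest from the neutral value 0.5."""
    # Rank by |0.5 - p|, largest first. Ties are broken alphabetically,
    # which is an assumption made here only to keep the result deterministic.
    ranked = sorted(spamicities.items(), key=lambda kv: (-abs(0.5 - kv[1]), kv[0]))
    return ranked[:n]


# Hypothetical per-word spamicities for one message (illustrative values only).
message_spamicities = {
    "the": 0.50,
    "is": 0.49,
    "viagra": 0.99,
    "meeting": 0.03,
    "free": 0.85,
    "report": 0.20,
}

top = most_interesting(message_spamicities, n=3)
for word, p in top:
    print(word, p)
```

With these example values, the neutral words "the" and "is" are dropped, and words from both ends of the scale (strong spam indicators and strong legitimate-mail indicators) are kept.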

Some software products take into account the fact that a given word appears several times in the examined message,<ref>{{cite web|url=http://spamprobe.sourceforge.net/paper.html|author=Brian Burton|title=SpamProbe - Bayesian Spam Filtering Tweaks|year=2003|access-date=2009-01-19|archive-url=https://web.archive.org/web/20120301235828/http://spamprobe.sourceforge.net/paper.html|archive-date=2012-03-01|url-status=live}}</ref> while others do not.

Some software products use ''patterns'' (sequences of words) instead of isolated natural-language words.<ref>{{cite web|url=http://bnr.nuclearelephant.com/l|author=Jonathan A. Zdziarski|title=Bayesian Noise Reduction: Contextual Symmetry Logic Utilizing Pattern Consistency Analysis|year=2004}}{{dead link|date=February 2018 |bot=InternetArchiveBot |fix-attempted=yes }}</ref> For example, with a "context window" of four words, they compute the spamicity of "Viagra is good for", instead of computing the spamicities of "Viagra", "is", "good", and "for". This method is more sensitive to context and eliminates Bayesian noise more effectively, at the expense of a larger database.
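
A minimal Python sketch of how such patterns might be extracted is shown below. It assumes a sliding window over whitespace-separated words; the function name <code>word_patterns</code> and the handling of short messages are hypothetical, and actual filters may tokenize and build their patterns differently.

```python
def word_patterns(text, window=4):
    """Return every run of `window` consecutive words in `text` as one pattern."""
    words = text.split()  # naive whitespace tokenization (an assumption)
    if len(words) < window:
        # Assumption: a message shorter than the window yields one short pattern.
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + window]) for i in range(len(words) - window + 1)]


patterns = word_patterns("Viagra is good for your health", window=4)
for pattern in patterns:
    print(pattern)
```

Each pattern is then given its own spamicity in place of its individual words. Because the number of distinct four-word sequences is far larger than the number of distinct words, this illustrates why the approach needs a bigger database.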