====Disadvantages====
Depending on the implementation, Bayesian spam filtering may be susceptible to [[Bayesian poisoning]], a technique used by spammers in an attempt to degrade the effectiveness of spam filters that rely on Bayesian filtering. A spammer practicing Bayesian poisoning sends out emails containing large amounts of legitimate text (gathered from legitimate news or literary sources). [[e-mail spam|Spammer]] tactics include inserting random innocuous words that are not normally associated with spam, which lowers the email's spam score and makes it more likely to slip past a Bayesian spam filter. However, with (for example) [[Paul_Graham_(programmer)|Paul Graham]]'s scheme, only the most significant probabilities are used, so padding the text with non-spam-related words does not significantly affect the detection probability.
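
The effect of using only the most significant probabilities can be illustrated with a minimal Python sketch. It assumes a table of per-token spam probabilities already exists; the names <code>spam_score</code>, <code>token_probs</code> and <code>top_n</code> are hypothetical, and the cutoff of 15 tokens and the default of 0.4 for unknown words are illustrative values in the spirit of Graham's essay, not a definitive implementation of any particular filter.

```python
from math import prod

UNKNOWN_PROB = 0.4  # assumed probability for words never seen in training


def spam_score(tokens, token_probs, top_n=15):
    """Combine the top_n per-token spam probabilities farthest from 0.5.

    Passing top_n=None combines every token, for comparison.
    """
    probs = [token_probs.get(t, UNKNOWN_PROB) for t in set(tokens)]
    # "Most significant" here means largest distance from the neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:top_n]
    spam = prod(top)
    ham = prod(1.0 - p for p in top)
    return spam / (spam + ham)


# Hypothetical training result: 15 strongly spam-associated words and
# 200 mildly innocuous padding words (all values are made up).
spam_words = [f"spamword{i}" for i in range(15)]
padding = [f"filler{i}" for i in range(200)]
token_probs = {w: 0.99 for w in spam_words}
token_probs.update({w: 0.3 for w in padding})

plain = spam_score(spam_words, token_probs)
padded = spam_score(spam_words + padding, token_probs)
padded_all = spam_score(spam_words + padding, token_probs, top_n=None)
```

Under these assumptions <code>padded</code> equals <code>plain</code>, because the padding words never rank among the 15 most significant tokens, whereas <code>padded_all</code>, which combines every token, is pushed far toward the non-spam side. Padding words with probabilities very close to 0 would themselves rank as significant, so the outcome depends on the assumed values.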

Spammers may also transform words that normally appear in large quantities in spam. For example, «Viagra» would be replaced with «Viaagra» or «V!agra» in the spam message. The recipient can still read the altered words, but the Bayesian filter encounters each variant less often, which hinders its learning process. As a general rule, this technique does not work very well, because the filter ends up recognizing the derived words just like the normal ones.<ref>Paul Graham (2002), [http://www.paulgraham.com/spam.html A Plan for Spam] {{Webarchive|url=https://web.archive.org/web/20040404013856/http://www.paulgraham.com/spam.html |date=2004-04-04 }}</ref>
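
How a filter comes to recognize a derived word can be sketched as follows. The estimate is loosely modeled on the per-token formula in Graham's essay; the helper names, the tiny corpora, the minimum count of 5 and the clamping to the range 0.01 to 0.99 are assumptions made for illustration only.

```python
from collections import Counter


def count_tokens(messages):
    """Count whitespace-separated tokens across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(message.split())
    return counts


def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham,
                           min_count=5):
    """Estimate P(spam | token), or None if the token is still too rare."""
    b = spam_counts[token]
    g = 2 * ham_counts[token]  # ham counts doubled to bias against false positives
    if g + b < min_count:
        return None  # not enough evidence yet; the variant is ignored
    spam_freq = min(1.0, b / n_spam)
    ham_freq = min(1.0, g / n_ham)
    p = spam_freq / (spam_freq + ham_freq)
    return max(0.01, min(0.99, p))


# Made-up training corpora: the obfuscated variant shows up in six spams.
spam = ["buy V!agra now"] * 6 + ["cheap pills"] * 4
ham = ["meeting at noon"] * 10
spam_counts, ham_counts = count_tokens(spam), count_tokens(ham)

p_variant = token_spam_probability("V!agra", spam_counts, ham_counts, len(spam), len(ham))
p_unseen = token_spam_probability("Viaagra", spam_counts, ham_counts, len(spam), len(ham))
p_ham = token_spam_probability("meeting", spam_counts, ham_counts, len(spam), len(ham))
```

In this sketch a variant that has not yet been seen often enough yields no estimate, which corresponds to the hindered learning described above, while a variant that recurs in spam quickly receives a high spam probability of its own.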

Another technique used to try to defeat Bayesian spam filters is to replace text with pictures, either embedded in the message or linked. All or part of the message text is replaced with a picture in which the same text is "drawn". The spam filter usually cannot analyze this picture, which would contain sensitive words like «Viagra». However, since many mail clients disable the display of linked pictures for security reasons, a spammer who links to remote pictures might reach fewer targets. Also, a picture is larger in bytes than the equivalent text, so the spammer needs more bandwidth to send messages that embed pictures directly. Some filters are more inclined to classify a message as spam if its content is mostly graphical. A solution used by [[Google]] in its [[Gmail]] email system is to perform [[optical character recognition]] (OCR) on every mid- to large-size image and analyze the text inside.<ref>{{cite web|url=http://www.google.com/mail/help/intl/en_GB/fightspam/spamexplained.html|title=Gmail uses Google's innovative technology to keep spam out of your inbox|access-date=2015-09-05|archive-url=https://web.archive.org/web/20150913070222/http://www.google.com/mail/help/intl/en_GB/fightspam/spamexplained.html|archive-date=2015-09-13|url-status=live}}</ref><ref>{{cite journal|last1=Zhu|first1=Z.|last2=Jia|first2=Z|last3=Xiao|first3=H|last4=Zhang|first4=G|last5=Liang|first5=H.|last6=Wang|first6=P.|editor1-last=Li|editor1-first=S|editor2-last=Jin|editor2-first=Q|editor3-last=Jiang|editor3-first=X|editor4-last=Park|editor4-first=J|editor1-link=Frontier and Future Development of Information Technology in Medicine and Education. 
Lecture Notes in Electrical Engineering|title=A Modified Minimum Risk Bayes and {{as written|I|t's [sic]}} Application in Spam|journal=Lecture Notes in Electrical Engineering|date=2014|volume=269|pages=2155–2159|doi=10.1007/978-94-007-7618-0_261|publisher=Springer|location=Dordrecht|language=en}}</ref>
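
The idea of treating mostly graphical messages as more suspicious can be sketched with Python's standard <code>email</code> package. The function <code>image_fraction</code> is a hypothetical helper that only measures embedded (not linked) images; how a real filter would weigh such a signal is not specified here.

```python
from email.message import EmailMessage


def image_fraction(msg):
    """Return the share of decoded payload bytes found in image/* parts."""
    image_bytes = 0
    text_bytes = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        maintype = part.get_content_maintype()
        if maintype == "image":
            image_bytes += len(payload)
        elif maintype == "text":
            text_bytes += len(payload)
    total = image_bytes + text_bytes
    return image_bytes / total if total else 0.0


# A made-up message: two characters of text plus a 1000-byte embedded image.
image_heavy = EmailMessage()
image_heavy.set_content("hi")
image_heavy.add_attachment(b"\x00" * 1000, maintype="image", subtype="png")

text_only = EmailMessage()
text_only.set_content("Lunch at noon?")

heavy_ratio = image_fraction(image_heavy)
text_ratio = image_fraction(text_only)
```

A filter could feed such a ratio into its scoring as one more feature. Any threshold applied to it would be a tuning choice and is left out of the sketch.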