Naive Bayes classifier
=== Multinomial naive Bayes ===
With a multinomial event model, samples (feature vectors) represent the frequencies with which certain events have been generated by a [[Multinomial distribution|multinomial]] <math>(p_1, \dots, p_n)</math> where <math>p_i</math> is the probability that event {{mvar|i}} occurs (or {{mvar|K}} such multinomials in the multiclass case). A feature vector <math>\mathbf{x} = (x_1, \dots, x_n)</math> is then a [[histogram]], with <math>x_i</math> counting the number of times event {{mvar|i}} was observed in a particular instance. This is the event model typically used for document classification, with events representing the occurrence of a word in a single document (see [[bag of words]] assumption).<ref>{{cite book |last1=James |first1=Gareth |last2=Witten |first2=Daniela |last3=Hastie |first3=Trevor |last4=Tibshirani |first4=Robert |title=An introduction to statistical learning: with applications in R |date=2021 |publisher=Springer |location=New York, NY |isbn=978-1-0716-1418-1 |page=157 |edition=Second |doi=10.1007/978-1-0716-1418-1 |url=https://link.springer.com/book/10.1007/978-1-0716-1418-1 |access-date=10 November 2024}}</ref> The likelihood of observing a histogram {{math|'''x'''}} is given by:
<math display="block">
p(\mathbf{x} \mid C_k) = \frac{\left(\sum_{i=1}^n x_i\right)!}{\prod_{i=1}^n x_i !} \prod_{i=1}^n {p_{ki}}^{x_i}
</math>
where <math>p_{ki} := p(i \mid C_k)</math>.

The multinomial naive Bayes classifier becomes a [[linear classifier]] when expressed in log-space:<ref name="rennie">{{cite conference |last1=Rennie |first1=J. |last2=Shih |first2=L. |last3=Teevan |first3=J. |last4=Karger |first4=D. |title=Tackling the poor assumptions of naive Bayes classifiers |conference=ICML |year=2003 |url=http://people.csail.mit.edu/~jrennie/papers/icml03-nb.pdf |archive-url=https://ghostarchive.org/archive/20221009/http://people.csail.mit.edu/~jrennie/papers/icml03-nb.pdf |archive-date=2022-10-09 |url-status=live}}</ref>
<math display="block">
\begin{align}
\log p(C_k \mid \mathbf{x}) & \varpropto \log \left( p(C_k) \prod_{i=1}^n {p_{ki}}^{x_i} \right) \\
                            & = \log p(C_k) + \sum_{i=1}^n x_i \cdot \log p_{ki} \\
                            & = b_k + \mathbf{w}_k^\top \mathbf{x}
\end{align}
</math>
where <math>b_k = \log p(C_k)</math> and <math>w_{ki} = \log p_{ki}</math>. The multinomial coefficient is omitted because it does not depend on the class. Working in log space is advantageous because multiplying a large number of small probabilities can lead to floating-point underflow and significant rounding error; summing their logarithms instead reduces this effect.

If a given class and feature value never occur together in the training data, then the frequency-based probability estimate will be zero, because the estimate is directly proportional to the number of occurrences of the feature's value. This is problematic because a single zero factor wipes out all information in the other probabilities when they are multiplied. Therefore, it is often desirable to incorporate a small-sample correction, called a [[pseudocount]], in all probability estimates so that no probability is ever set to exactly zero. This way of [[regularization (mathematics)|regularizing]] naive Bayes is called [[Laplace smoothing]] when the pseudocount is one, and [[Lidstone smoothing]] in the general case.<!-- TODO: cite Jurafsky and Martin for this -->

Rennie ''et al.'' discuss problems with the multinomial assumption in the context of document classification and possible ways to alleviate those problems, including the use of [[tf–idf]] weights instead of raw term frequencies and document length normalization, to produce a naive Bayes classifier that is competitive with [[support vector machine]]s.<ref name="rennie"/>
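
The log-space decision rule and the pseudocount correction described in this section can be illustrated with a minimal Python sketch. The function names (<code>fit_multinomial_nb</code>, <code>predict</code>), the parameter name <code>alpha</code> for the pseudocount, and the toy word-count data are assumptions made for illustration; they are not taken from any cited source or library.

```python
import math


def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log-priors b_k and log-likelihood weights w_ki from count vectors.

    X: list of histograms (lists of non-negative counts), y: list of class labels.
    alpha: pseudocount added to every (class, event) count
           (alpha = 1 is Laplace smoothing, other positive values Lidstone smoothing).
    """
    classes = sorted(set(y))
    n_features = len(X[0])
    log_prior, log_p = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        # b_k = log p(C_k), estimated as the fraction of training samples in class c
        log_prior[c] = math.log(len(rows) / len(X))
        # Total count of each event i within class c
        counts = [sum(r[i] for r in rows) for i in range(n_features)]
        # Smoothed estimate: p_ki = (count_ki + alpha) / (sum_i count_ki + alpha * n)
        total = sum(counts) + alpha * n_features
        log_p[c] = [math.log((n + alpha) / total) for n in counts]
    return classes, log_prior, log_p


def predict(x, classes, log_prior, log_p):
    """Return the class maximizing b_k + w_k . x (the linear score in log space)."""
    scores = {
        c: log_prior[c] + sum(xi * w for xi, w in zip(x, log_p[c]))
        for c in classes
    }
    return max(scores, key=scores.get)


# Toy data (hypothetical): counts of three vocabulary words in four documents.
X_train = [[2, 1, 0], [3, 0, 0], [0, 1, 2], [0, 0, 3]]
y_train = ["a", "a", "b", "b"]
model = fit_multinomial_nb(X_train, y_train, alpha=1.0)

# The third word never occurs in class "a" during training. Without the
# pseudocount its estimated probability would be zero and log(0) undefined.
label = predict([2, 0, 1], *model)
```

In this toy example the third word never appears with class <code>"a"</code> in the training data, yet the smoothed estimate keeps every <math>p_{ki}</math> strictly positive, so the document <code>[2, 0, 1]</code> can still be scored for both classes.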