===Gaussian naive Bayes===
When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a [[Normal distribution|normal]] (or Gaussian) distribution. For example, suppose the training data contains a continuous attribute <math>x</math>. The data is first segmented by class, and then the mean and [[Variance#Estimating the variance|variance]] of <math>x</math> are computed within each class. Let <math>\mu_k</math> be the mean of the values in <math>x</math> associated with class <math>C_k</math>, and let <math>\sigma^2_k</math> be the [[Bessel's correction|Bessel-corrected variance]] of those values. Given an observed value <math>v</math>, the probability ''density'' of <math>v</math> given class <math>C_k</math>, i.e. <math>p(x=v \mid C_k)</math>, can be computed by plugging <math>v</math> into the equation for a [[normal distribution]] parameterized by <math>\mu_k</math> and <math>\sigma^2_k</math>. Formally,
<math display="block"> p(x=v \mid C_k) = \frac{1}{\sqrt{2\pi\sigma^2_k}}\, e^{-\frac{(v-\mu_k)^2}{2\sigma^2_k}} </math>

Another common technique for handling continuous values is to use binning to [[Discretization of continuous features|discretize]] the feature values and obtain a new set of Bernoulli-distributed features. Some literature suggests that this is required in order to use naive Bayes, but it is not, and the discretization may [[Discretization error|throw away discriminative information]].<ref name="idiots"/>

Sometimes the distribution of class-conditional marginal densities is far from normal. In these cases, [[kernel density estimation]] can be used for a more realistic estimate of the marginal densities of each class. This method, introduced by John and Langley,<ref name="john95"/> can boost the accuracy of the classifier considerably.<ref name="piryonesi2020">{{Cite journal |last1=Piryonesi |first1=S. Madeh |last2=El-Diraby |first2=Tamer E. |date=2020-06-01 |title=Role of Data Analytics in Infrastructure Asset Management: Overcoming Data Size and Quality Problems |journal=Journal of Transportation Engineering, Part B: Pavements |volume=146 |issue=2 |pages=04020022 |doi=10.1061/JPEODX.0000175 |s2cid=216485629}}</ref><ref name="hastie01">{{Cite book |last1=Hastie |first1=Trevor |last2=Tibshirani |first2=Robert |last3=Friedman |first3=Jerome H. |title=The Elements of Statistical Learning: Data Mining, Inference, and Prediction |date=2001 |publisher=Springer |location=New York |isbn=0-387-95284-5 |oclc=46809224}}</ref>
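As a minimal illustration of the procedure above (a sketch for a single continuous feature, not drawn from any particular library; all function and variable names are hypothetical), the following Python code estimates the per-class mean, Bessel-corrected variance, and prior, and then classifies an observation by the product of the class prior and the Gaussian density:

<syntaxhighlight lang="python">
import math
from collections import defaultdict

def fit_gaussian_nb(values, labels):
    """Estimate the per-class mean, Bessel-corrected variance, and
    prior P(C_k) of a single continuous feature from training data."""
    groups = defaultdict(list)
    for v, c in zip(values, labels):
        groups[c].append(v)
    n = len(values)
    params = {}
    for c, vals in groups.items():
        mean = sum(vals) / len(vals)
        # Bessel's correction: divide by (count - 1), as in the text above.
        # (Assumes every class has at least two training values.)
        var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
        params[c] = (mean, var, len(vals) / n)
    return params

def gaussian_density(v, mean, var):
    """p(x = v | C_k): the normal density with the given mean and variance."""
    return math.exp(-((v - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(v, params):
    """Return the class maximizing the prior times the Gaussian likelihood."""
    return max(params,
               key=lambda c: params[c][2] * gaussian_density(v, params[c][0], params[c][1]))
</syntaxhighlight>

For example, with two well-separated classes the observation is assigned to the class whose fitted Gaussian gives it the higher (prior-weighted) density:

<syntaxhighlight lang="python">
values = [4.8, 5.1, 5.4, 6.3, 6.6, 7.0]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(values, labels)
print(predict(5.0, model))  # "a"
print(predict(6.8, model))  # "b"
</syntaxhighlight>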