==Relation to other problems==
[[Classification]] and clustering are examples of the more general problem of [[pattern recognition]], which is the assignment of an output value to a given input value.  Other examples are [[regression analysis|regression]], which assigns a real-valued output to each input; [[sequence labeling]], which assigns a class to each member of a sequence of values (for example, [[part of speech tagging]], which assigns a [[part of speech]] to each word in an input sentence); and [[parsing]], which assigns a [[parse tree]] to an input sentence, describing the [[syntactic structure]] of the sentence.

A common subclass of classification is [[probabilistic classification]].  Algorithms of this nature use [[statistical inference]] to find the best class for a given instance.  Unlike other algorithms, which simply output a "best" class, probabilistic algorithms output a [[probability]] of the instance being a member of each of the possible classes.  The best class is then normally selected as the one with the highest probability.  Such an algorithm has several advantages over non-probabilistic classifiers:
*It can output a confidence value associated with its choice (in general, a classifier that can do this is known as a ''confidence-weighted classifier'').
*Correspondingly, it can ''abstain'' when its confidence in every possible output is too low.
*Because it outputs probabilities rather than a single hard decision, it can be more effectively incorporated into larger machine-learning tasks, in a way that partially or completely avoids the problem of ''error propagation''.
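
The behaviour described above can be illustrated with a minimal sketch. It is not a reference implementation: the function names <code>softmax</code> and <code>classify</code>, the example scores, and the threshold of 0.6 are all hypothetical, and the softmax function is only one assumed way of turning raw per-class scores into probabilities. The sketch selects the class with the highest probability, reports that probability as a confidence value, and abstains when the confidence falls below the threshold.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1.

    Softmax is an assumption made for this sketch; any method that
    yields a probability for each class would serve the same purpose.
    """
    highest = max(scores.values())
    # Subtract the highest score before exponentiating, for numerical stability.
    exps = {cls: math.exp(s - highest) for cls, s in scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


def classify(probs, threshold=0.6):
    """Return (best_class, confidence), or (None, confidence) to abstain.

    The threshold value is hypothetical and would be tuned per task.
    """
    best = max(probs, key=probs.get)
    confidence = probs[best]
    if confidence < threshold:
        return None, confidence  # abstain: no class is likely enough
    return best, confidence


# Hypothetical scores for one word in a part-of-speech tagging task.
probs = softmax({"noun": 2.0, "verb": 0.5, "adjective": 0.1})
label, confidence = classify(probs)

# Nearly equal scores give a flat distribution, so the classifier abstains.
unsure_label, unsure_confidence = classify(
    softmax({"noun": 1.0, "verb": 0.9, "adjective": 0.8})
)
```

A larger system could pass the full <code>probs</code> mapping to a later stage instead of only <code>label</code>, so that an early mistake is not fixed as a hard decision; this is the sense in which error propagation may be reduced.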