== Feature vectors ==
{{main|Feature vector}}

Most algorithms represent each instance whose category is to be predicted as a [[feature vector]] of individual, measurable properties of that instance.  Each property is termed a [[feature (pattern recognition)|feature]], also known in statistics as an [[explanatory variable]] (or [[independent variable]], although features may or may not be [[statistically independent]]).  Features may be [[binary data|binary]] (e.g. "on" or "off"); [[categorical data|categorical]] (e.g. "A", "B", "AB" or "O", for [[blood type]]); [[ordinal data|ordinal]] (e.g. "large", "medium" or "small"); [[integer|integer-valued]] (e.g. the number of occurrences of a particular word in an email); or [[real number|real-valued]] (e.g. a measurement of blood pressure).  If the instance is an image, the feature values might correspond to the pixels of the image; if the instance is a piece of text, the feature values might be the occurrence frequencies of different words.  Some algorithms work only with discrete data and require that real-valued or integer-valued data be ''discretized'' into groups (e.g. less than 5, between 5 and 10, or greater than 10).
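
As an illustration of the feature types described above, the following Python sketch assembles one feature vector from a single hypothetical instance and discretizes a number into the three example groups. The field names (<code>switch</code>, <code>blood_type</code>, <code>size</code>, <code>text</code>, <code>blood_pressure</code>), the helper functions, the one-hot encoding of the categorical value, and the choice to place the boundary values 5 and 10 in the middle group are all assumptions made for the example; they are not prescribed by any particular algorithm.

```python
from collections import Counter

# Assumed encodings for this sketch only.
BLOOD_TYPES = ["A", "B", "AB", "O"]             # categorical: no inherent order
SIZES = {"small": 0, "medium": 1, "large": 2}   # ordinal: the order is meaningful


def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 indicators."""
    return [1 if value == c else 0 for c in categories]


def word_counts(text, vocabulary):
    """Count how often each vocabulary word occurs in the text (integer-valued)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def discretize(x):
    """Map a number to one of three groups; 5 and 10 fall in the middle group."""
    if x < 5:
        return "less than 5"
    if x <= 10:
        return "between 5 and 10"
    return "greater than 10"


def feature_vector(instance, vocabulary):
    """Concatenate binary, categorical, ordinal, integer and real features."""
    return (
        [1 if instance["switch"] == "on" else 0]         # binary
        + one_hot(instance["blood_type"], BLOOD_TYPES)   # categorical
        + [SIZES[instance["size"]]]                      # ordinal
        + word_counts(instance["text"], vocabulary)      # integer-valued
        + [float(instance["blood_pressure"])]            # real-valued
    )


# Hypothetical instance and vocabulary, invented for illustration.
instance = {
    "switch": "on",
    "blood_type": "AB",
    "size": "medium",
    "text": "Free offer free prize",
    "blood_pressure": 118.5,
}
vocabulary = ["free", "prize", "meeting"]

vector = feature_vector(instance, vocabulary)
print(vector)           # [1, 0, 0, 1, 0, 1, 2, 1, 0, 118.5]
print(discretize(7.5))  # between 5 and 10
```

Mapping the ordinal feature to the integers 0, 1 and 2 preserves its order, whereas the categorical feature is spread over one indicator per category so that no order is implied. Other encodings are possible, and the appropriate one depends on the algorithm used.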