Categorical variable
==Notation==
For ease in statistical processing, categorical variables may be assigned numeric indices, e.g. 1 through ''K'' for a ''K''-way categorical variable (i.e. a variable that can take exactly ''K'' possible values). In general, however, the numbers are arbitrary and have no significance beyond providing a convenient label for a particular value. In other words, the values of a categorical variable exist on a [[nominal scale]]: they each represent a logically separate concept, cannot necessarily be meaningfully [[Level of measurement#Ordinal scale|ordered]], and cannot otherwise be manipulated as numbers could be. Instead, valid operations are [[Equivalence relation|equivalence]], [[set membership]], and other set-related operations.

As a result, the [[central tendency]] of a set of categorical values is given by its [[Mode (statistics)|mode]]; neither the [[Mean (statistics)|mean]] nor the [[Median (statistics)|median]] can be defined. As an example, given a set of people, we can consider the categorical values corresponding to their last names. We can perform operations such as equivalence (whether two people have the same last name), set membership (whether a person has a name in a given list), counting (how many people have a given last name), or finding the mode (which name occurs most often). However, we cannot meaningfully compute the "sum" of Smith + Johnson, or ask whether Smith is "less than" or "greater than" Johnson. Consequently, we cannot meaningfully ask what the "average name" (the mean) or the "middle-most name" (the median) is in a set of names.

This ignores the concept of [[alphabetical order]], which is a property that is not inherent in the names themselves, but in the way we construct the labels. For example, if we write the names in [[Cyrillic]] and consider the Cyrillic ordering of letters, we might get a different result when evaluating "Smith < Johnson" than if we write the names in the standard [[Latin alphabet]]; and if we write the names in [[Chinese characters]], we cannot meaningfully evaluate "Smith < Johnson" at all, because no single consistent ordering is defined for such characters. However, if we do consider the names as written, e.g., in the Latin alphabet, and define an ordering corresponding to standard alphabetical order, then we have effectively converted them into [[ordinal variable]]s defined on an [[ordinal scale]].
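
The distinction between valid and invalid operations can be illustrated with a short Python sketch. The sample names, the <code>watch_list</code> set, the <code>encode</code> helper, and the two numeric codings are all hypothetical and chosen only for illustration. The sketch performs the operations named above (equivalence, set membership, counting, and finding the mode), then shows that relabeling the categories with different arbitrary numbers changes the arithmetic "mean" of the codes while the mode still identifies the same name.

```python
from collections import Counter

# Hypothetical sample of last names (illustrative data only).
last_names = ["Smith", "Johnson", "Smith", "Lee", "Johnson", "Smith"]

# Equivalence: do two people share a last name?
same_name = last_names[0] == last_names[2]

# Set membership: does a person's name appear in a given list?
watch_list = {"Lee", "Garcia"}  # hypothetical list of names
is_member = last_names[3] in watch_list

# Counting and the mode: how often each name occurs, and which occurs most.
counts = Counter(last_names)
mode_name, mode_count = counts.most_common(1)[0]


def encode(names, codebook):
    """Replace each name with its numeric label from an arbitrary codebook."""
    return [codebook[name] for name in names]


# Two equally valid, arbitrary numeric labelings of the same three categories.
coding_a = {"Smith": 1, "Johnson": 2, "Lee": 3}
coding_b = {"Smith": 3, "Johnson": 1, "Lee": 2}

codes_a = encode(last_names, coding_a)
codes_b = encode(last_names, coding_b)

# The "mean" of the codes depends entirely on the labeling, so it says
# nothing about the names themselves.
mean_a = sum(codes_a) / len(codes_a)
mean_b = sum(codes_b) / len(codes_b)

# The mode, decoded back to a name, is the same under either labeling.
decode_a = {code: name for name, code in coding_a.items()}
decode_b = {code: name for name, code in coding_b.items()}
mode_from_a = decode_a[Counter(codes_a).most_common(1)[0][0]]
mode_from_b = decode_b[Counter(codes_b).most_common(1)[0][0]]

print(same_name, is_member, mode_name, mode_count)
print(mean_a, mean_b, mode_from_a, mode_from_b)
```

Under these assumptions the two codings yield different "means" but the same modal name, which is the sense in which only the mode is well defined for nominal data.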