=== Value granulation (discretization/quantization) ===
One type of granulation is the [[Quantization (signal processing)|quantization]] of variables. It is very common that in data mining or machine-learning applications the resolution of variables needs to be ''decreased'' in order to extract meaningful regularities. An example of this would be a variable such as "outside temperature" ({{math|temp}}), which in a given application might be recorded to several decimal places of [[Arithmetic precision|precision]] (depending on the sensing apparatus). However, for purposes of extracting relationships between "outside temperature" and, say, "number of health-club applications" ({{mvar|club}}), it will generally be advantageous to quantize "outside temperature" into a smaller number of intervals.

==== Motivations ====
There are several interrelated reasons for granulating variables in this fashion:
* Based on prior [[domain knowledge]], there is no expectation that minute variations in temperature (e.g., the difference between {{convert|80|-|80.7|°F|C|1}}) could have an influence on behaviors driving the number of health-club applications. For this reason, any "regularity" which our learning algorithms might detect at this level of resolution would have to be ''spurious'', an artifact of overfitting. By coarsening the temperature variable into intervals whose differences we ''do'' anticipate (based on prior domain knowledge) might influence the number of health-club applications, we eliminate the possibility of detecting these spurious patterns. Thus, in this case, reducing resolution is a method of controlling [[overfitting]].
* By reducing the number of intervals in the temperature variable (i.e., increasing its ''grain size''), we increase the amount of sample data indexed by each interval designation. Thus, by coarsening the variable, we increase sample sizes and achieve better statistical estimation. In this sense, increasing granularity provides an antidote to the so-called ''[[curse of dimensionality]]'', which relates to the exponential decrease in statistical power with increase in the number of dimensions or variable cardinality.
* Independent of prior domain knowledge, it is often the case that meaningful regularities (i.e., regularities that can be detected by a given learning methodology, representational language, etc.) may exist at one level of resolution and not at another.

[[File:Value granulation.png|thumb|200px|Benefits of value granulation: Implications here exist at the resolution of <math>\{X_i,Y_j\}</math> that do not exist at the higher resolution of <math>\{x_i,y_j\};</math> in particular, <math>\forall x_i,y_j: x_i \not\to y_j,</math> while at the same time, <math>\forall X_i \exists Y_j: X_i \leftrightarrow Y_j.</math>]]
For example, a simple learner or pattern recognition system may seek to extract regularities satisfying a [[conditional probability]] threshold such as <math>p(Y=y_j|X=x_i) \ge \alpha .</math> In the special case where <math>\alpha = 1,</math> this recognition system is essentially detecting ''[[logical implication]]'' of the form <math>X=x_i \rightarrow Y=y_j,</math> or, in words, "if <math>X=x_i,</math> then {{nowrap|<math>Y=y_j</math>".}} The system's ability to recognize such implications (or, in general, conditional probabilities exceeding threshold) is partially contingent on the resolution with which the system analyzes the variables.
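Such a threshold detector is straightforward to mechanize. The following minimal Python sketch is an illustration only: the function name <code>implications</code>, the coarsening map, and the toy data are invented here rather than drawn from the cited literature. It estimates each conditional probability from joint and marginal counts, returns the rules that clear the threshold, and, run on the same sample at two resolutions, exhibits the effect developed in the example below.

<syntaxhighlight lang="python">
from collections import Counter

def implications(pairs, alpha=1.0):
    """Return rules X=x -> Y=y whose estimated conditional
    probability p(Y=y | X=x) meets the threshold alpha."""
    joint = Counter(pairs)                   # counts of (x, y) pairs
    marginal = Counter(x for x, _ in pairs)  # counts of x alone
    return sorted((x, y) for (x, y), n in joint.items()
                  if n / marginal[x] >= alpha)

# Illustrative toy sample mirroring the adjacent figure: each x_i
# co-occurs with two distinct y_j, so no implication holds at the
# fine (quaternary) resolution.
fine = [("x1", "y1"), ("x1", "y2"), ("x2", "y1"), ("x2", "y2"),
        ("x3", "y3"), ("x3", "y4"), ("x4", "y3"), ("x4", "y4")]

# Coarsen both variables: {x1, x2} -> X1, {x3, x4} -> X2, likewise for Y.
to_coarse = {"x1": "X1", "x2": "X1", "x3": "X2", "x4": "X2",
             "y1": "Y1", "y2": "Y1", "y3": "Y2", "y4": "Y2"}
coarse = [(to_coarse[x], to_coarse[y]) for x, y in fine]

print(implications(fine))    # [] -- nothing detectable at high resolution
print(implications(coarse))  # [('X1', 'Y1'), ('X2', 'Y2')]
</syntaxhighlight>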
As an example of this last point, consider the feature space shown in the figure. The variables may each be regarded at two different resolutions. Variable <math>X</math> may be regarded at a high (quaternary) resolution wherein it takes on the four values <math>\{x_1, x_2, x_3, x_4\}</math> or at a lower (binary) resolution wherein it takes on the two values <math>\{X_1, X_2\}.</math> Similarly, variable <math>Y</math> may be regarded at a high (quaternary) resolution or at a lower (binary) resolution, where it takes on the values <math>\{y_1, y_2, y_3, y_4\}</math> or <math>\{Y_1, Y_2\},</math> respectively. At the high resolution, there are '''no''' detectable implications of the form <math>X=x_i \rightarrow Y=y_j,</math> since every <math>x_i</math> is associated with more than one <math>y_j,</math> and thus, for all <math>x_i,</math> <math>p(Y=y_j|X=x_i) < 1.</math> However, at the low (binary) variable resolution, two bilateral implications become detectable: <math>X=X_1 \leftrightarrow Y=Y_1</math> and <math>X=X_2 \leftrightarrow Y=Y_2,</math> since every <math>X_1</math> occurs ''iff'' <math>Y_1</math> and every <math>X_2</math> occurs ''iff'' <math>Y_2.</math> Thus, a pattern recognition system scanning for implications of this kind would find them at the binary variable resolution, but would fail to find them at the higher quaternary variable resolution.

==== Issues and methods ====
It is not feasible to exhaustively test all possible discretization resolutions on all variables in order to see which combination of resolutions yields interesting or significant results. Instead, the feature space must be preprocessed (often by an [[information entropy|entropy]] analysis of some kind) so that some guidance can be given as to how the discretization process should proceed. Moreover, one cannot generally achieve good results by naively analyzing and discretizing each variable independently, since this may obliterate the very interactions that we had hoped to discover.

A sample of papers that address the problem of variable discretization in general, and multiple-variable discretization in particular, is as follows: {{Harvtxt|Chiu|Wong|Cheung|1991}}, {{Harvtxt|Bay|2001}}, {{Harvtxt|Liu|Hussain|Tan|Dash|2002}}, {{Harvtxt|Wang|Liu|1998}}, {{Harvtxt|Zighed|Rabaséda|Rakotomalala|1998}}, {{Harvtxt|Catlett|1991}}, {{Harvtxt|Dougherty|Kohavi|Sahami|1995}}, {{Harvtxt|Monti|Cooper|1999}}, {{Harvtxt|Fayyad|Irani|1993}}, {{Harvtxt|Chiu|Cheung|Wong|1990}}, {{Harvtxt|Nguyen|Nguyen|1998}}, {{Harvtxt|Grzymala-Busse|Stefanowski|2001}}, {{Harvtxt|Ting|1994}}, {{Harvtxt|Ludl|Widmer|2000}}, {{Harvtxt|Pfahringer|1995}}, {{Harvtxt|An|Cercone|1999}}, {{Harvtxt|Chiu|Cheung|1989}}, {{Harvtxt|Chmielewski|Grzymala-Busse|1996}}, {{Harvtxt|Lee|Shin|1994}}, {{Harvtxt|Liu|Wellman|2002}}, {{Harvtxt|Liu|Wellman|2004}}.
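To make the entropy-guided preprocessing concrete, the following Python sketch scores candidate cut points on a single numeric variable by the weighted class entropy of the two resulting intervals, in the spirit of supervised methods such as {{Harvtxt|Fayyad|Irani|1993}}, though greatly simplified. The temperatures, the "yes"/"no" labels, and the function names are invented for illustration, and, as noted above, such single-variable scoring can miss multi-variable interactions.

<syntaxhighlight lang="python">
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Pick the cut point on one numeric variable that minimizes the
    size-weighted class entropy of the two resulting intervals."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if xs[i] == xs[i - 1]:
            continue  # no boundary between equal values
        cut = (xs[i] + xs[i - 1]) / 2
        w = i / len(pairs)
        score = w * entropy(ys[:i]) + (1 - w) * entropy(ys[i:])
        best = min(best, (score, cut))
    return best  # (weighted entropy, cut point)

# Hypothetical data: outside temperatures and whether a health-club
# application arrived that day.
temps  = [31.2, 40.5, 55.1, 61.0, 72.4, 80.0, 80.7, 91.3]
labels = ["no", "no", "no", "yes", "yes", "yes", "yes", "no"]
print(best_cut(temps, labels))
</syntaxhighlight>

Applied recursively to each resulting interval, with a stopping criterion, a scorer of this kind yields a full discretization of the variable; the cited multi-variable methods differ chiefly in scoring intervals jointly rather than one variable at a time.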