====Variable aggregation====

A different class of variable granulation methods derives more from [[data clustering]] methodologies than from the linear systems theory informing the above methods. It was noted fairly early that one may consider "clustering" related variables in just the same way that one considers clustering related data. In data clustering, one identifies a group of similar entities (using a "[[measure of similarity]]" suitable to the domain; see {{Harvnb|Martino|Giuliani|Rizzi|2018}}), and then in some sense ''replaces'' those entities with a prototype of some kind. The prototype may be the simple average of the data in the identified cluster, or some other representative measure. The key idea is that in subsequent operations, we may be able to use the single prototype for the data cluster (along with perhaps a statistical model describing how exemplars are derived from the prototype) to ''stand in'' for the much larger set of exemplars. Such prototypes are generally chosen so as to capture most of the information of interest concerning the entities they represent.

[[File:Kraskov tree.png|thumb|400px|A Watanabe-Kraskov variable agglomeration tree. Variables are agglomerated (or "unitized") from the bottom up, with each merge-node representing a (constructed) variable having entropy equal to the joint entropy of the agglomerating variables. Thus, the agglomeration of two {{mvar|m}}-ary variables <math>X_1, X_2</math> having individual entropies <math>H(X_1), H(X_2)</math> yields a single {{math|''m''{{sup|2}}}}-ary variable <math>X_{1,2}</math> with entropy <math>H(X_{1,2})=H(X_1,X_2).</math> When <math>X_1, X_2</math> are highly dependent (i.e., redundant) and have large mutual information <math>I(X_1;X_2),</math> then <math>H(X_{1,2}) \ll H(X_1)+H(X_2)</math> because <math>H(X_1,X_2)=H(X_1)+H(X_2)-I(X_1;X_2),</math> and this would be considered a parsimonious unitization or aggregation.]]

Similarly, it is reasonable to ask whether a large set of variables might be aggregated into a smaller set of ''prototype'' variables that capture the most salient relationships between the variables. Although variable clustering methods based on [[linear correlation]] have been proposed ({{Harvnb|Duda|Hart|Stork|2001}}; {{Harvnb|Rencher|2002}}), more powerful methods of variable clustering are based on the [[mutual information]] between variables. Watanabe has shown ({{Harvnb|Watanabe|1960}}; {{Harvnb|Watanabe|1969}}) that for any set of variables one can construct a ''[[polytomy|polytomic]]'' (i.e., {{mvar|n}}-ary) tree representing a series of variable agglomerations in which the ultimate "total" correlation among the complete variable set is the sum of the "partial" correlations exhibited by each agglomerating subset (see figure). Watanabe suggests that an observer might seek to partition a system in such a way as to minimize the interdependence between the parts "... as if they were looking for a natural division or a hidden crack." One practical approach to building such a tree is to successively choose for agglomeration the two variables (either atomic variables or previously agglomerated variables) that have the highest pairwise mutual information {{Harv|Kraskov|Stögbauer|Andrzejak|Grassberger|2003}}. The product of each agglomeration is a new (constructed) variable that reflects the local [[joint distribution]] of the two agglomerating variables, and thus possesses an entropy equal to their [[joint entropy]].
(From a procedural standpoint, this agglomeration step involves replacing the two columns of the attribute-value table that represent the agglomerating variables with a single column having a unique value for every unique combination of values in the replaced columns {{Harv|Kraskov|Stögbauer|Andrzejak|Grassberger|2003}}. No information is lost by such an operation; however, if one is exploring the data for inter-variable relationships, it would generally ''not'' be desirable to merge redundant variables in this way, since in such a context it is likely to be precisely the redundancy or ''dependency'' between variables that is of interest; once redundant variables are merged, their relationship to one another can no longer be studied.)
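The greedy procedure described above lends itself to a compact illustration. The following is a minimal sketch in Python, not drawn from the cited papers: it assumes discrete variables stored as equal-length lists of values, estimates entropies from plug-in value frequencies, and performs the column merge exactly as in the parenthetical above, by zipping two columns into unique value combinations. All function and variable names are illustrative.

<syntaxhighlight lang="python">
from collections import Counter
from itertools import combinations
from math import log2

def entropy(column):
    """Shannon entropy (in bits) of a discrete column of values."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete columns."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def agglomerate(table):
    """Greedily merge the two columns (atomic or previously merged) with
    the highest pairwise mutual information until one column remains.
    Each merge replaces two columns with a single column whose values are
    the unique value combinations of the originals, so the new column's
    entropy equals the joint entropy of the columns it replaces."""
    columns = dict(table)              # variable name -> list of values
    history = []
    while len(columns) > 1:
        a, b = max(combinations(columns, 2),
                   key=lambda pair: mutual_information(columns[pair[0]],
                                                       columns[pair[1]]))
        merged = list(zip(columns[a], columns[b]))  # one combined value per row
        history.append((a, b, entropy(merged)))     # entropy(merged) == H(a, b)
        del columns[a], columns[b]
        columns[f"({a},{b})"] = merged
    return history

# X2 is a near-copy of X1 (high mutual information); X3 is unrelated.
table = {
    "X1": [0, 0, 1, 1, 0, 1, 0, 1],
    "X2": [0, 0, 1, 1, 0, 1, 1, 1],
    "X3": [0, 1, 0, 1, 1, 0, 1, 0],
}
for a, b, h in agglomerate(table):
    print(f"merged {a} and {b}: joint entropy {h:.3f} bits")
</syntaxhighlight>

On this example table the first merge unitizes X1 and X2, because they are nearly redundant; the printed joint entropy of the merged variable (about 1.41 bits) is well below <math>H(X_1)+H(X_2)</math> (about 1.95 bits), the situation the figure caption describes as a parsimonious unitization.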