===Minimum-redundancy-maximum-relevance (mRMR) feature selection===
Peng ''et al.''<ref>{{cite journal |last1=Peng |first1=H. C. |last2=Long |first2=F. |last3=Ding |first3=C. |title=Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy |journal=[[IEEE Transactions on Pattern Analysis and Machine Intelligence]] |volume=27 |issue=8 |pages=1226–1238 |year=2005 |doi=10.1109/TPAMI.2005.159 |pmid=16119262 |citeseerx=10.1.1.63.5765 |s2cid=206764015 }} [http://home.penglab.com/proj/mRMR/index.htm Program]</ref> proposed a feature selection method that can use mutual information, correlation, or distance/similarity scores to select features. The aim is to penalise a feature's relevancy by its redundancy in the presence of the other selected features. The relevance of a feature set {{mvar|S}} for the class {{mvar|c}} is defined by the average value of all mutual information values between the individual feature {{math|''f<sub>i</sub>''}} and the class {{mvar|c}} as follows:

:<math> D(S,c) = \frac{1}{|S|}\sum_{f_{i}\in S}I(f_{i};c). </math>

The redundancy of all features in the set {{mvar|S}} is the average value of all mutual information values between the feature {{math|''f<sub>i</sub>''}} and the feature {{math|''f<sub>j</sub>''}}:

:<math> R(S) = \frac{1}{|S|^{2}}\sum_{f_{i},f_{j}\in S}I(f_{i};f_{j}). </math>

The mRMR criterion is a combination of the two measures given above and is defined as follows:

:<math>\mathrm{mRMR}= \max_{S} \left[\frac{1}{|S|}\sum_{f_{i}\in S}I(f_{i};c) - \frac{1}{|S|^{2}}\sum_{f_{i},f_{j}\in S}I(f_{i};f_{j})\right].</math>

Suppose that there are {{mvar|n}} full-set features. Let {{math|''x<sub>i</sub>''}} be the set membership [[indicator function]] for feature {{math|''f<sub>i</sub>''}}, so that {{math|1=''x<sub>i</sub>''=1}} indicates presence and {{math|1=''x<sub>i</sub>''=0}} indicates absence of the feature {{math|''f<sub>i</sub>''}} in the globally optimal feature set.
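As a concrete illustration of the definitions above, the following sketch computes {{math|''D''(''S'',''c'')}}, {{math|''R''(''S'')}}, and the resulting mRMR score for discrete-valued features. The function names and the plug-in histogram estimator of mutual information are illustrative assumptions, not part of the original paper:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(x;y) in nats for two discrete 1-D arrays."""
    _, x_idx = np.unique(x, return_inverse=True)
    _, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1)          # joint counts
    joint /= joint.sum()                         # joint probabilities
    px = joint.sum(axis=1, keepdims=True)        # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)        # marginal p(y)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def relevance(X, S, c):
    """D(S,c): mean MI between each selected feature (columns of X) and class c."""
    return np.mean([mutual_information(X[:, i], c) for i in S])

def redundancy(X, S):
    """R(S): mean pairwise MI over all ordered pairs in S (including i == j),
    matching the 1/|S|^2 normalization in the definition."""
    return np.mean([mutual_information(X[:, i], X[:, j]) for i in S for j in S])

def mrmr_score(X, S, c):
    """mRMR objective D(S,c) - R(S) for a candidate feature subset S."""
    return relevance(X, S, c) - redundancy(X, S)
```

For example, a binary feature that exactly equals a balanced binary class has relevance {{math|ln 2}} under this estimator, while a feature independent of the class contributes zero.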
Let <math>c_i=I(f_i;c)</math> and <math>a_{ij}=I(f_i;f_j)</math>. The above may then be written as an optimization problem:

:<math>\mathrm{mRMR}= \max_{x\in \{0,1\}^{n}} \left[\frac{\sum^{n}_{i=1}c_{i}x_{i}}{\sum^{n}_{i=1}x_{i}} - \frac{\sum^{n}_{i,j=1}a_{ij}x_{i}x_{j}} {(\sum^{n}_{i=1}x_{i})^{2}}\right].</math>

The mRMR algorithm is an approximation of the theoretically optimal maximum-dependency feature selection algorithm, which maximizes the mutual information between the joint distribution of the selected features and the classification variable. Because mRMR approximates this combinatorial estimation problem with a series of much smaller problems, each involving only two variables, it uses pairwise joint probabilities, which are more robust to estimate. In certain situations the algorithm may underestimate the usefulness of features, as it has no way to measure interactions between features that can increase relevancy. This can lead to poor performance<ref name="Brown" /> when the features are individually useless but useful when combined (a pathological case arises when the class is a [[parity function]] of the features). Overall, the algorithm is more efficient (in terms of the amount of data required) than the theoretically optimal max-dependency selection, yet produces a feature set with little pairwise redundancy. mRMR is an instance of a large class of filter methods which trade off between relevancy and redundancy in different ways.<ref name="Brown"/><ref name="docs.google">Nguyen, H., Franke, K., Petrovic, S. (2010). "Towards a Generic Feature-Selection Measure for Intrusion Detection", In Proc. International Conference on Pattern Recognition (ICPR), Istanbul, Turkey. [https://www.researchgate.net/publication/220928649_Towards_a_Generic_Feature-Selection_Measure_for_Intrusion_Detection?ev=prf_pub]</ref>
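In practice the combinatorial maximization above is usually replaced by a greedy forward search: start from the single most relevant feature, then repeatedly add the unselected feature that maximizes its relevance to the class minus its average mutual information with the features already chosen. The following is a minimal sketch of such an incremental scheme for discrete features; the histogram mutual-information estimator and function names are illustrative assumptions, not the reference implementation:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(x;y) in nats for two discrete 1-D arrays."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def greedy_mrmr(X, c, k):
    """Greedy incremental mRMR: select k column indices of X.
    Each step adds the feature maximizing I(f;c) minus its mean MI
    with the already-selected features."""
    n = X.shape[1]
    rel = [mutual_information(X[:, i], c) for i in range(n)]
    selected = [int(np.argmax(rel))]          # seed with most relevant feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            red = np.mean([mutual_information(X[:, i], X[:, j])
                           for j in selected])
            if rel[i] - red > best_score:
                best, best_score = i, rel[i] - red
        selected.append(best)
    return selected
```

The redundancy penalty is what distinguishes this from plain relevance ranking: an exact duplicate of an already-selected feature scores poorly and is skipped in favor of an equally relevant but independent feature.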