===Minimum-redundancy-maximum-relevance (mRMR) feature selection===
Peng ''et al.''<ref>{{cite journal |last1=Peng |first1=H. C. |last2=Long |first2=F. |last3=Ding |first3=C. |title=Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy |journal=[[IEEE Transactions on Pattern Analysis and Machine Intelligence]] |volume=27 |issue=8 |pages=1226–1238 |year=2005 |doi=10.1109/TPAMI.2005.159 |pmid=16119262 |citeseerx=10.1.1.63.5765 |s2cid=206764015 }} [http://home.penglab.com/proj/mRMR/index.htm Program]</ref> proposed a feature selection method that can use mutual information, correlation, or distance/similarity scores to select features. The aim is to penalise a feature's relevancy by its redundancy in the presence of the other selected features. The relevance of a feature set {{mvar|S}} for the class {{mvar|c}} is defined by the average value of all mutual information values between the individual feature {{math|''f<sub>i</sub>''}} and the class {{mvar|c}} as follows:

:<math> D(S,c) = \frac{1}{|S|}\sum_{f_{i}\in S}I(f_{i};c). </math>

The redundancy of all features in the set {{mvar|S}} is the average value of all mutual information values between the feature {{math|''f<sub>i</sub>''}} and the feature {{math|''f<sub>j</sub>''}}:

:<math> R(S) = \frac{1}{|S|^{2}}\sum_{f_{i},f_{j}\in S}I(f_{i};f_{j}). </math>

The mRMR criterion is a combination of the two measures given above and is defined as follows:

:<math>\mathrm{mRMR}= \max_{S} \left[\frac{1}{|S|}\sum_{f_{i}\in S}I(f_{i};c) - \frac{1}{|S|^{2}}\sum_{f_{i},f_{j}\in S}I(f_{i};f_{j})\right].</math>

Suppose that there are {{mvar|n}} full-set features. Let {{math|''x<sub>i</sub>''}} be the set membership [[indicator function]] for feature {{math|''f<sub>i</sub>''}}, so that {{math|1=''x<sub>i</sub>''=1}} indicates presence and {{math|1=''x<sub>i</sub>''=0}} indicates absence of the feature {{math|''f<sub>i</sub>''}} in the globally optimal feature set.
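As a concrete illustration of the definitions above, the following sketch computes {{math|''D''(''S'',''c'')}}, {{math|''R''(''S'')}}, and the resulting mRMR score for discrete-valued features. The function names and the plug-in histogram estimator of mutual information are illustrative assumptions, not part of the original paper:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(x;y) in nats for two discrete 1-D arrays."""
    _, x_idx = np.unique(x, return_inverse=True)
    _, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1)          # joint counts
    joint /= joint.sum()                         # joint probabilities
    px = joint.sum(axis=1, keepdims=True)        # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)        # marginal p(y)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def relevance(X, S, c):
    """D(S,c): mean MI between each selected feature (columns of X) and class c."""
    return np.mean([mutual_information(X[:, i], c) for i in S])

def redundancy(X, S):
    """R(S): mean pairwise MI over all ordered pairs in S (including i == j),
    matching the 1/|S|^2 normalization in the definition."""
    return np.mean([mutual_information(X[:, i], X[:, j]) for i in S for j in S])

def mrmr_score(X, S, c):
    """mRMR objective D(S,c) - R(S) for a candidate feature subset S."""
    return relevance(X, S, c) - redundancy(X, S)
```

For example, a binary feature that exactly equals a balanced binary class has relevance {{math|ln 2}} under this estimator, while a feature independent of the class contributes zero.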
Let <math>c_i=I(f_i;c)</math> and <math>a_{ij}=I(f_i;f_j)</math>. The above may then be written as an optimization problem:

:<math>\mathrm{mRMR}= \max_{x\in \{0,1\}^{n}} \left[\frac{\sum^{n}_{i=1}c_{i}x_{i}}{\sum^{n}_{i=1}x_{i}} - \frac{\sum^{n}_{i,j=1}a_{ij}x_{i}x_{j}} {(\sum^{n}_{i=1}x_{i})^{2}}\right].</math>

The mRMR algorithm is an approximation of the theoretically optimal maximum-dependency feature selection algorithm, which maximizes the mutual information between the joint distribution of the selected features and the classification variable. Because mRMR approximates this combinatorial estimation problem with a series of much smaller problems, each involving only two variables, it uses pairwise joint probabilities, which are more robust to estimate. In certain situations the algorithm may underestimate the usefulness of features, as it has no way to measure interactions between features that can increase relevancy. This can lead to poor performance<ref name="Brown" /> when the features are individually useless but useful when combined (a pathological case arises when the class is a [[parity function]] of the features). Overall, the algorithm is more efficient (in terms of the amount of data required) than the theoretically optimal max-dependency selection, yet produces a feature set with little pairwise redundancy. mRMR is an instance of a large class of filter methods which trade off between relevancy and redundancy in different ways.<ref name="Brown"/><ref name="docs.google">Nguyen, H., Franke, K., Petrovic, S. (2010). "Towards a Generic Feature-Selection Measure for Intrusion Detection", In Proc. International Conference on Pattern Recognition (ICPR), Istanbul, Turkey. [https://www.researchgate.net/publication/220928649_Towards_a_Generic_Feature-Selection_Measure_for_Intrusion_Detection?ev=prf_pub]</ref>
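In practice the combinatorial maximization above is usually replaced by a greedy forward search: start from the single most relevant feature, then repeatedly add the unselected feature that maximizes its relevance to the class minus its average mutual information with the features already chosen. The following is a minimal sketch of such an incremental scheme for discrete features; the histogram mutual-information estimator and function names are illustrative assumptions, not the reference implementation:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(x;y) in nats for two discrete 1-D arrays."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def greedy_mrmr(X, c, k):
    """Greedy incremental mRMR: select k column indices of X.
    Each step adds the feature maximizing I(f;c) minus its mean MI
    with the already-selected features."""
    n = X.shape[1]
    rel = [mutual_information(X[:, i], c) for i in range(n)]
    selected = [int(np.argmax(rel))]          # seed with most relevant feature
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            red = np.mean([mutual_information(X[:, i], X[:, j])
                           for j in selected])
            if rel[i] - red > best_score:
                best, best_score = i, rel[i] - red
        selected.append(best)
    return selected
```

The redundancy penalty is what distinguishes this from plain relevance ranking: an exact duplicate of an already-selected feature scores poorly and is skipped in favor of an equally relevant but independent feature.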