==Correlation feature selection==
The correlation feature selection (CFS) measure evaluates subsets of features on the basis of the following hypothesis: "Good feature subsets contain features highly correlated with the classification, yet uncorrelated to each other".<ref>{{cite thesis |first=M. |last=Hall |date=1999 |type=PhD thesis |url=https://www.cs.waikato.ac.nz/~mhall/thesis.pdf |title=Correlation-based Feature Selection for Machine Learning |publisher=University of Waikato }}</ref><ref>{{cite book |last1=Senliol |first1=Baris |first2=Gokhan |last2=Gulgezen |first3=Lei |last3=Yu |first4=Zehra |last4=Cataltepe |title=2008 23rd International Symposium on Computer and Information Sciences |chapter=Fast Correlation Based Filter (FCBF) with a different search strategy |display-authors=1 |pages=1–4 |date=2008 |doi=10.1109/ISCIS.2008.4717949 |isbn=978-1-4244-2880-9 |s2cid=8398495 }}</ref> The following equation gives the merit of a feature subset ''S'' consisting of ''k'' features:

:<math> \mathrm{Merit}_{S_{k}} = \frac{k\overline{r_{cf}}}{\sqrt{k+k(k-1)\overline{r_{ff}}}}.</math>

Here, <math> \overline{r_{cf}} </math> is the average value of all feature-classification correlations, and <math> \overline{r_{ff}} </math> is the average value of all feature-feature correlations. The CFS criterion is defined as follows:

:<math>\mathrm{CFS} = \max_{S_k} \left[\frac{r_{c f_1}+r_{c f_2}+\cdots+r_{c f_k}} {\sqrt{k+2(r_{f_1 f_2}+\cdots+r_{f_i f_j}+ \cdots + r_{f_k f_{k-1} })}}\right].</math>

The <math>r_{cf_{i}}</math> and <math>r_{f_{i}f_{j}}</math> variables are referred to as correlations, but are not necessarily [[Pearson product-moment correlation coefficient|Pearson's correlation coefficient]] or [[Spearman's rank correlation coefficient|Spearman's ρ]].
Hall's dissertation uses neither of these; instead, it uses three different measures of relatedness: [[minimum description length]] (MDL), [[Mutual Information#Normalized variants|symmetrical uncertainty]], and [[Relief (feature selection)|relief]]. Let ''x<sub>i</sub>'' be the set membership [[indicator function]] for feature ''f<sub>i</sub>''; then the above can be rewritten as an optimization problem:

:<math>\mathrm{CFS} = \max_{x\in \{0,1\}^{n}} \left[\frac{(\sum^{n}_{i=1}a_{i}x_{i})^{2}} {\sum^{n}_{i=1}x_i + \sum_{i\neq j} 2b_{ij} x_i x_j }\right].</math>

The combinatorial problems above are, in fact, mixed 0–1 [[linear programming]] problems that can be solved by using [[branch-and-bound algorithm]]s.<ref>{{cite journal |first1=Hai |last1=Nguyen |first2=Katrin |last2=Franke |first3=Slobodan |last3=Petrovic |title=Optimizing a class of feature selection measures |journal=Proceedings of the NIPS 2009 Workshop on Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra (DISCML) |location=Vancouver, Canada |date=December 2009 |url=https://www.researchgate.net/publication/231175763 }}</ref>
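As an illustrative sketch (not Hall's implementation), the merit formula above can be evaluated directly for a given subset. The example below substitutes the absolute Pearson correlation for the generic relatedness measure; Hall's thesis instead uses symmetrical uncertainty, MDL, or relief, so the numbers here only demonstrate the structure of the score.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Merit of a feature subset S of size k:
    Merit = k * r_cf / sqrt(k + k*(k-1) * r_ff),
    where r_cf is the mean feature-class correlation and
    r_ff the mean pairwise feature-feature correlation.
    Absolute Pearson correlation stands in for Hall's measures."""
    k = len(subset)
    # average absolute feature-class correlation over the subset
    r_cf = np.mean([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in subset])
    if k == 1:
        return r_cf  # no feature-feature term for a single feature
    # average absolute correlation over all unordered feature pairs
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for i, j in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

# Toy data: feature 1 duplicates feature 0, so adding it brings
# perfect redundancy (r_ff = 1) and no gain in merit.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([y, y])
print(cfs_merit(X, y, [0]))     # single perfectly correlated feature
print(cfs_merit(X, y, [0, 1]))  # redundant pair: merit does not improve
```

The redundant-pair case shows the hypothesis in action: a duplicated feature raises the numerator but the feature-feature penalty in the denominator cancels the gain.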