
=== Support ===
Support is an indication of how frequently the itemset appears in the dataset.

In our example, support is easiest to explain by writing <math>\text{support} = P(A\cap B)= \frac{\text{number of transactions containing }A\text{ and }B}{\text{total number of transactions}}</math>,<ref name=":1">{{Cite book|last1=Han|first1=Jiawei|last2=Kamber|first2=Micheline|last3=Pei|first3=Jian|date=2012|title=Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods|url=https://www.sciencedirect.com/science/article/pii/B978012381479100006X|doi=10.1016/B978-0-12-381479-1.00006-X|isbn=9780123814791}}</ref> where ''A'' and ''B'' are separate itemsets that occur together in a transaction.

Using Table 2 as an example, the itemset <math>X=\{\mathrm{beer, diapers}\}</math> has a support of {{math|1=1/5=0.2}} since it occurs in 20% of all transactions (1 out of 5 transactions). The argument of ''support of X'' is a set of preconditions, and thus becomes more restrictive as it grows (instead of more inclusive).<ref name=":0">{{Cite journal|last=Hahsler|first=Michael|date=2005|title=Introduction to arules – A computational environment for mining association rules and frequent item sets|url=https://mran.revolutionanalytics.com/web/packages/arules/vignettes/arules.pdf|journal=Journal of Statistical Software|doi=10.18637/jss.v014.i15|doi-access=free|access-date=2016-03-18|archive-date=2019-04-30|archive-url=https://web.archive.org/web/20190430193743/https://mran.revolutionanalytics.com/web/packages/arules/vignettes/arules.pdf|url-status=dead}}</ref>

Furthermore, the itemset <math>Y=\{\mathrm{milk, bread, butter}\}</math> has a support of {{math|1=1/5=0.2}} as it appears in 20% of all transactions as well.

Expressing a rule in terms of an antecedent and a consequent allows a data miner to determine the support of multiple items being bought together relative to the whole data set. For example, Table 2 shows that the rule "if milk is bought, then bread is bought" has a support of 0.4, or 40%, because milk and bread are bought together in 2 out of 5 transactions. In small data sets like this example, it is hard to see a strong correlation because there are few samples, but as the data set grows larger, support can be used to find correlations between two or more products in the supermarket example.

Minimum support thresholds are useful for determining which itemsets are preferred or interesting.

If we set the support threshold to ≥0.4 in Table 3, then the rule <math>\{\mathrm{milk}\} \Rightarrow \{\mathrm{eggs}\}</math> would be removed, since its support of 0.2 does not meet the minimum threshold of 0.4. A minimum threshold is used to remove samples whose support or confidence is not strong enough for them to be deemed important or interesting in the dataset.
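The thresholding step described above can be sketched in a few lines of Python. This is only an illustration: the rule labels and the names <code>rule_support</code> and <code>MIN_SUPPORT</code> are hypothetical, and the support values are copied from Table 3 rather than computed from the transactions.

```python
from fractions import Fraction

# Support values copied from Table 3 (rule label -> support).
# The labels and variable names are illustrative assumptions.
rule_support = {
    "milk => bread": Fraction(2, 5),
    "milk => eggs": Fraction(1, 5),
    "bread => fruit": Fraction(2, 5),
    "fruit => eggs": Fraction(2, 5),
    "milk, bread => fruit": Fraction(2, 5),
}

MIN_SUPPORT = Fraction(2, 5)  # the 0.4 threshold used in the text

# Keep only the rules whose support meets the minimum threshold.
kept = {rule: s for rule, s in rule_support.items() if s >= MIN_SUPPORT}
removed = [rule for rule in rule_support if rule not in kept]

print("kept:", sorted(kept))
print("removed:", removed)  # ['milk => eggs']
```

Exact fractions are used here instead of floating-point numbers so that a rule sitting exactly on the threshold is never dropped by a rounding error.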

Another way of finding interesting samples is to compute the product (support)&times;(confidence). This lets a data miner pick out the samples where both support and confidence are high enough to be highlighted in the dataset, prompting a closer look at the connection between the items.

Support can be beneficial for finding the connection between products in comparison to the whole dataset, whereas confidence looks at the connection between one or more items and another item. The table below compares support with support &times; confidence, using the information from Table 4 to derive the confidence values.

{| class="wikitable sortable"
|+Table 3. Example of support and support &times; confidence
!If antecedent, then consequent
!Support
!Support &times; confidence
|-
|if buy milk, then buy bread 
|2/5= 0.4
|0.4&times;1.0= 0.4
|-
|if buy milk, then buy eggs
|1/5= 0.2
|0.2&times;0.5= 0.1
|-
|if buy bread, then buy fruit
|2/5= 0.4
|0.4&times;0.66= 0.264
|-
|if buy fruit, then buy eggs
|2/5= 0.4
|0.4&times;0.66= 0.264
|-
|if buy milk and bread, then buy fruit
|2/5= 0.4
|0.4&times;1.0= 0.4
|}
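The last column of Table 3 can be reproduced with a short Python sketch. The support and confidence figures below are copied from the table (confidence 0.66 is the rounded value used there), while the rule labels and variable names are hypothetical and chosen only for illustration.

```python
# (rule label, support, confidence) as listed in Table 3.
# Labels and variable names are illustrative assumptions.
rules = [
    ("milk => bread", 0.4, 1.0),
    ("milk => eggs", 0.2, 0.5),
    ("bread => fruit", 0.4, 0.66),
    ("fruit => eggs", 0.4, 0.66),
    ("milk, bread => fruit", 0.4, 1.0),
]

# Multiply support by confidence; round to 3 places to hide float noise.
scored = [(name, round(s * c, 3)) for name, s, c in rules]

# Rank rules from highest to lowest product (ties keep table order).
ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score}")
```

Sorting by the product puts the two rules with a score of 0.4 first and the milk-and-eggs rule, with a score of 0.1, last.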

The support of {{mvar|X}} with respect to {{mvar|T}} is defined as the proportion of transactions in the dataset that contain the itemset {{mvar|X}}. Denoting a transaction by <math>(i,t)</math>, where {{mvar|i}} is the unique identifier of the transaction and {{mvar|t}} is its itemset, the support may be written as:

:<math>\mathrm{support\,of\,X} = \frac{|\{(i,t) \in T : X \subseteq t \}|}{|T|}</math>

This notation can be used when defining more complicated datasets where the items and itemsets may not be as simple as in our supermarket example above. Other examples of where support can be used are finding groups of genetic mutations that work collectively to cause a disease, investigating the number of subscribers that respond to upgrade offers, and discovering which products in a drug store are never bought together.<ref name=":1" />
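The set-based definition of support given above translates almost directly into code. The following Python sketch is a minimal illustration, not a reference implementation: the function name <code>support</code> and the toy database <code>T</code> are assumptions made for this example, and <code>T</code> is not the data from Table 2.

```python
from fractions import Fraction

def support(itemset, transactions):
    """Proportion of (i, t) pairs in `transactions` with itemset ⊆ t."""
    itemset = frozenset(itemset)
    if not transactions:
        raise ValueError("transactions must be non-empty")
    # Count transactions whose item set t contains every item of X.
    hits = sum(1 for _tid, items in transactions if itemset <= items)
    return Fraction(hits, len(transactions))

# Hypothetical toy database T of (identifier, itemset) pairs.
T = [
    (1, frozenset({"a", "b"})),
    (2, frozenset({"a", "c"})),
    (3, frozenset({"a", "b", "c"})),
    (4, frozenset({"d"})),
]

print(support({"a"}, T))            # 3/4
print(support({"a", "b"}, T))       # 1/2
print(support({"a", "b", "c"}, T))  # 1/4
```

The three calls also show why support becomes more restrictive as the itemset grows: adding an item can only keep or reduce the number of transactions that contain the whole set.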