Association rule learning
== Useful Concepts ==

{| class="wikitable" style="float: right; margin-left: 1em;"
|+ Table 2. Example database with 5 transactions and 7 items
|-
! transaction ID !! milk !! bread !! butter !! beer !! diapers !! eggs !! fruit
|-
| 1 || 1 || 1 || 0 || 0 || 0 || 0 || 1
|-
| 2 || 0 || 0 || 1 || 0 || 0 || 1 || 1
|-
| 3 || 0 || 0 || 0 || 1 || 1 || 0 || 0
|-
| 4 || 1 || 1 || 1 || 0 || 0 || 1 || 1
|-
| 5 || 0 || 1 || 0 || 0 || 0 || 0 || 0
|}

To illustrate the concepts, we use a small example from the supermarket domain. Table 2 shows a small database containing the items where, in each entry, the value 1 means the presence of the item in the corresponding transaction, and the value 0 represents the absence of an item in that transaction. The set of items is <math>I= \{\mathrm{milk, bread, butter, beer, diapers, eggs, fruit}\}</math>.

An example rule for the supermarket could be <math>\{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\}</math>, meaning that if butter and bread are bought, customers also buy milk.

In order to select interesting rules from the set of all possible rules, constraints on various measures of significance and interest are used. The best-known constraints are minimum thresholds on support and confidence. Let <math>X, Y</math> be itemsets, <math>X \Rightarrow Y</math> an association rule and {{mvar|T}} a set of transactions of a given database.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant,{{citation needed|date=March 2021}} and datasets often contain thousands or millions of transactions.

=== Support ===

Support is an indication of how frequently the itemset appears in the dataset.
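For concreteness, the transaction database of Table 2 can be encoded as one set of items per transaction, with support computed by counting containing transactions. This is a minimal illustrative sketch (the variable names and the `support` helper are assumptions for this example, not taken from any particular library):

```python
# Table 2 encoded as one set of items per transaction (rows 1-5).
transactions = [
    {"milk", "bread", "fruit"},                    # transaction 1
    {"butter", "eggs", "fruit"},                   # transaction 2
    {"beer", "diapers"},                           # transaction 3
    {"milk", "bread", "butter", "eggs", "fruit"},  # transaction 4
    {"bread"},                                     # transaction 5
]

def support(itemset, db):
    """Fraction of transactions in `db` that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

print(support({"beer", "diapers"}, transactions))  # 0.2 (1 of 5 transactions)
```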
In our example, it can be easier to explain support by writing

:<math>\text{support} = P(A\cap B)= \frac{\text{number of transactions containing }A\text{ and }B}{\text{total number of transactions}}</math><ref name=":1">{{Cite book|last1=Han|first1=Jiawei|last2=Kamber|first2=Micheline|last3=Pei|first3=Jian|date=2012|title=Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods|url=https://www.sciencedirect.com/science/article/pii/B978012381479100006X|doi=10.1016/B978-0-12-381479-1.00006-X|isbn=9780123814791}}</ref>

where A and B are separate item sets that occur at the same time in a transaction.

Using Table 2 as an example, the itemset <math>X=\{\mathrm{beer, diapers}\}</math> has a support of {{math|1=1/5=0.2}} since it occurs in 20% of all transactions (1 out of 5 transactions). The argument of ''support of X'' is a set of preconditions, and thus becomes more restrictive as it grows (instead of more inclusive).<ref name=":0">{{Cite journal|last=Hahsler|first=Michael|date=2005|title=Introduction to arules – A computational environment for mining association rules and frequent item sets|url=https://mran.revolutionanalytics.com/web/packages/arules/vignettes/arules.pdf|journal=Journal of Statistical Software|doi=10.18637/jss.v014.i15|doi-access=free|access-date=2016-03-18|archive-date=2019-04-30|archive-url=https://web.archive.org/web/20190430193743/https://mran.revolutionanalytics.com/web/packages/arules/vignettes/arules.pdf|url-status=dead}}</ref> Furthermore, the itemset <math>Y=\{\mathrm{milk, bread, butter}\}</math> has a support of {{math|1=1/5=0.2}} as it appears in 20% of all transactions as well.

Using antecedents and consequents allows a data miner to determine the support of multiple items being bought together in comparison to the whole data set. For example, Table 2 shows that the rule "if milk is bought, then bread is bought" has a support of 0.4 or 40%.
This is because in 2 out of 5 transactions, milk as well as bread are bought. In smaller data sets like this example, it is harder to see a strong correlation when there are few samples, but when the data set grows larger, support can be used to find correlations between two or more products in the supermarket example.

Minimum support thresholds are useful for determining which itemsets are preferred or interesting. If we set the support threshold to ≥ 0.4 in Table 3, then the rule <math>\{\mathrm{milk}\} \Rightarrow \{\mathrm{eggs}\}</math> would be removed since it did not meet the minimum threshold of 0.4. A minimum threshold is used to remove samples that do not have strong enough support or confidence to be deemed important or interesting in the dataset.

Another way of finding interesting samples is to find the value of (support)×(confidence); this allows a data miner to see the samples where support and confidence are high enough to be highlighted in the dataset and prompt a closer look at the sample to find more information on the connection between the items.

Support can be beneficial for finding the connection between products in comparison to the whole dataset, whereas confidence looks at the connection between one or more items and another item. Below is a table that shows the comparison and contrast between support and support × confidence, using the information from Table 4 to derive the confidence values.

{| class="wikitable sortable"
|+ Table 3. Example of support, and support × confidence
! if Antecedent then Consequent
! support
! support × confidence
|-
| if buy milk, then buy bread
| 2/5 = 0.4
| 0.4×1.0 = 0.4
|-
| if buy milk, then buy eggs
| 1/5 = 0.2
| 0.2×0.5 = 0.1
|-
| if buy bread, then buy fruit
| 2/5 = 0.4
| 0.4×0.66 = 0.264
|-
| if buy fruit, then buy eggs
| 2/5 = 0.4
| 0.4×0.66 = 0.264
|-
| if buy milk and bread, then buy fruit
| 2/5 = 0.4
| 0.4×1.0 = 0.4
|}

The support of {{mvar|X}} with respect to {{mvar|T}} is defined as the proportion of transactions in the dataset which contain the itemset {{mvar|X}}. Denoting a transaction by <math>(i,t)</math>, where {{mvar|i}} is the unique identifier of the transaction and {{mvar|t}} is its itemset, the support may be written as:

:<math>\mathrm{support\,of\,X} = \frac{|\{(i,t) \in T : X \subseteq t \}|}{|T|}</math>

This notation can be used when defining more complicated datasets where the items and itemsets may not be as simple as in our supermarket example above. Other examples of where support can be used are in finding groups of genetic mutations that work collectively to cause a disease, investigating the number of subscribers that respond to upgrade offers, and discovering which products in a drug store are never bought together.<ref name=":1" />

=== Confidence ===

Confidence is the percentage of all transactions satisfying {{mvar|X}} that also satisfy {{mvar|Y}}.<ref>{{Cite web|last=Wong|first=Pak|date=1999|title=Visualizing Association Rules for Text Mining|url=https://neuro.bstu.by/ai/Data-mining/Stock-market/InfoVis1999Association.pdf|url-status=live|website=BSTU Laboratory of Artificial Neural Networks|archive-url=https://web.archive.org/web/20211129082512/https://neuro.bstu.by/ai/Data-mining/Stock-market/InfoVis1999Association.pdf |archive-date=2021-11-29 }}</ref>

With respect to {{mvar|T}}, the confidence value of an association rule, often denoted as <math>X \Rightarrow Y</math>, is the ratio of transactions containing both {{mvar|X}} and {{mvar|Y}} to the total amount
of {{mvar|X}} values present, where {{mvar|X}} is the antecedent and {{mvar|Y}} is the consequent.

Confidence can also be interpreted as an estimate of the [[conditional probability]] <math>P(E_Y | E_X)</math>, the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS.<ref name=":0" /><ref name="hipp">{{Cite journal | last1 = Hipp | first1 = J. | last2 = Güntzer | first2 = U. | last3 = Nakhaeizadeh | first3 = G. | title = Algorithms for association rule mining --- a general survey and comparison | doi = 10.1145/360402.360421 | journal = ACM SIGKDD Explorations Newsletter | volume = 2 | pages = 58–64 | year = 2000 | citeseerx = 10.1.1.38.5305 | s2cid = 9248096 }}</ref> It is commonly depicted as:

:<math>\mathrm{conf}(X \Rightarrow Y) = P(Y | X) = \frac{\mathrm{supp}(X \cap Y)}{ \mathrm{supp}(X) }=\frac{\text{number of transactions containing }X\text{ and }Y}{\text{number of transactions containing }X}</math>

The equation illustrates that confidence can be computed by calculating the co-occurrence of transactions {{mvar|X}} and {{mvar|Y}} within the dataset in ratio to transactions containing {{mvar|X}} alone; that is, the number of transactions containing both {{mvar|X}} and {{mvar|Y}} is divided by the number containing just {{mvar|X}}.

For example, Table 2 shows the rule <math>\{\mathrm{butter, bread}\} \Rightarrow \{\mathrm{milk}\}</math>, which has a confidence of <math>\frac{1/5}{1/5}=\frac{0.2}{0.2}=1.0</math> in the dataset, which denotes that every time a customer buys butter and bread, they also buy milk. This particular example demonstrates the rule being correct 100% of the time for transactions containing both butter and bread. The rule <math>\{\mathrm{fruit}\} \Rightarrow \{\mathrm{eggs}\}</math>, however, has a confidence of <math>\frac{2/5}{3/5}=\frac{0.4}{0.6}=0.67</math>. This suggests that eggs are bought 67% of the times that fruit is bought.
Within this particular dataset, fruit is purchased a total of 3 times, with two of those purchases also including eggs.

For larger datasets, a minimum threshold, or a percentage cutoff, for the confidence can be useful for determining item relationships. When applying this method to some of the data in Table 2, information that does not meet the requirements is removed. Table 4 shows association rule examples where the minimum threshold for confidence is 0.5 (50%). Any data that does not have a confidence of at least 0.5 is omitted. Generating thresholds allows the association between items to become stronger as the data is further researched, by emphasizing those that co-occur the most. The table uses the confidence information from Table 3 to implement the Support × Confidence column, highlighting the relationship between items via both their confidence and support, instead of just one concept. Ranking the rules by Support × Confidence multiplies the confidence of a particular rule by its support, and is often implemented for a more in-depth understanding of the relationship between the items.

{| class="wikitable sortable"
|+ Table 4. Example of Confidence and Support × Confidence
! if Antecedent then Consequent
! Confidence
! Support × Confidence
|-
| if buy milk, then buy bread
| {{frac|2|2}} = 1.0
| 0.4×1.0 = 0.4
|-
| if buy milk, then buy eggs
| {{frac|1|2}} = 0.5
| 0.2×0.5 = 0.1
|-
| if buy bread, then buy fruit
| {{frac|2|3}} ≈ 0.66
| 0.4×0.66 = 0.264
|-
| if buy fruit, then buy eggs
| {{frac|2|3}} ≈ 0.66
| 0.4×0.66 = 0.264
|-
| if buy milk and bread, then buy fruit
| {{frac|2|2}} = 1.0
| 0.4×1.0 = 0.4
|}

Overall, using confidence in association rule mining is a great way to bring awareness to data relations. Its greatest benefit is highlighting the relationship of particular items to one another within the set, as it compares co-occurrences of items to the total occurrence of the antecedent in the specific rule.
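The confidence calculation above can be sketched in code over the Table 2 transactions. A minimal example (the `support` and `confidence` helper names are illustrative assumptions, not from a specific package):

```python
# Table 2 encoded as one set of items per transaction.
transactions = [
    {"milk", "bread", "fruit"},                    # transaction 1
    {"butter", "eggs", "fruit"},                   # transaction 2
    {"beer", "diapers"},                           # transaction 3
    {"milk", "bread", "butter", "eggs", "fruit"},  # transaction 4
    {"bread"},                                     # transaction 5
]

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(antecedent, consequent, db):
    """conf(X => Y) = support of X and Y together, divided by support of X."""
    return support(antecedent | consequent, db) / support(antecedent, db)

print(confidence({"butter", "bread"}, {"milk"}, transactions))  # 1.0
print(confidence({"fruit"}, {"eggs"}, transactions))            # ≈ 0.67 (2/3)
```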
However, confidence is not the optimal method for every concept in association rule mining. The disadvantage of using it is that it does not offer multiple different outlooks on the associations. Unlike support, for instance, confidence does not provide the perspective of relationships between certain items in comparison to the entire dataset; so while milk and bread, for example, may co-occur 100% of the time for confidence, the rule only has a support of 0.4 (40%). This is why it is important to look at other viewpoints, such as Support × Confidence, instead of relying solely on one concept to define the relationships.

=== Lift ===

The ''[[lift (data mining)|lift]]'' of a rule is defined as:

:<math> \mathrm{lift}(X\Rightarrow Y) = \frac{ \mathrm{supp}(X \cup Y)}{ \mathrm{supp}(X) \times \mathrm{supp}(Y) } </math>

or the ratio of the observed support to that expected if X and Y were [[Independence (probability theory)|independent]]. For example, the rule <math>\{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\}</math> has a lift of <math>\frac{0.2}{0.4 \times 0.4} = 1.25 </math>.

If the rule had a lift of 1, it would imply that the probability of occurrence of the antecedent and that of the consequent are independent of each other. When two events are independent of each other, no rule can be drawn involving those two events.

If the lift is > 1, that lets us know the degree to which those two occurrences are dependent on one another, and makes those rules potentially useful for predicting the consequent in future data sets.

If the lift is < 1, that lets us know the items are substitutes for each other. This means that the presence of one item has a negative effect on the presence of the other item and vice versa.
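The lift of 1.25 computed above can be reproduced in the same style; a minimal sketch (the `support` and `lift` helper names are assumptions for illustration):

```python
# Table 2 encoded as one set of items per transaction.
transactions = [
    {"milk", "bread", "fruit"},                    # transaction 1
    {"butter", "eggs", "fruit"},                   # transaction 2
    {"beer", "diapers"},                           # transaction 3
    {"milk", "bread", "butter", "eggs", "fruit"},  # transaction 4
    {"bread"},                                     # transaction 5
]

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def lift(antecedent, consequent, db):
    """Observed support of the full rule over the support expected under independence."""
    return support(antecedent | consequent, db) / (
        support(antecedent, db) * support(consequent, db)
    )

print(lift({"milk", "bread"}, {"butter"}, transactions))  # ≈ 1.25
```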
The value of lift is that it considers both the support of the rule and the overall data set.<ref name=":0" />

=== Conviction ===

The ''conviction'' of a rule is defined as:

:<math> \mathrm{conv}(X\Rightarrow Y) =\frac{ 1 - \mathrm{supp}(Y) }{ 1 - \mathrm{conf}(X\Rightarrow Y)}</math><ref name="brin-dynamic-itemset1">{{cite book |doi=10.1145/253260.253325 |chapter=Dynamic itemset counting and implication rules for market basket data |title=Proceedings of the 1997 ACM SIGMOD international conference on Management of data - SIGMOD '97 |pages=255–264 |year=1997 |last1=Brin |first1=Sergey |last2=Motwani |first2=Rajeev |last3=Ullman |first3=Jeffrey D. |last4=Tsur |first4=Shalom |isbn=978-0897919111 |citeseerx=10.1.1.41.6476 |s2cid=15385590 }}</ref>

For example, the rule <math>\{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\}</math> has a conviction of <math>\frac{1 - 0.4}{1 - 0.5} = 1.2 </math>, and can be interpreted as the ratio of the expected frequency that X occurs without Y (that is to say, the frequency that the rule makes an incorrect prediction) if X and Y were independent, divided by the observed frequency of incorrect predictions. In this example, the conviction value of 1.2 shows that the rule <math>\{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\}</math> would be incorrect 20% more often (1.2 times as often) if the association between X and Y were purely random chance.

=== Alternative measures of interestingness ===

In addition to confidence, other measures of ''interestingness'' for rules have been proposed. Some popular measures are:

* All-confidence<ref name="allconfidence">{{cite journal |doi=10.1109/TKDE.2003.1161582 |title=Alternative interest measures for mining associations in databases |journal=IEEE Transactions on Knowledge and Data Engineering |volume=15 |pages=57–69 |year=2003 |last1=Omiecinski |first1=E.R.
|citeseerx=10.1.1.329.5344 |s2cid=18364249 }}</ref>
* Collective strength<ref name="collectivestrength">{{cite book |doi=10.1145/275487.275490 |chapter=A new framework for itemset generation |title=Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems - PODS '98 |pages=18–24 |year=1998 |last1=Aggarwal |first1=Charu C. |last2=Yu |first2=Philip S. |isbn=978-0897919968 |citeseerx=10.1.1.24.714 |s2cid=11934586 }}</ref>
* Leverage<ref name="leverage">Piatetsky-Shapiro, Gregory; ''Discovery, analysis, and presentation of strong rules'', Knowledge Discovery in Databases, 1991, pp. 229–248</ref>

Several more measures are presented and compared by Tan et al.<ref name="measurescomp">{{cite journal |doi=10.1016/S0306-4379(03)00072-3 |title=Selecting the right objective measure for association analysis |journal=Information Systems |volume=29 |issue=4 |pages=293–313 |year=2004 |last1=Tan |first1=Pang-Ning |last2=Kumar |first2=Vipin |last3=Srivastava |first3=Jaideep |citeseerx=10.1.1.331.4740 }}</ref> and by Hahsler.<ref name="michael.hahsler.net">Michael Hahsler (2015). A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules. https://mhahsler.github.io/arules/docs/measures</ref> Looking for techniques that can model what the user has known (and using these models as interestingness measures) is currently an active research trend under the name of "Subjective Interestingness."
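To tie the four measures above together, they can all be computed for the rule <math>\{\mathrm{milk, bread}\} \Rightarrow \{\mathrm{butter}\}</math> from Table 2. A self-contained sketch (the helper names are assumptions for illustration, not the API of a specific package such as arules):

```python
# Table 2 encoded as one set of items per transaction.
transactions = [
    {"milk", "bread", "fruit"},                    # transaction 1
    {"butter", "eggs", "fruit"},                   # transaction 2
    {"beer", "diapers"},                           # transaction 3
    {"milk", "bread", "butter", "eggs", "fruit"},  # transaction 4
    {"bread"},                                     # transaction 5
]

def support(itemset, db):
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(x, y, db):
    return support(x | y, db) / support(x, db)

def lift(x, y, db):
    return support(x | y, db) / (support(x, db) * support(y, db))

def conviction(x, y, db):
    # conv(X => Y) = (1 - supp(Y)) / (1 - conf(X => Y)); undefined when conf = 1.
    return (1 - support(y, db)) / (1 - confidence(x, y, db))

x, y = {"milk", "bread"}, {"butter"}
print(support(x | y, transactions))    # 0.2
print(confidence(x, y, transactions))  # 0.5
print(lift(x, y, transactions))        # ≈ 1.25
print(conviction(x, y, transactions))  # ≈ 1.2
```

Each printed value matches the worked examples in the section: the rule covers 20% of transactions, holds in half of the transactions containing milk and bread, occurs 1.25 times as often as independence would predict, and would fail 1.2 times as often under independence.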