Apriori. Let’s look at a concrete example, based on the AllElectronics transacti
ID: 3906062 • Letter: A
Question
Apriori. Let’s look at a concrete example, based on the AllElectronics transaction database, D, of Table 6.1. There are nine transactions in this database, that is, |D|=9. We use Figure 6.2 to illustrate the Apriori algorithm for nding frequent itemsets in D. 1. In the rst iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The algorithm simply scans all of the transactions to count the number of occurrences of each item.
2. Suppose that the minimum support count required is 2, that is, min sup=2. (Here, we are referring to absolute support because we are using a support count.The corresponding relative support is 2/9=22%.)The set of frequent1-item sets,L1,can then be determined. It consists of the candidate 1-itemsets satisfying minimum support. In our example, all of the candidates in C1 satisfy minimum support.
3. To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 1 L1 to generate a candidate set of 2-itemsets, C2.7 C2 consists of|L1| 22-itemsets. Note that no candidates are removed from C2 during the prune step because each subset of the candidates is also frequent. Table 6.1 Transactional Data for an AllElectronics Branch TID List of item IDs
T100 I1, I2, I5
T200 I2, I4
T300 I2, I3
T400 I1, I2, I4
T500 I1, I3
T600 I2, I3
T700 I1, I3
T800 I1, I2, I3, I5
T900 11, 12, 13
Data Mining Page 250
Explanation / Answer
Apriori algorithm is used to find frequent item sets. This is done by generating candidate item sets by testing a few specifications. This can be explained with the help of the given example
TID
List of items_IDs
T100
I1,I2,I5
T200
I2,I4
T300
I2,I4
T400
I1,I2,I4
T500
I1,I3
T600
I2,I3
T700
I1,I3
T800
I1,I2,I3,I5
T900
I1,I2,I3
This is the list of transactions with the column name TID (i.e Transaction ID) and the corresponding list of item sets (i.e I1,I2…etc) for each transaction.
The given minimum support count is 2 (i.e min_sup=2) this means that the total number of times a particular item set repeats itself must be atleast 2. The table below gives the frequency of each item set
Item set
Frequency
{I1}
6
{I2}
7
{I3}
6
{I4}
2
{I5}
2
[A single row in the above table means that itemset {I1} is seen 6 times in the given table]
The above formed table is known is candidate item set 1 (i.e C1).
Now let us form the candidate item set C2 by taking the combinations of item sets in C1 (having 2 item sets as one single set) and lets also find the support count of each such set (i.e frequency that each set appears)
Item set
Support count
{I1,I2}
4
{I1,I3}
4
{I1,I4}
1
{I1,I5}
2
{I2,I3}
4
{I2,I4}
2
{I2,I5}
2
{I3,I4}
0
{I3,I5}
1
{I4,I5}
0
[ A single row in the above table means that item set I1 along with I2 is seen 4 times in the given table]
This forms the candidate item set C2.
But in the given question the minimum support count (min_sup) = 2.
Therefore in the above table we delete the item sets which have support count less than 2.
Hence the table is modified as follows
Item Set
Support count
{I1,I2}
4
{I1,I3}
4
{I1,I5}
2
{I2,I3}
4
{I2,I4}
2
{I2,I5}
2
Now let us generate the candidate item set C3 (by taking 3 item sets from C2 as a single item set) .
This is formed as follows
Item set
Support count
{I1,I2,I3}
2
{I1,I2,I5}
2
Now candidate item set C4 is formed by combining 4 item sets from C3 as a single item set.
But in the given table there is only one transaction with 4 item sets and this does not meet the minimum support count required. So C4 cannot be formed and is considered as empty and the Apriori algorithm ends here giving {I1,I2,I3} and {I1,I2,I5} as the required frequent item sets.
TID
List of items_IDs
T100
I1,I2,I5
T200
I2,I4
T300
I2,I4
T400
I1,I2,I4
T500
I1,I3
T600
I2,I3
T700
I1,I3
T800
I1,I2,I3,I5
T900
I1,I2,I3
Related Questions
drjack9650@gmail.com
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.