Frequent Pattern And Market Basket Implementation

Frequent Pattern / Market Basket Analysis

Frequent pattern mining looks for itemsets and sequences that appear frequently in a dataset. For example, shoes, trousers, and belts may often be bought together in the same transaction. Each supermarket sets its own threshold: one supermarket may decide that its minimum threshold is 80%, while another sets it at 90%.

Question

Our supermarket has a list of items with transaction IDs. If we sell trousers with shirts, does the rule trousers -> shirt meet the minimum thresholds below? The transaction list is given in the table below.

Minimum Support = 40%

Minimum Confidence = 65%

Market Basket Analysis

Solution (Trousers -> shirt)

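First, we draw the transaction table as in question 1 and convert it into a binary table: one row per transaction, one column per item, with 1 if the item was bought in that transaction and 0 otherwise.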
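The question's transaction table is not reproduced in this text, so the Python sketch below uses four made-up transactions, chosen only so that trousers appear in 3 of them and trousers with shirts together in 2 (the counts used in the calculations that follow). It shows one way to build the binary table; the function name `to_binary_table` is hypothetical.

```python
# Hypothetical transactions (NOT the question's table), chosen so that
# trousers appear 3 times and trousers + shirt appear together twice.
transactions = {
    "T1": {"trousers", "shirt"},
    "T2": {"trousers", "shirt", "belt"},
    "T3": {"trousers", "shoes"},
    "T4": {"shirt", "shoes"},
}


def to_binary_table(transactions):
    """Return (items, rows): one 0/1 row per transaction, one column per item."""
    items = sorted(set().union(*transactions.values()))
    rows = {
        tid: [1 if item in basket else 0 for item in items]
        for tid, basket in transactions.items()
    }
    return items, rows


items, rows = to_binary_table(transactions)
print(items)  # ['belt', 'shirt', 'shoes', 'trousers']
for tid, row in rows.items():
    print(tid, row)
```

Summing a column of the binary table gives that item's occurrence count, which is all the support and confidence formulas need.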


Now we calculate Support

Support = Number of transactions containing both trousers and shirts / Total number of transactions

= 2/4

= 0.5

Support (trousers -> shirt) = 50%
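As a quick check of the arithmetic, here is a minimal support function. The four transactions are hypothetical, picked only to reproduce the counts above (2 of 4 transactions contain both items); they are not the question's table.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


# Hypothetical transactions: 4 in total, 2 contain both trousers and shirt.
transactions = [
    {"trousers", "shirt"},
    {"trousers", "shirt", "belt"},
    {"trousers", "shoes"},
    {"shirt", "shoes"},
]

s = support(transactions, {"trousers", "shirt"})
print(f"support = {s:.0%}")  # support = 50%
```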

Now we calculate Confidence

A = trousers

B = shirts

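Confidence = P(B | A) = support(A ∪ B) / support(A) = Number of transactions containing both trousers and shirts / Number of transactions containing trousers

(Here A ∪ B denotes the itemset that contains both A and B.)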

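= 2/3

≈ 0.667

Confidence (trousers -> shirt) ≈ 66.7%

Since the support (50%) is at least the minimum support (40%) and the confidence (66.7%) is at least the minimum confidence (65%), the rule trousers -> shirt satisfies both thresholds.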
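The confidence calculation and the threshold check can be sketched the same way. Again the transactions are hypothetical, chosen to give 3 transactions with trousers and 2 with both items; `confidence`, `MIN_SUPPORT`, and `MIN_CONFIDENCE` are illustrative names, with the threshold values taken from the question.

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent: count(A ∪ B) / count(A).

    Assumes the antecedent occurs at least once (otherwise division by zero).
    """
    a = set(antecedent)
    both = a | set(consequent)
    count_a = sum(1 for basket in transactions if a <= basket)
    count_both = sum(1 for basket in transactions if both <= basket)
    return count_both / count_a


# Hypothetical transactions: 3 contain trousers, 2 contain trousers and shirt.
transactions = [
    {"trousers", "shirt"},
    {"trousers", "shirt", "belt"},
    {"trousers", "shoes"},
    {"shirt", "shoes"},
]

MIN_SUPPORT = 0.40     # from the question
MIN_CONFIDENCE = 0.65  # from the question

sup = sum(1 for b in transactions if {"trousers", "shirt"} <= b) / len(transactions)
conf = confidence(transactions, {"trousers"}, {"shirt"})

print(f"support = {sup:.0%}, confidence = {conf:.1%}")  # support = 50%, confidence = 66.7%
print("rule accepted:", sup >= MIN_SUPPORT and conf >= MIN_CONFIDENCE)  # rule accepted: True
```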

Apriori Algorithm

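The Apriori algorithm mines frequent itemsets level by level. Frequent itemsets of size k are extended into candidate itemsets of size k + 1 (candidate generation), and each candidate is then tested against the data by counting its support. Apriori relies on the property that every subset of a frequent itemset must itself be frequent, so any candidate with an infrequent subset can be discarded without counting it.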
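The level-wise loop can be sketched in plain Python. The question's transaction table is not reproduced in this text; the usage example assumes TID 10: {A, C, D}, 20: {B, C, E}, 30: {A, B, C, E}, 40: {B, E}, which agrees with the counts quoted below (A = 2, B = 3, D = 1) and with the final result, but treat it as an assumption rather than the original data. Minimum support is an absolute count, as in the question.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori sketch.

    min_support is an absolute count; returns {frozenset(itemset): count}.
    """
    baskets = [frozenset(t) for t in transactions]

    # Level 1: count every single item and keep the frequent ones.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets,
        # pruning any candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Test the candidates against the data by counting their support.
        counts = {c: sum(1 for basket in baskets if c <= basket) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# ASSUMED transaction table (see the note above).
transactions = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}
result = apriori(transactions.values(), min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this assumed table the loop also reaches a third level, {B, C, E} with support 2, before running out of candidates.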

Question

The table consists of transaction IDs and items. Find the itemsets whose support count is at least 2 (minimum support = 2).


SOLUTION

First, we find the support count of each item.

1st Level Candidate

Construct a table that lists each unique item in the first column. For each item, count how many of the transactions (TID 10 to 40) contain it and write that number in the support column. For example, A appears in 2 of the 4 transactions, so we write 2 in its support column; B appears in 3, so we write 3. This table is our first-level candidate set.
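Counting the first-level candidates is a single pass over the transactions. The question's table is not reproduced in this text, so this sketch assumes TID 10: {A, C, D}, 20: {B, C, E}, 30: {A, B, C, E}, 40: {B, E}, which matches the counts quoted here (A = 2, B = 3, D = 1); treat those rows as an assumption.

```python
from collections import Counter

# ASSUMED transaction table (not reproduced in the original text).
transactions = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}

# First-level candidates: support count of every single item.
c1 = Counter(item for basket in transactions.values() for item in basket)
for item in sorted(c1):
    print(item, c1[item])
```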


We remove D from the item set because its support count is 1, which is below the minimum support of 2.

After removing D, the remaining items are {A}, {B}, {C}, and {E}.
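The pruning step is a simple filter on the support counts. The counts for C and E below come from an assumed transaction table (TID 10: {A, C, D}, 20: {B, C, E}, 30: {A, B, C, E}, 40: {B, E}), since the question's table is not reproduced here; A = 2, B = 3, and D = 1 are the values quoted in the text.

```python
MIN_SUPPORT = 2

# First-level candidate counts (C and E from the ASSUMED table).
c1 = {"A": 2, "B": 3, "C": 3, "D": 1, "E": 3}

# Keep only items that meet the minimum support; D (count 1) is dropped.
l1 = {item: count for item, count in c1.items() if count >= MIN_SUPPORT}
print(sorted(l1))  # ['A', 'B', 'C', 'E']
```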


Second Level Candidate

Next, we form every possible pair from the remaining items: combine {A} with {B}, {C}, and {E}; then {B} with {C} and {E}; then {C} with {E}. This gives six candidate 2-itemsets: {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, and {C, E}.

Go back to the transaction table given in the question, count how many transactions contain both A and B, and write that count in the support column. Repeat this for every candidate pair.

2nd Level Candidate

As before, we remove the sets whose support count is below the minimum of 2 (here, those with support 1).
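The second level (pair generation, counting, and pruning) can be sketched with `itertools.combinations`. As the question's table is not reproduced in this text, the rows below are an assumption (TID 10: {A, C, D}, 20: {B, C, E}, 30: {A, B, C, E}, 40: {B, E}) that agrees with the counts quoted earlier and with the result listed below.

```python
from itertools import combinations

# ASSUMED transaction table (not reproduced in the original text).
transactions = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}
MIN_SUPPORT = 2
l1 = ["A", "B", "C", "E"]  # frequent single items after removing D

# Second-level candidates: every pair of frequent items with its support count.
c2 = {
    pair: sum(1 for basket in transactions.values() if set(pair) <= basket)
    for pair in combinations(l1, 2)
}

# Remove the pairs whose support is below the minimum.
l2 = {pair: count for pair, count in c2.items() if count >= MIN_SUPPORT}

print(c2)
print(sorted(l2))  # [('A', 'C'), ('B', 'C'), ('B', 'E'), ('C', 'E')]
```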

Remaining Item Sets


Result

The itemsets that meet the minimum support of 2 are:

1st Level Candidate (after pruning): {A}, {B}, {C}, {E}

2nd Level Candidate (after pruning): {A, C}, {B, C}, {B, E}, {C, E}

In other words, these itemsets are frequent.