Frequent Pattern And Market Basket Implementation

Khawar Islam

Frequent Pattern / Market Basket Analysis

Frequent pattern mining looks for item sets and sequences that appear frequently in a dataset. For example, shoes, trousers, and belts may often appear together in the same transactions. Each supermarket sets its own minimum threshold: one might decide that its minimum threshold is 80%, while another decides on 90%.

Question

We have a list of items with transaction IDs from our supermarket. Does the rule trousers -> shirt (customers who buy trousers also buy shirts) meet the minimum support and minimum confidence thresholds below? The transaction list is given in the table below.

Minimum Support = 40%

Minimum Confidence = 65%

Solution (Trousers -> shirt)

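First, we draw the transaction table from the question and convert it into a binary table: one row per transaction ID and one column per item, with 1 if the transaction contains the item and 0 otherwise.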
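A minimal Python sketch of the binary table step. The transactions below are hypothetical, because the article's table is not reproduced in this text. They were picked only so that the counts match the ones used in this solution: 4 transactions, trousers in 3 of them, and trousers and shirt together in 2.

```python
# Hypothetical transactions (assumed; not the article's table).
# Chosen to match the counts used below: 4 transactions, trousers in 3, trousers + shirt in 2.
transactions = {
    "T1": {"trousers", "shirt"},
    "T2": {"trousers", "shirt", "belt"},
    "T3": {"trousers", "shoes"},
    "T4": {"shirt", "belt"},
}

# Columns of the binary table: every distinct item, sorted alphabetically.
items = sorted(set().union(*transactions.values()))

# One row per transaction: 1 if the transaction contains the item, else 0.
binary_table = {
    tid: [1 if item in basket else 0 for item in items]
    for tid, basket in transactions.items()
}

print("TID", *items)
for tid, row in binary_table.items():
    print(tid, *row)
```

Summing a column of the binary table gives the number of transactions that contain that item.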

Now we calculate Support

Support = Number of transactions containing both trousers and shirts / Total number of transactions

= 2/4

= 0.5

Support (trousers -> shirt) = 50%
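The support formula can be written as a small Python function. This is a sketch over a hypothetical transaction list (assumed, chosen to match the counts above: 4 transactions, trousers and shirt together in 2); `support` is an illustrative helper name, not a library function.

```python
# Hypothetical transactions (assumed; not the article's table).
transactions = [
    {"trousers", "shirt"},
    {"trousers", "shirt", "belt"},
    {"trousers", "shoes"},
    {"shirt", "belt"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


s = support({"trousers", "shirt"}, transactions)
print(f"Support(trousers -> shirt) = {s:.0%}")  # 2 of 4 transactions
```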

Now we calculate Confidence

A = trousers

B = shirts

Confidence (A -> B) = support(A ∪ B) / support(A) = Number of transactions containing both trousers and shirts / Number of transactions containing trousers

= 2/3

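≈ 0.667

Confidence (trousers -> shirt) ≈ 66.7%, which is above the minimum confidence of 65%. Since the support (50%) also meets the minimum support of 40%, the rule trousers -> shirt satisfies both thresholds.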
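Putting support and confidence together, here is a sketch that checks the rule trousers -> shirt against the two thresholds from the question. The transactions and the helper names `count` and `rule_metrics` are hypothetical, chosen only to match the counts above (4 transactions, trousers in 3, trousers and shirt together in 2).

```python
MIN_SUPPORT = 0.40     # minimum support from the question
MIN_CONFIDENCE = 0.65  # minimum confidence from the question

# Hypothetical transactions (assumed): 4 in total, trousers in 3, trousers + shirt in 2.
transactions = [
    {"trousers", "shirt"},
    {"trousers", "shirt", "belt"},
    {"trousers", "shoes"},
    {"shirt", "belt"},
]


def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for basket in transactions if set(itemset) <= basket)


def rule_metrics(antecedent, consequent, transactions):
    """Support and confidence of the rule antecedent -> consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    support = both / len(transactions)
    # Assumes the antecedent occurs in at least one transaction.
    confidence = both / count(antecedent, transactions)
    return support, confidence


sup, conf = rule_metrics({"trousers"}, {"shirt"}, transactions)
print(f"support = {sup:.0%}, confidence = {conf:.1%}")
print("rule meets both thresholds:", sup >= MIN_SUPPORT and conf >= MIN_CONFIDENCE)
```

Confidence is directional: on the same data, shirt -> trousers would divide by the number of transactions containing shirts instead.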

Apriori Algorithm

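The Apriori algorithm mines frequent item sets level by level: frequent item sets are extended one item at a time to form candidate item sets (candidate generation), and the candidates are then tested against the data by counting their support. It relies on the Apriori property: every subset of a frequent item set must itself be frequent, so any candidate that contains an infrequent item set can be discarded.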
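A compact Python sketch of the level-wise idea, assuming transactions are given as collections of item labels and the minimum support is an absolute count. It illustrates Apriori rather than offering a tuned implementation: frequent item sets of one level are joined into candidates for the next, pruned with the Apriori property, and then counted. The function name `apriori` and the sample transactions are assumptions for this example; the data is hypothetical, chosen to be consistent with the counts and results quoted in the solution below.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support_count} for every frequent item set.

    Sketch only: frequent k-item sets are joined into (k+1)-item candidates,
    candidates with an infrequent k-item subset are pruned (Apriori property),
    and the surviving candidates are counted against the data.
    """
    baskets = [frozenset(t) for t in transactions]
    # Level 1: every distinct item is a candidate.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Test the candidates against the data: count the baskets containing each one.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Candidate generation for the next level (size k + 1).
        k += 1
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


# Hypothetical transactions keyed by TID (assumed; not taken from the article's table).
transactions = {
    10: ["A", "C", "D"],
    20: ["B", "C", "E"],
    30: ["A", "B", "C", "E"],
    40: ["B", "E"],
}

frequent = apriori(transactions.values(), min_count=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Because the loop continues until no candidate survives, it also checks three-item sets, a level the walkthrough below does not go on to compute.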

Question

The table consists of transaction IDs and items. Find the item sets whose support count is at least 2 (minimum support = 2).

Solution

First we find support for each item.

1st Level Candidate

Construct a table that lists each unique item in the first column. Then, for each item, count the transactions (TID 10 to 40) that contain it and write that count in the support column. For example, A appears in 2 of the four transactions, so we write 2 in its support column; likewise, if B appears in three transactions, we write 3. This table is our first level candidate set.
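The first level count can be sketched with `collections.Counter` from the Python standard library. The transaction list is an assumption, since the original table is not reproduced here; it is picked to agree with the counts quoted in the text (A = 2, B = 3, D = 1). The sketch also assumes no item is listed twice within one transaction, so each occurrence corresponds to one transaction.

```python
from collections import Counter

# Hypothetical transactions keyed by TID (assumed; consistent with A = 2, B = 3, D = 1).
transactions = {
    10: ["A", "C", "D"],
    20: ["B", "C", "E"],
    30: ["A", "B", "C", "E"],
    40: ["B", "E"],
}

# 1st level candidates: support count of every single item.
c1 = Counter(item for items in transactions.values() for item in items)

for item in sorted(c1):
    print(item, c1[item])
```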

We cut "D" from the item set because its support is 1, and we need a minimum support of 2.

After removing D from the table, the remaining item sets are {A}, {B}, {C}, and {E}.
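Removing items below the minimum support is a single filter. In this sketch, the support counts for A, B and D are the ones quoted in the text; the counts for C and E are assumed for illustration, since the original table is not shown.

```python
MIN_SUPPORT = 2  # minimum support count from the question

# 1st level support counts: A, B and D as quoted in the text; C and E are assumed.
c1 = {"A": 2, "B": 3, "C": 3, "D": 1, "E": 3}

# Keep only items whose support count reaches the minimum; D (support 1) is cut.
l1 = {item: n for item, n in c1.items() if n >= MIN_SUPPORT}
print(sorted(l1))
```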

Second Level Candidate

Now we form all possible pairs of the remaining items: pair {A} with each of {B}, {C}, and {E}; then pair {B} with {C} and {E}; then pair {C} with {E}. This gives six candidate item sets: {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, and {C, E}.
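Pairing every remaining item with every later one is exactly what `itertools.combinations` does. A sketch, assuming the remaining items are A, B, C and E:

```python
from itertools import combinations

l1 = ["A", "B", "C", "E"]  # items left after removing D

# Every unordered pair of remaining items, each pair listed once.
c2 = list(combinations(l1, 2))
print(c2)
```

`combinations` yields pairs in the order of the input list, so {A, B} comes first and {C, E} last, matching the order described above.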

Go back to the transaction table given in the question, count how many transactions contain {A, B} together, and write that count in the support column. Follow the same steps for all the other item sets.

Similarly, we cut the item sets whose support is 1, since that is below the minimum support of 2.
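Counting and pruning the pairs can be sketched as below. The transactions are hypothetical, since the question's table is not reproduced here; they are chosen to agree with the first level counts quoted in the text. The names `c2` and `l2` simply mirror the usual notation for candidate and frequent 2-item sets.

```python
from itertools import combinations

MIN_SUPPORT = 2  # minimum support count from the question

# Hypothetical transactions keyed by TID (assumed for illustration).
transactions = {
    10: ["A", "C", "D"],
    20: ["B", "C", "E"],
    30: ["A", "B", "C", "E"],
    40: ["B", "E"],
}
l1 = ["A", "B", "C", "E"]  # items left after removing D

# Support of a pair = number of transactions that contain both of its items.
c2 = {
    pair: sum(1 for items in transactions.values() if set(pair) <= set(items))
    for pair in combinations(l1, 2)
}

# Cut the pairs whose support is below the minimum.
l2 = {pair: n for pair, n in c2.items() if n >= MIN_SUPPORT}

for pair, n in c2.items():
    print(pair, n, "kept" if pair in l2 else "cut")
```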

Remaining Item Sets

Result

Now we list the item sets whose support meets the minimum support of 2.

1st Level Candidate: {A}, {B}, {C}, {E}

2nd Level Candidate: {A, C}, {B, C}, {B, E}, {C, E}

These item sets are frequent.