Market Basket Analysis, Collaborative Filtering- Unsupervised Algorithm - Part 4


Market Basket Analysis (MBA) and Collaborative Filtering (CF)


Market Basket Analysis
Market Basket Analysis is one of the most common and useful data mining and analysis methods. It uses association rule mining to identify products frequently bought together. The underlying objective of association rule mining is to find associations between different items in a set. In the case of market basket analysis, these items are the products purchased by a customer and the set is a transaction. In short, market basket analysis
  • Is an unsupervised data mining technique
  • Uncovers products which are frequently bought together
  • Creates if-then scenario rules
Figure-1:
Various use cases of MBA

It finds applications in diverse areas such as manufacturing, telecom, insurance, retail marketing, medicine, banking, education and restaurants. These organizations use association rule analysis to decide how to bundle products and increase sales profit; to design catalogues, menu layouts and store displays; to plan inventory control and promotional/advertisement campaigns; to determine the timing of product groups; and so on.

When you browse e-commerce sites like Amazon, you will find similar and associated products grouped together. Marketing firms also find associations between different products and customers, and learn which product advertisements are effective in promoting sales. MBA can be used to analyze which age, region and income group will buy which set of items. In short, MBA is useful for anticipating customer behaviour.

Besides increasing sales profits, association rules can also be used in other fields. In medical diagnosis, for instance, understanding which symptoms tend to be comorbid (the medical term for co-occurrence) can help to improve patient care and prescribing. By studying the co-occurrence of side effects in patients with multiple prescriptions, a hospital can find previously unknown drug interactions and use this information to warn patients.

Finding Association Rules
This is the process of finding associations between different items in the available data. It discovers the probability of the co-occurrence of items in a collection, for example that people who buy item X also tend to buy item Y. The following figure illustrates the concept.


Figure-2:
Association rule mining

 

Other Application Areas
Although market basket analysis generally conjures up pictures of shopping carts and supermarket shoppers, it is important to realize that there are many other areas in which it can be applied. These include:


  • Analysis of credit card purchases.
  • Analysis of telephone calling patterns.
  • Identification of fraudulent medical insurance claims. (Consider cases where common rules are broken).
  • Analysis of telecom service purchases.
  • Etc.

As mentioned earlier, MBA uses association rule learning algorithms. The most common algorithms are:
1.      Apriori
2.      Eclat

The Problem

When we go grocery shopping, we often have a standard list of things to buy. Each shopper
has a distinctive list, depending on one’s needs and preferences.

A housewife might buy healthy ingredients for a family dinner.
A bachelor might buy beer and chips.


Figure-3:
Frequent Itemsets

Understanding these buying patterns can help to increase sales in several ways.

If there is a pair of items, X and Y, that are frequently bought together: 
  • Both X and Y can be placed on the same shelf, so that buyers of one item would be prompted to buy the other. 
  • Promotional discounts could be applied to just one out of the two items. 
  • Advertisements for X could be targeted at buyers who purchase Y.
  • X and Y could be combined into a new product, such as a toothbrush packaged with toothpaste.


While we may know that certain items are frequently bought together, the question is, how do we uncover these associations from a large new dataset on which no analysis has been done?

In order to answer such questions, we need to learn some definitions.

Definitions
1.      Itemset: A collection of one or more items; a k-itemset is a set of k items.

Figure-4:
Itemsets for different values of k.

2.      Support Count: The number of transactions in which the item-set X occurs.
3.      Support(X): The fraction of transactions that contain the item-set X
Support(X) = frequency(X) / N
where N is the total number of transactions.

Association rules analysis

This is a technique for uncovering how items are associated with each other. There are three common ways to measure association.

Measure 1: Support. This says how popular an itemset is, as measured by the proportion of transactions in which an itemset appears. The range of this measure is [0,1]. 

Support of the item X is nothing but the ratio of the number of transactions in which the item X appears to the total number of transactions.

Consider the example of a group of items Apple, Beer, Chicken, Sugar, Milk and Pear

Example

 

Figure-5:
Example Transactions

In Figure-5 above the support of {apple} is 4 out of 8, or 50%.


Frequent item-set mining 

This is data mining that focuses on sets of items or sequences of actions or events (for example, even the order in which we get dressed). An itemset, i.e. a collection of one or more items, that appears in many transactions (baskets) is said to be "frequent".
Pairs of items purchased together can give insight into customer purchase behaviour. Frequent item-sets can contain more than two items. For instance, in Figure-5 the support of {apple, beer, rice} is 2 out of 8, or 25%. When there are more than two items in an itemset the analysis is called multidimensional MBA.


Support Threshold

If sales of items beyond a certain proportion tend to have a significant impact on business profits, that proportion can be used as a support threshold. This value (which ranges from 0 to 1) is used to identify item-sets whose support is above the threshold; these are called significant itemsets. Note that the support metric is defined for itemsets, not for association rules.

Figure-6:
Setting a minimum threshold support

Minimum support threshold: 

Frequent item-sets are those whose support is greater than or equal to the minimum support threshold (min_sup). This threshold is set by the user; for example, min_sup can be set to 0.3.

As an exercise, find the item-sets in the transactions below that meet a minimum support of 0.3.

TID    Items
T1     M, O, N, K, E, Y
T2     D, O, N, K, E, Y
T3     M, A, K, E
T4     M, U, C, K, Y
T5     C, O, O, K, I, E
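As a rough sketch of how this exercise could be checked programmatically (the transactions simply re-enter the table above, and 0.3 is the threshold suggested earlier; the variable names are illustrative only):

```python
from collections import Counter

# Transactions from the exercise table above (an item repeated within
# a transaction, such as the two O's in T5, counts only once).
transactions = [
    {"M", "O", "N", "K", "E", "Y"},   # T1
    {"D", "O", "N", "K", "E", "Y"},   # T2
    {"M", "A", "K", "E"},             # T3
    {"M", "U", "C", "K", "Y"},        # T4
    {"C", "O", "K", "I", "E"},        # T5
]

min_sup = 0.3
n = len(transactions)

# Count how many transactions contain each item and keep those whose
# support (count / n) meets the minimum support threshold.
counts = Counter(item for t in transactions for item in t)
frequent_1_itemsets = {item: count / n
                       for item, count in counts.items()
                       if count / n >= min_sup}
print(frequent_1_itemsets)
```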

Measure 2: Confidence.


This is a measure of how likely item Y is to be purchased when item X is purchased, written as the rule {X => Y}. The range of this measure is [0,1].
Confidence is measured by the proportion of transactions containing item X in which item Y also appears.
In Figure-5, the confidence of {apple => beer} is 3 out of 4, or 75%.


It is the ratio of the number of transactions in which apple (A) and beer (B) were purchased together to the number of transactions in which apple appears, i.e. Confidence(A => B) = Support(A, B) / Support(A). This measure takes into account the popularity of item A.

 

Support and Confidence

Consider two sets of items A and B, assumed to be disjoint. The support and confidence measures for the rule A => B are then expressed as

A => B [Support, Confidence]

The part before => is referred to as the if part (antecedent) and the part after => as the then part (consequent). The confidence of a rule is the probability of observing the consequent in a transaction given that it contains the antecedent. Unlike support, this metric is not symmetric; it is directed from antecedent to consequent.

Figure-7:
Antecedent and Consequent

Here A and B are sets of items in the transaction data. Consider a purchase of a computer and anti-virus software, which is expressed as

Computer => Anti-virus Software [Support = 20%, Confidence = 60%]

The above rule says:
  1. 20% of all transactions contain the item set {computer, anti-virus software}.
  2. 60% of the transactions containing a computer also contain anti-virus software.

Support and Confidence measure how significant an association rule is. Rule significance is judged against the minimum support and minimum confidence thresholds, which let the user filter rules by strength.

Strong rules: If a rule A=>B [Support, Confidence] satisfies min_sup and min_confidence then it is a strong rule.

Drawback of confidence measure

One drawback of the confidence measure is that it might misrepresent the importance of an association. This is because it only accounts for how popular apples are, but not beers. If beers are also very popular in general, there will be a higher chance that a transaction containing apples will also contain beers, thus inflating the confidence measure. To account for the base popularity of both constituent items, we use a third measure called lift.

Measure 3: Lift.


Lift: Lift gives the correlation between A and B in the rule A => B. Correlation shows how the item-set A affects the item-set B. The range of this measure is [0,∞).

Lift(A => B) = Support(A, B) / (Support(A) × Support(B))

This is similar to the confidence measure, but unlike confidence it takes into account the popularity of item B. The confidence of A => B may give an inflated impression of the likelihood of B being purchased with A if B is independently popular. A lift value greater than one indicates how much more often the antecedent and consequent of the rule A => B occur together than would be expected if they were statistically independent.

In other words, it says how likely item B is to be purchased when item A is purchased, while controlling for how popular item B is. If the lift of A => B equals 1, then A and B are independent.

In Figure-5, the lift of {apple => beer} is 1, which implies no association between the items. A lift value greater than 1 means that item B is likely to be bought if item A is bought, while a value less than 1 means that item B is unlikely to be bought if item A is bought.


  • Lift (A => B) = 1 means that there is no correlation within the itemset.
  • Lift (A => B) > 1 means that there is a positive correlation within the itemset, i.e., the products A and B are more likely to be bought together.
  • Lift (A => B) < 1 means that there is a negative correlation within the itemset, i.e., the products A and B are unlikely to be bought together.
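As a minimal sketch of these three measures, written over a small hypothetical list of baskets (the helper names and the baskets are illustrative, not the exact Figure-5 data, though the printed values happen to match the figures quoted above):

```python
def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # P(consequent in basket | antecedent in basket)
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

def lift(antecedent, consequent, transactions):
    # Confidence normalised by the consequent's own popularity.
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

# Hypothetical baskets.
baskets = [{"apple", "beer"}, {"apple", "beer", "rice"},
           {"apple", "beer", "chicken"}, {"apple", "milk"},
           {"beer", "rice"}, {"beer", "milk"},
           {"beer", "pear"}, {"sugar", "pear"}]

print(support({"apple"}, baskets))              # 0.5
print(confidence({"apple"}, {"beer"}, baskets)) # 0.75
print(lift({"apple"}, {"beer"}, baskets))       # 1.0
```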


Association Rule Mining is viewed as a two-step approach:
  • Frequent Itemset Generation: Find all frequent item-sets with support >= pre-determined min_support count. 
  • Rule Generation:  List all Association Rules from frequent item-sets. Calculate Support and Confidence for all rules. Prune rules that fail min_support and min_confidence thresholds.

Summary of Support, Confidence and Lift
The following figure summarizes the concepts of Support, Confidence  and Lift for MBA.

Figure-8:
Summary of Support, Confidence and Lift measures


Measure 4: Leverage


Leverage: The leverage of A => B computes the difference between the frequency of A and B appearing together and the frequency that would be expected if A and B were independent. A leverage value of 0 indicates that they are independent. This measure is computed as

Leverage(A => B) = Support(A, B) − Support(A) × Support(B)

The range of this measure is [-1,1].

Measure 5: Conviction


Conviction: The conviction of the rule A => B measures how strongly the consequent depends on the antecedent. A high conviction means that the consequent is highly dependent on the antecedent. It is computed as

Conviction(A => B) = (1 − Support(B)) / (1 − Confidence(A => B))

If the confidence of the rule A => B equals 1 (a perfect confidence score), the conviction measure becomes infinite. Similar to the lift measure, if A and B are independent the conviction of A => B equals 1. The range of this measure is [0,∞).
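Continuing the earlier sketch (this reuses the hypothetical support and confidence helpers and the baskets list defined after the lift bullets above), leverage and conviction could be computed as:

```python
def leverage(antecedent, consequent, transactions):
    # Observed joint support minus the support expected under independence.
    return (support(set(antecedent) | set(consequent), transactions)
            - support(antecedent, transactions) * support(consequent, transactions))

def conviction(antecedent, consequent, transactions):
    # (1 - Support(B)) / (1 - Confidence(A => B)); infinite when confidence is 1.
    conf = confidence(antecedent, consequent, transactions)
    if conf == 1.0:
        return float("inf")
    return (1 - support(consequent, transactions)) / (1 - conf)

print(leverage({"apple"}, {"beer"}, baskets))    # 0.0 -> independent
print(conviction({"apple"}, {"beer"}, baskets))  # 1.0 -> independent
```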

What is Apriori Algorithm?


Apriori is an algorithm designed for frequent item-set mining and association rule learning over transactional databases. It proceeds by identifying associations and commonalities between the items that make up individual transactions. For example, for e-commerce/retail shops with large transaction databases, the algorithm can find groups of individual items that are frequently purchased together.
The algorithm was proposed by Rakesh Agrawal and Ramakrishnan Srikant in 1994. The apriori principle can reduce the number of itemsets we need to examine. Put simply, the apriori principle states that

if an itemset is infrequent, then all its supersets must also be infrequent

This means that if {beer} was found to be infrequent, we can expect {beer, pizza} to be equally or even more infrequent. So in consolidating the list of popular itemsets, we need not consider {beer, pizza}, nor any other itemset configuration that contains beer.

 

Finding itemsets with high support

Using the apriori principle, the number of itemsets that have to be examined can be pruned, and the list of popular itemsets can be obtained in these steps:

Step 0. Start with itemsets containing just a single item, such as {apple} and {pear}.

Step 1. Determine the support for itemsets. Keep the itemsets that meet your minimum support threshold, and remove itemsets that do not.

Step 2. Using the itemsets you have kept from Step 1, generate all the possible itemset configurations.

Step 3. Repeat Steps 1 & 2 until there are no more new itemsets

Reducing candidate itemsets using the Apriori Algorithm
For example, suppose {apple} is determined to have low support: it is removed, and all other itemset configurations that contain apple need not be considered. This can reduce the number of itemsets to consider by more than half.
Note that the support threshold that you pick in Step 1 could be based on formal analysis or past experience. If you discover that sales of items beyond a certain proportion tend to have a significant impact on your profits, you might consider using that proportion as your support threshold.
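For practical use, the mlxtend library linked in the references implements this pipeline end to end. A minimal sketch on a hypothetical transaction list (the baskets and thresholds below are illustrative choices, not prescribed values):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical transactions; each inner list is one basket.
transactions = [
    ["apple", "beer", "rice"],
    ["apple", "beer"],
    ["apple", "pear"],
    ["beer", "rice"],
    ["apple", "beer", "chicken"],
]

# One-hot encode the baskets into a boolean DataFrame.
te = TransactionEncoder()
onehot = te.fit(transactions).transform(transactions)
df = pd.DataFrame(onehot, columns=te.columns_)

# Frequent itemset generation above the chosen support threshold.
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)

# Rule generation: keep rules above a minimum confidence.
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```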

Finding item rules with high confidence or lift

We have seen how the apriori algorithm can be used to identify itemsets with high support. The same principle can also be used to identify item associations with high confidence or lift. Finding rules with high confidence or lift is less computationally taxing once high-support itemsets have been identified, because confidence and lift values are calculated using support values.
Take for example the task of finding high-confidence rules. If the rule

{beer, chips -> apple}

has low confidence, all other rules with the same constituent items and with apple on the right hand side would have low confidence too. Specifically, the rules

{beer -> apple, chips}
{chips -> apple, beer}

would have low confidence as well. As before, lower level candidate item rules can be pruned using the apriori algorithm, so that fewer candidate rules need to be examined.

Limitations of Apriori

Computationally Expensive. Even though the apriori algorithm reduces the number of candidate itemsets to consider, this number can still be huge when store inventories are large or when the support threshold is low. One remedy is to reduce the number of comparisons by using advanced data structures, such as hash tables, to sort candidate itemsets more efficiently.
Spurious Associations. Analysis of large inventories would involve more itemset configurations, and the support threshold might have to be lowered to detect certain associations. However, lowering the support threshold might also increase the number of spurious associations detected. To ensure that identified associations are generalizable, they could first be distilled from a training dataset, before having their support and confidence assessed in a separate test dataset.

Missing values and data noise that contaminate the data can also affect the accuracy of market basket analysis.

Eclat Algorithm
The ECLAT algorithm stands for Equivalence Class Clustering and bottom-up Lattice Traversal. It is one of the popular methods of Association Rule mining. It is a more efficient and scalable version of the Apriori algorithm. While the Apriori algorithm works in a horizontal sense imitating the Breadth-First Search of a graph, the ECLAT algorithm works in a vertical manner just like the Depth-First Search of a graph. This vertical approach of the ECLAT algorithm makes it a faster algorithm than the Apriori algorithm.

Apriori is suited to large datasets, while ECLAT works best on small and medium datasets.
Apriori repeatedly scans the original (real) dataset, whereas ECLAT scans the currently generated (intermediate) dataset.
Apriori is generally slower than ECLAT.

Advantages over Apriori algorithm:- 
  • Memory Requirements: Since the ECLAT algorithm uses a Depth-First Search approach, it uses less memory than Apriori algorithm. 
  • Speed: The ECLAT algorithm is typically faster than the Apriori algorithm. 
  • Number of Computations: The ECLAT algorithm does not involve the repeated scanning of the data to compute the individual support values.
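A minimal sketch of ECLAT's vertical (TID-list) idea, using the same kind of hypothetical transactions as the earlier exercise: each item is mapped to the set of transaction IDs containing it, and the support of larger itemsets is obtained by intersecting TID sets rather than re-scanning the data.

```python
from itertools import combinations

transactions = {
    "T1": {"M", "O", "N", "K", "E", "Y"},
    "T2": {"D", "O", "N", "K", "E", "Y"},
    "T3": {"M", "A", "K", "E"},
    "T4": {"M", "U", "C", "K", "Y"},
    "T5": {"C", "O", "K", "I", "E"},
}
min_support = 0.6
n = len(transactions)

# Vertical layout: item -> set of transaction IDs that contain it.
tidsets = {}
for tid, items in transactions.items():
    for item in items:
        tidsets.setdefault(item, set()).add(tid)

# Frequent 1-itemsets.
frequent = {frozenset([item]): tids for item, tids in tidsets.items()
            if len(tids) / n >= min_support}

# Frequent 2-itemsets by intersecting TID sets (no rescan of the data).
for (a, ta), (b, tb) in combinations(frequent.items(), 2):
    tids = ta & tb
    if len(tids) / n >= min_support:
        print(set(a | b), len(tids) / n)
```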

Collaborative filtering (CF) and Recommender Systems

Collaborative Filtering is an unsupervised data mining method for filtering items using the opinions of other people. This type of analysis is used for recommender systems. For example, products, books, news items, movies, web pages, resorts, restaurants, etc. that are likely to interest a user are evaluated based on the patterns of similar users (for related figures refer to the earlier blogs on Paradigms of Learning Algorithms and Unsupervised Learning part-1). This algorithmic technique has its roots in the age-old human behaviour of sharing opinions. This is active collaborative filtering, with direct human participation.

Figure-9:

A limitation of active collaborative filtering systems is that they require a community of people who know each other.
As computers evolved, individual opinions could be stored in databases and accessed by appropriate queries. This led to the development of automated collaborative filtering and recommender systems. During the development of collaborative filtering two models evolved: 1) pull-active systems, which require the user to know whose opinions to trust, and 2) push-active systems, which require the user to know to whom particular content may be interesting. Automated collaborative filtering (ACF) systems relieve users of this burden by using a database of historical user opinions to automatically match each individual to others with similar opinions.

The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue than that of another randomly chosen person. For example, a collaborative filtering recommendation system for television tastes could make predictions about which television show a user should like, given a partial list of that user's tastes (likes or dislikes).  Note that these predictions are specific to the user, but use information gleaned from many users.

Item Based or item-item collaborative filtering
This is a model-based algorithm for making recommendations. The method uses similarities between items' consumption histories. For item-based collaborative filtering, user information is not important: the similarity values between items are measured by observing all the users who have rated the items in a dataset. The similarity between two items depends on the ratings given to them by users who have rated both. Similarity scores are calculated using one of a number of similarity measures; common choices are cosine similarity (vector based), Pearson (correlation based) similarity, and adjusted cosine similarity (adjusted for the fact that different users rate on different scales). These similarity values are used to predict ratings for user-item pairs. Once we build a model using one of these similarity measures, we can predict the rating for any user-item pair using a weighted sum.
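A minimal sketch of the item-item idea with cosine similarity and a weighted-sum prediction, using a small hypothetical ratings matrix (rows are users, columns are items, 0 means unrated; the function names are illustrative only):

```python
import numpy as np

# Hypothetical user-item ratings (0 = not rated).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine_item_similarity(R):
    # Cosine similarity between item columns, based on their rating vectors.
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1e-9            # avoid division by zero
    return (R.T @ R) / np.outer(norms, norms)

def predict(R, sim, user, item):
    # Weighted sum of the user's other ratings, weighted by item similarity.
    rated = R[user] > 0
    weights = sim[item, rated]
    if weights.sum() == 0:
        return 0.0
    return float(R[user, rated] @ weights / weights.sum())

sim = cosine_item_similarity(R)
print(predict(R, sim, user=1, item=1))  # predicted rating of item 1 for user 1
```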
 
In the more general sense, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. Applications of collaborative filtering typically involve very large data sets. 

Collaborative filtering methods have been applied to many different kinds of data including: sensing and monitoring data, such as in mineral exploration, environmental sensing over large areas or multiple sensors; financial data, such as financial service institutions that integrate many financial sources; or in electronic commerce and web applications where the focus is on user data, etc.



Figure-10:
Collaborative Filtering for items of interest.

User Based: 

This method considers similarities between user consumption histories as well as item similarities. Since we have a similarity matrix, we can check which items a user has consumed. The top-K neighbours are determined, the consumption record of each neighbour is noted, and a similarity score is calculated for each candidate item. The items with the highest scores are recommended, as sketched below.
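A rough sketch of one common user-based formulation (hypothetical ratings again): the K users most similar to the target user are found, their ratings are combined with similarity weights, and the highest-scoring unconsumed item is recommended.

```python
import numpy as np

R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine_user_similarity(R):
    norms = np.linalg.norm(R, axis=1)
    norms[norms == 0] = 1e-9
    return (R @ R.T) / np.outer(norms, norms)

def recommend(R, user, k=2):
    sim = cosine_user_similarity(R)
    # Top-K most similar users, excluding the target user itself.
    neighbours = [u for u in np.argsort(sim[user])[::-1] if u != user][:k]
    # Similarity-weighted combination of the neighbours' ratings.
    scores = sum(sim[user, u] * R[u] for u in neighbours)
    scores[R[user] > 0] = -np.inf       # do not re-recommend consumed items
    return int(np.argmax(scores))       # index of the recommended item

print(recommend(R, user=1))
```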

Figure-11:


References:

  1. Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining: Concepts and Techniques", Morgan Kaufmann, 3rd Ed.
  2. George M. Marakas, "Modern Data Warehousing, Mining, and Visualization: Core Concepts", Pearson, 2003.
  3. Ben Schafer, Dan Frankowski, Shilad Sen, "Collaborative Filtering Systems", researchgate.com.
  4. "Practical Introduction to Market Basket Analysis - Association Rules", rsquaredacademy.com.
  5. http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/

Figure Credits:

Figure-1: rsquaredacademy.com
Figure-2: techleer.com
Figure-3: sciencedirect.com
Figure-4: saedsayad.com
Figure-5: KDnuggets.com
Figure-6: datacamp.com
Figure-7: i.stack.imgur.com
Figure-8: saedsayad.com
Figure-9: semanticscholar.org
Figure-10: ieee.org

Figure-11:  salemmarafi.com

