Unsupervised Learning Algorithms Part-1

Unsupervised Learning

Supervised algorithms have been hugely successful in solving a large number of machine learning problems. Unlike supervised algorithms, which learn from data with known targets or labels, unsupervised algorithms learn from data without known targets or labels.

 
Figure-1:
Unsupervised algorithms learn without known targets in the data

These algorithms discover patterns in data and, along with reinforcement learning, could play a leading role in achieving artificial general intelligence: the ability to learn any task a human can learn. Children learn many tasks in an unsupervised manner.



Figure-2:
A child learns to identify similar patterns


Figure-3:
Children learn while they play

A supervised learning system can learn only those tasks it is trained for with known targets. It is also expensive, since a large amount of data must be identified and labelled before training. Unsupervised methods, however, learn autonomously, much as a child learns to identify different breeds of the same animal without having seen all of them.


Figure-4:
Learned clusters

The overall goal of unsupervised learning is to learn a function that describes the unknown structure hidden in the data. One common task is grouping similar samples within the data, called clustering. Another is determining the distribution of the data within its multidimensional space, known as density estimation. A third is projecting the data from a high-dimensional space down to two or three dimensions for visualization.

In deep neural networks, layer-wise pre-training by unsupervised learning mitigates the problems with weight updates in the earlier layers (such as vanishing gradients). Pre-training the network with an unsupervised method and then further training it in a supervised manner improves the speed of convergence and the accuracy of the model.
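
As a concrete illustration, here is a minimal sketch of that recipe, assuming PyTorch (the layer sizes, data, and training loop are illustrative, not from any particular paper): an autoencoder is first trained without labels, and its encoder weights then initialise a supervised classifier.

```python
import torch
import torch.nn as nn

X = torch.randn(256, 20)             # unlabeled data (hypothetical)
y = torch.randint(0, 2, (256,))      # labels, used only in the supervised stage

# --- Unsupervised stage: the autoencoder learns to reconstruct its input ---
enc = nn.Linear(20, 8)
dec = nn.Linear(8, 20)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(dec(torch.relu(enc(X))), X)
    loss.backward()
    opt.step()

# --- Supervised stage: the pre-trained encoder seeds the classifier ---
clf = nn.Sequential(enc, nn.ReLU(), nn.Linear(8, 2))
opt2 = torch.optim.Adam(clf.parameters(), lr=1e-2)
for _ in range(200):
    opt2.zero_grad()
    loss = nn.functional.cross_entropy(clf(X), y)
    loss.backward()
    opt2.step()
```

The major tasks that can be performed by unsupervised learning are briefly discussed below.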

Various tasks performed by Unsupervised learning

Clustering
Clustering attempts to discover the structure of the data and group samples based on the similarity of their features. Several families of clustering algorithms exist, including K-means, fuzzy C-means, hierarchical, density-based, and parametric-modelling methods.


Figure-5:
Food Clusters
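
To make this concrete, the following is a minimal K-means sketch in Python with NumPy; the synthetic blobs and the choice of k = 3 are illustrative assumptions, not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data: three well-separated blobs (illustrative only)
X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in [(0, 0), (4, 4), (0, 5)]])

k = 3
centers = X[rng.choice(len(X), k, replace=False)]   # random initial centroids
for _ in range(100):
    # Assignment step: each point joins its nearest centroid
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    # Update step: each centroid moves to the mean of its members
    new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centers, centers):
        break
    centers = new_centers

print(centers)   # the learned cluster centres
```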


Density Estimation
Density estimators determine the parameters of the probability distribution from which the data is generated. Expectation maximization (EM) is one of the most popular techniques for estimating the parameters of a mixture of exponential-family distributions, such as a Gaussian mixture.
The method of moments uses the moments of the observed data sample (mean, variance, skewness, and kurtosis) to estimate population parameters, and it offers a simple way to solve mixture-modelling problems.
Figure-6:
Density Estimation
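
As a minimal density-estimation sketch, assuming scikit-learn is available, GaussianMixture fits a Gaussian mixture model with the EM algorithm mentioned above (the two-component 1-D data here is illustrative).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Draw samples from two overlapping 1-D Gaussians (illustrative data)
X = np.concatenate([rng.normal(-2, 1.0, 300), rng.normal(3, 0.5, 200)]).reshape(-1, 1)

# EM alternately assigns responsibilities (E-step) and re-estimates
# the mixture weights, means, and covariances (M-step)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel())

# The fitted model defines a density: score_samples returns log p(x)
grid = np.linspace(-6, 6, 5).reshape(-1, 1)
print(np.exp(gmm.score_samples(grid)))
```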

Curse of Dimensionality, Latent Variable Models and Dimensionality Reduction

The complexity of a learning model scales up as the dimensionality of the data increases, and data exploration and visualization also get harder. Moreover, the size of the dataset required for learning grows exponentially with the data dimensionality. These problems associated with high-dimensional data are collectively called the curse of dimensionality.
Data dimensionality can be reduced using latent variable models: statistical models that relate a set of observable variables to a set of latent (hidden) variables and thereby reveal hidden structure in complex, high-dimensional data. Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and self-organizing maps belong to this category. Dimensionality reduction allows simpler learning models and speeds up training.
Figure-7:
Dimensionality Reduction

Principal Component Analysis (PCA)
PCA is an unsupervised method that projects the data onto its principal component axes: mutually orthogonal directions along which the data variance is maximized. In an n-dimensional data space there are n principal component axes. The first principal component axis, PC1, is the direction along which the data has the highest variance, the second, PC2, the next highest, and so on up to the nth axis, PCn. PCA is closely related to the Singular Value Decomposition (SVD): the principal axes are the right singular vectors of the centred data matrix.
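
A minimal NumPy sketch of this SVD connection (the data shape and the number of retained components are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # 200 samples x 5 features (illustrative)

Xc = X - X.mean(axis=0)                 # centre the data first
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                         # rows are the directions PC1 ... PC5
explained_var = S**2 / (len(X) - 1)     # variance captured along each axis

Z = Xc @ Vt[:2].T                       # project onto PC1/PC2 for visualization
print(explained_var)
```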

Figure-8:

Self Organizing Maps
SOMs constitute a neural network approach to organizing input patterns into clusters; the method performs both data clustering and dimensionality reduction. SOMs are also called topology-preserving maps, by analogy with the topographic maps found in the human brain: nearby nodes in the network learn to respond to similar input patterns. The topology of the network refers to its structural architecture.

Figure-9:

A SOM has been used to classify statistical data describing various quality-of-life factors such as state of health, nutrition, and educational services. Countries with similar quality-of-life factors end up clustered together on the map.



Figure-10:
A self-organizing map learns to perform vector quantization, mapping a continuous input space to a discrete output space.
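
A minimal SOM training loop, sketched in NumPy under illustrative assumptions (RGB colours as inputs, a 10x10 grid, and linearly decaying learning rate and neighbourhood radius):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 3))                # inputs, e.g. RGB colours (illustrative)

rows, cols = 10, 10
W = rng.random((rows, cols, 3))         # one weight vector per grid node
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                            indexing="ij"), axis=-1)

n_steps = 2000
for t in range(n_steps):
    x = X[rng.integers(len(X))]
    # Best matching unit: the node whose weights are closest to x
    d = ((W - x) ** 2).sum(axis=-1)
    bmu = np.unravel_index(d.argmin(), d.shape)
    # Learning rate and neighbourhood radius both decay over time
    lr = 0.5 * (1 - t / n_steps)
    sigma = max(1.0, (rows / 2) * (1 - t / n_steps))
    # Gaussian neighbourhood: nodes near the BMU also move towards x,
    # which is what preserves the topology of the input space
    dist2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
    h = np.exp(-dist2 / (2 * sigma ** 2))
    W += lr * h[..., None] * (x - W)

# After training, neighbouring nodes hold similar colours
```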


Market Basket Analysis (MBA)
MBA is an unsupervised data mining method that discovers associations and correlations among items in large relational and transactional databases. It is a mathematical modelling method based on association rules: it estimates the likelihood that an item or group of items appears frequently together with another item or group of items. Hence it is extensively used to mine interesting patterns of association in transactions involving large amounts of data, and it is useful for business decision making and for increasing profitability.

Figure-11:
MBA for determining frequent item set
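
The following is a minimal sketch of the idea in plain Python; the toy transactions and thresholds are illustrative, and real market basket analysis typically uses algorithms such as Apriori on much larger databases.

```python
from itertools import combinations

# Toy transaction database (illustrative)
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item in the itemset
    return sum(itemset <= t for t in transactions) / n

# Score every pair rule {a} -> {b} by support and confidence
items = sorted(set().union(*transactions))
for a, b in combinations(items, 2):
    s = support({a, b})
    if s >= 0.4:                         # minimum-support threshold
        conf = s / support({a})          # confidence of {a} -> {b}
        print(f"{{{a}}} -> {{{b}}}: support={s:.2f}, confidence={conf:.2f}")
```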


Collaborative filtering (CF)
Collaborative filtering is another unsupervised data mining method that predicts a person's preferences from data about people with similar interests. It is used in the recommender systems with which corporations such as Amazon, Netflix, and many marketing agencies predict customer tastes.

Figure-12:
Collaborative Filtering for Recommender systems
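
A minimal user-based collaborative filtering sketch in NumPy; the rating matrix and the similarity measure (cosine) are illustrative assumptions, and production systems use far more elaborate models.

```python
import numpy as np

# Toy user-item rating matrix; 0 means "not yet rated" (illustrative)
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

def predict(user, item):
    # Weight other users' ratings of this item by their similarity to `user`
    sims = np.array([cosine(R[user], R[other]) for other in range(len(R))])
    mask = (np.arange(len(R)) != user) & (R[:, item] > 0)
    return sims[mask] @ R[mask, item] / (np.abs(sims[mask]).sum() + 1e-9)

print(predict(0, 2))   # predicted rating of user 0 for unrated item 2
```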


Figure Credits 
  1. Figure-2: guru99.com. 
  2. Figure-3: exclusive.multibriefs.com. 
  3. Figure-5: gitcdn.xyz. 
  4. Figure-6: python-data-science.readthedocs.io. 
  5. Figure-7: encrypted-tbn0.gstatic.com. 
  6. Figure-8: kindsonthegenius.com. 
  7. Figure-9: nationranking.files.wordpress.com. 
  8. Figure-10: d3i71xaburhd42.cloudfront.net. 
  9. Figure-11: mathworks.com. 
  10. Figure-12: cdn-images-1.medium.com.




