Unsupervised Learning Algorithms Part-1
Unsupervised Learning
Supervised algorithms have been hugely
successful in solving a wide range of machine learning problems. Unlike
supervised algorithms, which learn from data with known targets or labels,
unsupervised algorithms learn from data without any known targets or labels.
Figure-1:
Unsupervised algorithms learn without known targets in the data
These algorithms discover patterns in data and, along
with reinforcement learning, could play a leading role in achieving artificial
general intelligence: the ability to learn any task a human can
learn. Children learn many tasks in an unsupervised manner.
Figure-2:
A child learns to identify similar patterns
Figure-3:
Children learn while they play
A supervised learning system can learn only those
tasks it has been trained for with known targets. It is also expensive, since a large
amount of data must be identified and labelled before training. Unsupervised
methods, however, learn autonomously, much as a child learns to identify different
breeds of the same animal without seeing all of them.
Figure-4:
Learned clusters
The main goal of unsupervised
learning is to learn a function that describes the unknown
structure hidden in the data. One common goal is to
group similar samples within
the data, called clustering. Another is to determine the
distribution of data within the multidimensional data space, known as density
estimation. A third use is to project the data from a high-dimensional
space down to two or three dimensions for the purpose of visualization.
In deep neural networks, layer-wise pre-training
by unsupervised learning
mitigates the problems with weight updates in the earlier layers, such as vanishing gradients. Pre-training the network with an unsupervised method and then fine-tuning it in a supervised manner improves both the speed of convergence and model accuracy. The major tasks that can be performed by unsupervised learning are briefly discussed below.
Various tasks performed by Unsupervised learning
Clustering
Clustering attempts to discover the
structure of the data and group samples based on similarity of features. Several
families of clustering algorithms exist, including K-means, Fuzzy C-means, hierarchical, density-based, and parametric-modelling methods.
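As an illustration, here is a minimal K-means sketch in plain Python; the two-blob toy data and the function name are invented for this example, and real implementations add smarter initialization and convergence checks.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(dim) / len(cluster)
                                     for dim in zip(*cluster))
    return centroids, clusters

# Two well-separated 2-D blobs; note that no labels are supplied.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
        (5.0, 5.1), (5.2, 4.9), (4.9, 5.2)]
centroids, clusters = kmeans(data, k=2)
```

After a few iterations the two centroids settle near the means of the two blobs, recovering the grouping without ever seeing a label.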
Figure-5:
Food Clusters
Density Estimation
Density
estimators determine the parameters of the probability distribution from which the data
is generated. Expectation maximization (EM) is one of the most popular
techniques for determining the parameters of a mixture of exponential-family
distributions, such as a Gaussian mixture.
The method of moments uses the moments
of the observed data sample (mean, variance, skewness, and kurtosis) to estimate
the population parameters. It is an easy way to solve mixture-modelling
problems.
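For the simplest case, a single normal distribution, the method of moments just matches the first two sample moments to the population parameters; the toy data below is invented for illustration:

```python
# Method of moments for a normal distribution: equate the sample mean and
# sample variance to the population parameters mu and sigma^2.
def moments_normal(sample):
    n = len(sample)
    mu = sum(sample) / n                           # first moment
    var = sum((x - mu) ** 2 for x in sample) / n   # second central moment
    return mu, var

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, var = moments_normal(data)
# mu = 5.0, var = 4.0 for this sample
```

Mixture models use the same idea with higher moments (skewness, kurtosis) to pin down the additional parameters.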
Figure-6:
Density Estimation
Curse of Dimensionality, Latent Variable Models and Dimensionality Reduction
The complexity of a learning model
scales up with the dimensionality of the data, and data exploration and
visualization get harder. The size of the dataset required for learning
increases exponentially with the data dimensionality. These problems associated with high-dimensional data are called the
curse of dimensionality.
Data dimensionality can be reduced using latent variable models: statistical
models that relate a set of observable variables to a set of latent (hidden)
variables. Latent variable models reveal hidden structure in complex,
high-dimensional data. Principal Component Analysis (PCA), Singular Value
Decomposition (SVD), and self-organizing maps belong to this category. Dimensionality
reduction allows less complex learning models and speeds up training of the learning
algorithm.
Figure-7:
Dimensionality Reduction
Principal Component Analysis (PCA)
PCA is an
unsupervised method that projects the data onto its principal component axes,
which are mutually orthogonal directions along which the data variance is maximized.
In an n-dimensional data space there are n principal component axes. The first principal component axis, PC1, is the direction along which the data has the highest variance, the
second, PC2, the next highest, and so on up to the nth axis, PCn. PCA is closely related to
Singular Value Decomposition (SVD): the principal components can be computed from the SVD of the centered data matrix.
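A short numpy sketch of PCA via SVD; the synthetic data (stretched along the x-axis so PC1 should align with it) and the variable names are assumptions for this example:

```python
import numpy as np

# PCA via SVD: after centering, the right singular vectors of the data
# matrix are the principal component axes, ordered by explained variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * np.array([5.0, 0.5])  # variance mostly along x

Xc = X - X.mean(axis=0)                  # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_var = S**2 / (len(X) - 1)      # variance along each PC axis
pc1 = Vt[0]                              # PC1: direction of maximum variance
X_reduced = Xc @ Vt[:1].T                # project onto PC1 (2-D -> 1-D)
```

Here `pc1` comes out nearly parallel to the x-axis, and keeping only the projection onto it discards little of the total variance.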
Figure-8:
Self-Organizing Maps
SOMs constitute
a neural-network approach to organizing input patterns into clusters; the method performs both data clustering and dimensionality reduction.
SOMs are also called topology-preserving maps, echoing similar maps found in the human brain. The topology
of the network refers to its structural architecture.
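The training loop can be sketched in a few lines; this is a deliberately simplified 1-D SOM on invented 2-D data (real SOMs usually use a 2-D grid and Gaussian neighbourhood functions):

```python
import numpy as np

# A minimal 1-D self-organizing map: a chain of 10 nodes learns to spread
# over 2-D inputs, pulling each best-matching unit (BMU) and its chain
# neighbours toward every presented sample.
rng = np.random.default_rng(0)
weights = rng.random((10, 2))                  # 10 map nodes, 2-D weights
data = rng.random((500, 2))                    # unlabelled 2-D inputs

n = len(data)
for t, x in enumerate(data):
    lr = 0.5 * (1.0 - t / n)                   # decaying learning rate
    radius = max(1, round(3 * (1.0 - t / n)))  # shrinking neighbourhood
    bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    lo, hi = max(0, bmu - radius), min(len(weights), bmu + radius + 1)
    weights[lo:hi] += lr * (x - weights[lo:hi])  # move BMU and neighbours
```

Because neighbours on the chain are updated together, nearby nodes end up with similar weights: the topology-preserving property mentioned above.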
Figure-9:
A SOM has been
used to classify statistical data describing various quality-of-life factors
such as state of health, nutrition, and educational services. Countries with
similar quality-of-life factors end up clustered together.
Figure-10:
A self-organizing map learns to perform vector quantization, mapping a continuous
space to a discrete space.
Market Basket Analysis (MBA)
MBA is an
unsupervised data mining method that discovers associations and correlations
among items in large relational and transactional databases. It is a mathematical
modelling method based on association rules: it estimates the likelihood that
an item, or a group of items, appears frequently together with another item
or group of items. Hence it is extensively used to mine interesting patterns
of association in transactions involving large amounts of data. It is useful in
business decision making and for increasing profitability.
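The two basic association-rule scores, support and confidence, are easy to compute directly; the toy transaction list below is invented, and real MBA tools (e.g. Apriori-based miners) scale this counting to millions of baskets:

```python
from itertools import combinations
from collections import Counter

# Count how often item pairs co-occur to score a rule like
# {bread} -> {butter} by its support and confidence.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

pair_counts = Counter()
item_counts = Counter()
for t in transactions:
    item_counts.update(t)
    pair_counts.update(combinations(sorted(t), 2))

n = len(transactions)
# support = P(bread and butter); confidence = P(butter | bread)
support = pair_counts[("bread", "butter")] / n
confidence = pair_counts[("bread", "butter")] / item_counts["bread"]
# support = 0.6, confidence = 0.75 for this toy data
```

A rule is reported as "interesting" when both scores clear user-chosen thresholds.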
Figure-11:
MBA for determining frequent item set
Collaborative Filtering (CF)
Collaborative filtering is another unsupervised data
mining method that filters information about the preferences of people with similar
interests from the available data. It is used in the recommender systems of business corporations such as Amazon, Netflix,
and many marketing agencies to predict customer tastes.
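A minimal user-based CF sketch: predict a missing rating as a similarity-weighted average over other users who rated the item. The users, films, and ratings below are invented, and production systems use far richer models (matrix factorization, implicit feedback):

```python
import math

ratings = {
    "alice": {"matrix": 5, "titanic": 1, "inception": 5},
    "bob":   {"matrix": 4, "titanic": 2, "inception": 5, "avatar": 4},
    "carol": {"matrix": 1, "titanic": 5, "avatar": 2},
}

def cosine(u, v):
    """Cosine similarity over the items two users have both rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other != user and item in r:
            s = cosine(ratings[user], r)
            num += s * r[item]
            den += s
    return num / den if den else None

# Alice has not rated "avatar"; users with similar tastes fill in the blank.
pred = predict("alice", "avatar")
```

Because Alice's ratings resemble Bob's far more than Carol's, the prediction lands much closer to Bob's rating of 4 than to Carol's rating of 2.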
Figure-12:
Collaborative Filtering for Recommender systems
Figure Credits
- Figure-2: guru99.com.
- Figure-3: exclusive.multibriefs.com.
- Figure-5: gitcdn.xyz.
- Figure-6: python-data-science.readthedocs.io.
- Figure-7: encrypted-tbn0.gstatic.com.
- Figure-8: kindsonthegenius.com.
- Figure-9: nationranking.files.wordpress.com.
- Figure-10: d3i71xaburhd42.cloudfront.net.
- Figure-11: mathworks.com.
- Figure-12: cdn-images-1.medium.com