Paradigms of Learning Algorithms for AI and ML


Supervised, Unsupervised, and Reinforcement Learning methods

Machine learning paradigms are canonically grouped into three major categories: supervised learning, unsupervised learning, and reinforcement learning. These categories are not always mutually exclusive. The paradigms differ in the tasks they can solve and in how the data is presented to the learning model.

Figure-1
Canonical Learning Paradigms


Figure-2
Autonomous Driving

Supervised Learning
These algorithms are trained using labeled data samples, in which the desired output for each input is already known. Supervised learning is currently the most popular machine learning method, and the paradigm is also called learning with a teacher. It involves learning a function that maps an input to an output based on known input-output pairs: the function is inferred from a training set of such examples.


The goal is to learn a mapping from the space X of inputs to a space of outputs Y, given a training set of N sample pairs $\mathcal{D} = \{(s_i, y_i)\}_{i=1}^{N}$, with $s_i \in X$ and $y_i \in Y$.

Among the many applications of machine learning (ML), the most significant is data mining. People are prone to making mistakes during analyses, particularly when trying to establish relationships between multiple features, and this makes it difficult for them to find solutions to certain problems. Machine learning can often be applied successfully to these problems, improving the efficiency of systems and the designs of machines.

Major steps of learning algorithms
Inductive machine learning is the process of learning a set of rules from instances (examples in a training set) or, more generally, creating a classifier that can be used to generalize to new instances. The process of applying supervised ML to a real-world problem is described in Figure 3.

Figure-3
Supervised Learning procedure

In supervised learning the training data is represented as input-output examples, so the method is sometimes also called predictive learning. The inputs $s_i$ are generally feature vectors of dimensionality I, so a set of N data vectors can be represented as an I x N matrix. The input feature components (variables) are called attributes or covariates. The outputs $y_i$ may be nominal or categorical variables (labels) or real valued, and they may be scalars or vectors; for convenience, we consider them scalars in this context.
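As a minimal illustration, with made-up numbers, the I x N arrangement described above can be written in NumPy as follows; the shapes, not the values, are the point here.

```python
import numpy as np

# Hypothetical example: N = 5 samples, each a feature vector of dimensionality I = 3.
N, I = 5, 3
rng = np.random.default_rng(0)

# Stack the feature vectors s_1 ... s_N as columns of an I x N data matrix.
S = rng.normal(size=(I, N))          # shape (I, N): one column per sample
y = np.array([0, 1, 0, 1, 1])        # one scalar output y_i per sample

print(S.shape)   # (3, 5) -> the I x N matrix described above
print(y.shape)   # (5,)   -> one output per input vector
```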
Figure-4
Learning with a teacher

Supervised techniques are appropriate when you have a specific target value you would like to predict for your data. Supervised learning can be used for classification and regression applications: for classification the targets take two or more discrete outcomes, while for regression the targets are continuous numeric values.

To use these methods, a subset of data points for which the target value is already known must be available. This subset is used to build a model of how typical input data points map to the various target values. The model is then used to predict the output for inputs whose target values are currently unknown: the algorithm identifies the "new" data points that match the model of each target value.

In data mining, supervised learning is considered a predictive or directed method, while unsupervised learning is considered a descriptive or undirected technique. Both categories encompass functions capable of finding different hidden patterns in large data sets.

Classification
Example: binary classification
When there are only two categories, the task is called binary classification.

Figure-5
Binary Classification

The figure shows two images, each containing a distinct region, and the two regions are visually different. The region in Figure (a) results from a benign lesion (class A) and that in Figure (b) from a malignant one (cancer, class B). The mean and standard deviation of the pixel intensities can be used as features for images of this type, and these features can then be used for binary classification.
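A minimal sketch of this idea, using synthetic stand-ins for the two image classes and scikit-learn for the classifier: each image is reduced to its (mean, standard deviation) feature pair, and a logistic-regression model separates class A from class B.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical image data: intensity statistics differ between the two classes.
rng = np.random.default_rng(42)
benign    = rng.normal(loc=100, scale=10, size=(20, 64, 64))   # class A "images"
malignant = rng.normal(loc=130, scale=25, size=(20, 64, 64))   # class B "images"

def extract_features(images):
    # One (mean, std) feature pair per image.
    return np.column_stack([images.mean(axis=(1, 2)), images.std(axis=(1, 2))])

X = np.vstack([extract_features(benign), extract_features(malignant)])
y = np.array([0] * 20 + [1] * 20)    # 0 = benign (A), 1 = malignant (B)

clf = LogisticRegression().fit(X, y)
print(clf.predict(extract_features(malignant[:3])))   # expect mostly class 1
```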

Credit card defaulter detection
Imagine you are a credit card company, and you want to know which customers are likely to default on their payments in the next few years.
You use the data on customers who have and have not defaulted over extended periods of time as build data (or training data) to generate a classification model. You then run that model on the customers you are curious about: the algorithm looks for customers whose attributes match the attribute patterns of previous defaulters or non-defaulters and categorizes them according to the group they most closely match. You can then use these groupings as indicators of which customers are most likely to default.
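A sketch of this workflow, with an invented toy customer table and scikit-learn's random forest standing in for whatever classifier the company would actually use:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical build (training) data: attributes of past customers and
# whether each one eventually defaulted.
train = pd.DataFrame({
    "income":        [30_000, 85_000, 42_000, 120_000, 28_000, 60_000],
    "utilization":   [0.95,   0.20,   0.70,   0.10,    0.99,   0.40],
    "late_payments": [4,      0,      2,      0,       6,      1],
    "defaulted":     [1,      0,      1,      0,       1,      0],
})

model = RandomForestClassifier(random_state=0)
model.fit(train.drop(columns="defaulted"), train["defaulted"])

# New customers we are curious about: the model matches their attribute
# patterns against those of previous defaulters / non-defaulters.
new_customers = pd.DataFrame({
    "income":        [33_000, 90_000],
    "utilization":   [0.90,   0.15],
    "late_payments": [5,      0],
})
print(model.predict(new_customers))         # predicted group per customer
print(model.predict_proba(new_customers))   # estimated default likelihoods
```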

Prediction of customer choice
Similarly, a classification model can have more than two possible values for the target attribute. The values could be, for example, the shirt colours customers are most likely to buy, the promotional methods they will respond to (mail, email, phone), or whether or not they will use a coupon.

Regression
Regression is similar to classification except that the target attributes are real-valued numbers rather than categorical, and the order or magnitude of the value is significant in some way.
To reuse the credit card example: if you wanted to know what threshold of debt new customers are likely to accumulate on their credit cards, you would use a regression model.
Simply supply data from current and past customers with their maximum previous debt level as the target value, and a regression model will be built on that training data. Once run on the new customers, the regression model will match attribute values with predicted maximum debt levels and assign a prediction to each customer accordingly.
This could be used to predict the age of customers with demographic and purchasing data, or to predict the frequency of insurance claims.
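A minimal sketch with invented numbers, using scikit-learn's LinearRegression: the maximum previous debt level is the real-valued target, and the fitted model assigns a predicted debt level to a new customer.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: customer attributes with the maximum previous
# debt level as the real-valued target.
X_train = [[30_000, 0.95, 4],    # income, utilization, late payments
           [85_000, 0.20, 0],
           [42_000, 0.70, 2],
           [120_000, 0.10, 0]]
y_train = [9_500, 1_200, 6_800, 900]   # maximum debt accumulated

reg = LinearRegression().fit(X_train, y_train)

# Predict the debt threshold a new customer is likely to accumulate.
print(reg.predict([[50_000, 0.60, 1]]))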

Anomaly Detection
Anomaly detection identifies data points atypical of a given distribution; in other words, it finds the outliers. In data mining, anomaly detection techniques identify subtle attribute patterns and flag the data points that fail to conform to those patterns.
Most uses of anomaly detection involve fraud detection, for example by insurance or credit card companies.
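As a sketch, scikit-learn's IsolationForest (one of several possible outlier detectors) can flag transactions that fail to conform to the bulk of a made-up distribution:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical transaction amounts: mostly typical values, a few outliers.
rng = np.random.default_rng(7)
typical = rng.normal(loc=50, scale=10, size=(200, 1))
fraud   = np.array([[450.0], [620.0], [3.0]])
X = np.vstack([typical, fraud])

detector = IsolationForest(contamination=0.02, random_state=0).fit(X)
labels = detector.predict(X)      # +1 = inlier, -1 = outlier
print(X[labels == -1].ravel())    # points flagged as atypical
```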

Predictive Analysis with supervised learning

Figure-6
Predictive Analysis method

Figure-7
Predictive analysis result

Unsupervised Learning
The other major method, which is not as well defined as supervised learning, is called unsupervised or descriptive learning.
Figure-8
Unsupervised learning block diagram

The training data consists of a set of input vectors $\{s_1, s_2, \ldots, s_N\}$ without any corresponding target values. This method can be used to identify groups in the data (clustering) and similar structure, without any external teacher or critic to oversee the learning process; there is no error metric.

Figure-9
Unsupervised Learning plan

The goal of an unsupervised learning problem can be:

1) to discover groups of similar samples within the data, which is called clustering, or

2) to determine the distribution density of the data within the input space, known as density estimation, or

3) to project the data from a high-dimensional space to a lower-dimensional space for the purpose of visualization, known as dimensionality reduction.

This approach is also called self-organized learning, and sometimes knowledge discovery, or knowledge discovery in databases (KDD for short), in which data mining is an essential process.
The knowledge discovery process involves the following steps (a small sketch of the first few appears after the list):
  • Data Cleaning − noise and inconsistent data are removed.
  • Data Integration − multiple data sources are combined.
  • Data Selection − data relevant to the analysis task are retrieved from the database.
  • Data Transformation − data are transformed or consolidated into forms appropriate for mining, by performing summary or aggregation operations.
  • Data Mining − intelligent methods are applied to extract data patterns.
  • Pattern Evaluation − the discovered patterns are evaluated.
  • Knowledge Presentation − the extracted knowledge is presented.
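A toy pandas sketch of the first four steps, on invented tables; the mining, evaluation, and presentation steps would then operate on the resulting summary.

```python
import pandas as pd

# Hypothetical raw sources.
sales = pd.DataFrame({"customer": [1, 2, 2, 3], "amount": [10.0, None, 25.0, 40.0]})
regions = pd.DataFrame({"customer": [1, 2, 3], "region": ["N", "S", "S"]})

clean = sales.dropna(subset=["amount"])                 # data cleaning: drop inconsistent rows
merged = clean.merge(regions, on="customer")            # data integration: combine sources
selected = merged[merged["region"] == "S"]              # data selection: task-relevant rows
summary = selected.groupby("customer")["amount"].sum()  # data transformation: aggregation
print(summary)   # consolidated form, ready for the mining step
```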

Applications of Unsupervised learning

Discovering Clusters


Figure-10
Cluster discovery
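For instance, k-means (sketched here with scikit-learn on synthetic blobs) discovers groups of similar samples without ever seeing a target value:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabelled data drawn from three separated blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

# k-means discovers the groups without any target values.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # one centre per discovered cluster
print(kmeans.labels_[:5])        # cluster assignment for the first samples
```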

Dimensionality reduction and discovering latent factors

Figure-11
Principal Component Analysis 
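A small sketch of dimensionality reduction with scikit-learn's PCA, projecting the 4-dimensional iris measurements down to 2 dimensions for visualization:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                      # shape (150, 4): four measured features
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)               # project onto the top two components

print(X_2d.shape)                         # (150, 2): ready for a 2-D scatter plot
print(pca.explained_variance_ratio_)      # variance captured by each component
```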


Matrix completion
Figure-12
Matrix completion
  
Image Inpainting
Figure-13
Image imputation

Collaborative filtering

Figure-14
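A minimal memory-based collaborative-filtering sketch on an invented user-item rating matrix; the zeros mark the missing entries that matrix completion (Figure 12) aims to fill in:

```python
import numpy as np

# Hypothetical user x item rating matrix; 0 marks a missing entry.
R = np.array([[5, 4, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

# Score a missing entry from the ratings of users with similar taste
# (cosine similarity over the rows of R).
def predict(R, user, item):
    norms = np.linalg.norm(R, axis=1)
    sims = R @ R[user] / (norms * norms[user] + 1e-9)   # similarity to each user
    mask = R[:, item] > 0                               # users who rated the item
    return np.average(R[mask, item], weights=sims[mask])

print(predict(R, user=1, item=1))   # estimated rating for a missing cell
```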

Market Basket Analysis (Association) - the king of data mining algorithms


Figure-15
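At its core, association mining counts co-occurrence. A pure-Python sketch of the support computation behind Apriori-style market basket analysis, on invented baskets:

```python
from itertools import combinations
from collections import Counter

# Hypothetical shopping baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
           {"milk", "butter"}, {"bread", "milk"}]

# Count how often each item pair is bought together.
pair_counts = Counter(frozenset(p) for b in baskets for p in combinations(sorted(b), 2))

for pair, count in pair_counts.items():
    support = count / len(baskets)       # fraction of baskets containing the pair
    if support >= 0.4:                   # minimum-support threshold, as in Apriori
        print(set(pair), f"support={support:.2f}")
```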
  
Semi-supervised Learning
Semi-supervised machine learning is a combination of supervised and unsupervised machine learning methods. It is used when labelled data is sparse and a large amount of unlabelled data is available; labels can be sparse when the data set is very large and assigning a target to each sample is prohibitively expensive and time consuming. Organizations that find it challenging to meet the high costs of the labelling process often opt for semi-supervised learning.


Figure-16
Semi Supervised Learning block diagram

The semi-supervised method improves the generalization performance of supervised learning and can be used in the same scenarios as supervised learning. It has the advantage of minimizing the human biases that may occur when manually labelling large data sets, and it also improves the accuracy of unsupervised learning. Semi-supervised methods can be either transductive or inductive: transductive learning uses the labelled data to predict labels only for the given unlabelled data (from the observed to the specific), whereas inductive learning infers a general rule that extends beyond the observed data (from the observed to the general).
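A brief sketch of the self-training idea using scikit-learn's SelfTrainingClassifier on synthetic data; by scikit-learn's convention, unlabelled samples are marked with -1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Hypothetical data: only ~10% of samples keep their labels; the rest are
# marked unlabelled with -1.
X, y = make_classification(n_samples=300, random_state=0)
rng = np.random.default_rng(0)
y_partial = np.where(rng.random(len(y)) < 0.1, y, -1)

# Self-training: fit on the labelled few, then pseudo-label the confident rest.
model = SelfTrainingClassifier(LogisticRegression()).fit(X, y_partial)
print(model.predict(X[:5]))
```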
Figure-17
Comparison of various learning methods

Self-Supervised Learning
This is a recent development in which the training data does not contain known labels or targets. Self-supervision is a framework that learns to encode representations of the input data in order to generate the right output. Network architectures such as autoencoders, restricted Boltzmann machines, deep belief networks, variational autoencoders, and generative adversarial networks are examples of self-supervised learning.
Consider an autoencoder, a network that uses the input data as its own target output. Suppose the input belongs to the space $\mathbb{R}^m$. The network consists of two parts, an encoder and a decoder. The encoding process transforms the input to the space $\mathbb{R}^n$, with n < m, so the encoder represents the input data as a lower-dimensional feature vector. The decoding process then generates the output by transforming the n-dimensional feature vector back to an m-dimensional vector. The error between the generated output and the input is minimized by a learning process, and thus the network learns to represent the original m-dimensional input data with an n-dimensional feature vector.

Figure-18
Autoencoder method

Autoencoder learning as described above is essentially an unsupervised algorithm, but it minimizes the output error in a manner similar to a supervised method; it is therefore called unsupervised representational learning.
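A minimal PyTorch sketch of the autoencoder just described, with assumed sizes m = 8 and n = 2: the reconstruction error between output and input is minimized, leaving an n-dimensional code.

```python
import torch
from torch import nn

# Minimal autoencoder sketch: m = 8 input dimensions encoded down to n = 2.
m, n = 8, 2
encoder = nn.Sequential(nn.Linear(m, n), nn.Tanh())   # R^m -> R^n
decoder = nn.Sequential(nn.Linear(n, m))              # R^n -> R^m
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-2)
loss_fn = nn.MSELoss()

X = torch.randn(256, m)                # made-up training vectors
for _ in range(200):                   # minimize the reconstruction error
    X_hat = decoder(encoder(X))        # output generated from the code
    loss = loss_fn(X_hat, X)           # error between generated output and input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(encoder(X[:1]))   # the learned n-dimensional feature vector
```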

Reinforcement learning
The third paradigm is known as reinforcement learning, which is concerned with exploring the suitable actions to take in a given situation so as to maximize (exploit) the overall reward and minimize the overall errors. Reinforcement learning is popular in tasks that involve sequential decision making. The training algorithms use a system of reward and punishment: RL involves an agent that learns to behave in an environment by performing actions and observing the results or rewards of those actions.

These algorithms adopt a trial-and-error approach, identifying and keeping track of the actions that yield the maximum reward. This is one of the broadest categories of machine learning, and the same reward-based principle is often used for training animals. Applications of this method include navigating an unknown route, robotics, gaming, and many more.

Figure-19

RL block diagram

There are three major components in reinforcement learning, namely, the agent, the actions and the environment. The agent in this case is the decision maker, the actions are what an agent does, and the environment is anything that an agent interacts with. The main aim in this kind of learning is to select the actions that maximize the reward, within a specified time. By following a good policy, the agent can achieve the goal faster.

The major components of Reinforcement Learning:

Policy: The policy defines the learning agent’s way of behaving at a given time.  

Reward Function: The reward function defines the goal in a reinforcement learning problem.  

Value Function: The value function is a prediction of future rewards.

Model of the Environment (optional): This is something that mimics the behavior of the environment. 
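These components can be seen together in a tabular Q-learning sketch on an invented five-state corridor: the Q-table is the value function, epsilon-greedy action selection is the policy, and the reward function defines the goal state.

```python
import numpy as np

# Toy corridor: states 0..4, actions 0 = left, 1 = right; reward only at state 4.
N_STATES, GOAL = 5, 4
Q = np.zeros((N_STATES, 2))                     # value estimates (the value function)
alpha, gamma, epsilon = 0.1, 0.9, 0.2           # learning rate, discount, exploration

rng = np.random.default_rng(0)
for _ in range(500):                            # episodes of trial and error
    s = 0
    while s != GOAL:
        # Policy: mostly greedy, occasionally explore (epsilon-greedy).
        a = rng.integers(2) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == GOAL else 0.0      # reward function defines the goal
        # Q-learning update: move the value toward reward + discounted future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1)[:GOAL])   # learned policy: move right (1) in every non-goal state
```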

 

A few applications of reinforcement learning

Game playing - determining the best move to make in a game often depends on a number of different factors, since the number of possible states that can exist in a particular game is usually very large.

Figure-20

Control problems - such as elevator scheduling, traffic control etc.


Figure-21

Path planning – The path to a target may be unknown, but it can be learned by monitoring the actual distance to the destination and gathering other clues along the path during travel.



Figure-22

Robotics
Reinforcement learning (RL) enables a robot to autonomously discover optimal behaviour through trial-and-error interactions with its environment. Instead of explicitly detailing the solution to a problem, the designer of a control task provides feedback in terms of a scalar objective function that measures the robot's performance at every step. The following figures illustrate the diverse set of robots that have learned tasks using reinforcement learning.


Cooking Robot
Figure-23
    

Ironing Robot
Figure-24



Robots learning to grab
Figure-25



Aerial robot
Figure-26

References:
  • Simon Haykin, Neural Networks: A Comprehensive Foundation, 2nd edition, Pearson Education Asia, 2001.
  • Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, 4th edition.
  • Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification, 2nd edition.
  • Kevin P. Murphy, Machine Learning: A Probabilistic Perspective.
  • Christopher Bishop, Pattern Recognition and Machine Learning.
  • Simon Jenni and Paolo Favaro, "Self-Supervised Feature Learning by Learning to Spot Artifacts," University of Bern, Switzerland.
  • Jens Kober, J. Andrew Bagnell, and Jan Peters, "Reinforcement Learning in Robotics: A Survey."
Figure Credits
  • Figure-1: i.ytimg.com. 
  • Figure-2: kdnuggets.com. 
  • Figure-4: therbootcamp.github.io. 
  • Figure-6: clomedia.com. 
  • Figure-7: marketoonist.com. 
  • Figure-8: image.slidesharecdn.com. 
  • Figure-9: marketoonist.com. 
  • Figure-10: images.deepai.org. 
  • Figure-11: statistixl.com. 
  • Figure-12: wikimedia.org. 
  • Figure-13: wikimedia.org. 
  • Figure-14: dataaspirant.com. 
  • Figure-15: pbs.twimg.com. 
  • Figure-16: csdl-images.computer.org. 
  • Figure-17: researchgate.net. 
  • Figure-18: iq.opengenus.org. 
  • Figure-19: mathworks.com. 
  • Figure-21: image.slidesharecdn.com. 
  • Figure-22: encrypted-tbn0.gstatic.com. 
  • Figure-23: robaid.com. 
  • Figure-24: kormushev.com. 
  • Figure-25: robohub.org. 
  • Figure-26: www.microsoft.com



