TensorFlow for Deep Learning

Tensors and TensorFlow

Most articles on TensorFlow begin by describing tensors as numerical arrays. This article is written for students attending my course on Artificial Intelligence – Advanced Course Part-1, so that they can conceptualize tensors from all perspectives, with applications across the sciences, mathematics and deep learning.

Tensor Calculus and Tensors

The tensor is a concept from mathematical physics that can be used to describe physical properties such as scalars, vectors and matrices. The concept was first introduced by Gregorio Ricci-Curbastro, an Italian mathematician, in his publication on tensor calculus. Ricci-Curbastro's theory later enabled Albert Einstein to formulate the theory of general relativity.

Tensor calculus is a technique that can be regarded as a follow-up to linear algebra; it is a generalization of classical linear algebra.

Tensors in physics

While tensors can be defined in a purely mathematical sense, they are most useful in connection with vectors in physics. Tensors provide a concise mathematical framework for formulating and solving problems in areas of physics such as electrodynamics and mechanics. Before describing the use of tensors in machine learning, we first try to understand them from the point of view of physics.

Scalars as Tensors

A scalar is a physical quantity represented by a single number; an example would be the mass of a particle or object. Note that the dimensionality of a tensor, also called its rank, is not the same as the dimensionality of a physical space. A scalar is a tensor of rank zero. In a physical space a scalar associates a value with each point, forming a scalar field. An example of a scalar field is the density of a fluid as a function of position; a second example is the gravitational potential energy as a function of position. Both of these are single numbers (functions) that vary continuously from point to point, thereby defining a scalar field.

Figure-1
Temperature as a tensor

Vectors

The next higher-ranked tensors are called vectors: tensors of rank one. In a vector space, an m-dimensional vector is a point in the m-dimensional space; examples are 2-d space, 3-d space, and so on.

In ordinary three-dimensional space, a vector has three components (m = 3). In four-dimensional space-time, a vector has four components (m = 4). In general, in an m-dimensional vector space, a vector has m components. A vector may be thought of as a tensor of rank one because its components can be written in a column array (along a line, which is one-dimensional). An m-dimensional vector is therefore a one-dimensional tensor.

Tensors of rank one may be defined at points in the m-dimensional vector space and may vary continuously from point to point, thereby defining a vector field.

An example of a vector field is the electric field in 3-d space. The electric field at any point requires more than one number to characterize it, because it has both a magnitude (strength) and a definite direction along which it acts. Generally, both the magnitude and the direction of the field vary from point to point.


Figure-2
Electric field as a tensor
 

Matrices as Tensors

Next in order are objects that transform like second-rank tensors, whose components can be written as a two-dimensional array: matrices. Just as vectors represent physical properties more complex than scalars, matrices represent physical properties yet more complex than those that can be handled by vectors.

An example of a second-order tensor in physics is the so-called inertia matrix (or inertia tensor) of an object. In a three-dimensional vector space, a rank-2 tensor can be used to characterize the rotation of objects: it is a 3 x 3 = 9 element array that characterizes the behavior of a rotating body.

A well-known example is the gyroscope. The response of a gyroscope to a force along a particular direction (described by a vector) is generally a re-orientation along some other direction, different from that of the applied force or torque. Thus rotation must be characterized by a mathematical entity more complex than either a scalar or a vector, namely a tensor of order two.




Figure-3
Force applied to the gyroscope


Figure-4
Stress as a tensor

 

The above image shows stress vectors along three perpendicular directions, each direction represented by a face of the cube. Stress is a tensor because it describes a phenomenon involving two directions simultaneously (in the figure the directions are orthogonal and mathematically independent). A force in the x direction acting on the face of constant y is represented as σxy; similarly, a force in the x direction acting on the face of constant z is represented as σxz, and so on. The stress tensor is a second-order tensor.

Tensors for more complex systems

There are yet more complex phenomena that require tensors of even higher order. For example, in Einstein's General Theory of Relativity, the curvature of space-time, which gives rise to gravity, is described by the so-called Riemann curvature tensor, which is a tensor of order four.

Since it is defined in space-time, which is four-dimensional (m = 4), the Riemann curvature tensor can be represented as a four-dimensional array with four components along each edge (dimension). The rank or order of the tensor is four (n = 4), so the Riemann curvature tensor has 4 x 4 x 4 x 4 = 256 components. Fortunately for Einstein, it turned out that only 20 of these components are mathematically independent of each other, vastly simplifying the solution of his equations.

Tensors of any rank

Tensors can therefore be defined for any rank. They may be defined at a point or at many points, and they may vary continuously from point to point, thereby defining a tensor field in the m-dimensional space.



Figure-5
Tensors as generalizations of scalars, vectors, matrices, etc.
 

A scalar is a zeroth-rank tensor, a vector is a first-rank tensor, and a matrix is a second-rank tensor.
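The same hierarchy is visible directly in TensorFlow, where tf.rank reports the number of indices of a tensor. A minimal sketch (the values are arbitrary illustrations):

import tensorflow as tf

scalar = tf.constant(3.7)                       # rank 0: a single number
vector = tf.constant([1.0, 2.0, 3.0])           # rank 1: one index
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank 2: two indices

print(tf.rank(scalar).numpy())  # 0
print(tf.rank(vector).numpy())  # 1
print(tf.rank(matrix).numpy())  # 2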

Number of Components of a Tensor

An nth-rank tensor in m-dimensional space is a mathematical object that has n indices and m^n components and obeys certain transformation rules. Each index of a tensor ranges over the number of dimensions of the space. The dimension of the space is largely irrelevant in most tensor equations (with the notable exception of the contracted Kronecker delta; a description of the Kronecker delta is beyond our current scope).
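This component count is easy to verify in TensorFlow: a tensor with n indices, each running over m values, has m^n entries. A small sketch (the shapes are arbitrary examples):

import tensorflow as tf

# A rank-2 tensor in 3 dimensions has 3^2 = 9 components
t2 = tf.zeros([3, 3])
print(tf.size(t2).numpy())   # 9

# A rank-4 tensor in 4 dimensions (the shape of the Riemann
# curvature tensor) has 4^4 = 256 components
t4 = tf.zeros([4, 4, 4, 4])
print(tf.size(t4).numpy())   # 256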

Tensors in linear algebra

In classical linear algebra one deals with vectors and matrices. Tensors are scalars, vectors, matrices and hypermatrices that are multilinear (separately linear in each of their variables). Vectors and matrices are always multilinear; however, not all hypermatrices are multilinear. A basis, form, function, etc., in two or more variables is said to be multilinear if it is linear in each variable separately. As an example, a function of two variables is bilinear if it is linear with respect to each of its variables; the simplest example is f(x, y) = xy.

To simplify the concept we visualize tensors as follows. Tensors are generalizations of scalars (no indices), vectors (arrays with exactly one index), and matrices (arrays with exactly two indices) to arrays with an arbitrary number of dimensions or indices. The number of indices required to represent a tensor is known as its order or rank. In the following figure, a scalar is a tensor of dimension 0, a vector is a tensor of dimension 1, a matrix is a tensor of dimension 2, and a hypermatrix is a tensor of dimension 3. Each tensor rank corresponds to a different mathematical entity.


Figure-6
Tensors as generalizations of scalars, vectors, matrices, etc.

Figure-7

Tensors of various ranks

 

Defined mathematically, tensors are simply arrays of numbers, or functions, that transform according to certain rules under a change of coordinates. All the operations for building matrices can be generalized to work for tensors.
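For instance, elementwise addition and multiplication carry over unchanged from matrices to tensors of any rank, and matrix multiplication generalizes to tensor contraction. A brief sketch:

import tensorflow as tf

a = tf.ones([2, 3, 4])   # a rank-3 tensor
b = tf.ones([2, 3, 4])

c = a + b                # elementwise addition, just as for matrices
d = a * b                # elementwise (Hadamard) product

# Matrix multiplication generalizes to contraction over chosen indices
m = tf.random.normal([3, 4])
n = tf.random.normal([4, 5])
p = tf.tensordot(m, n, axes=1)   # same as tf.matmul(m, n); shape (3, 5)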

It should be remarked that the tensors of linear algebra described above, multi-dimensional arrays of numerical quantities or functions, are tensors in the sense of physics only when they obey the appropriate transformation rules. For more on tensors in linear algebra, read the article on “Linear Algebra for AI and ML” in this same blog series.

Tensors for Deep learning

For data science and machine learning applications, tensors are best viewed as containers that wrap and store data features. Tensors play an important role in ML by encoding multi-dimensional data.

A tensor can be considered an object that allows valid linear transformations. From a computer science perspective, it can be helpful to think of tensors as objects in the object-oriented sense, rather than simply as data structures.
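As a concrete illustration of tensors as data containers, a mini-batch of grayscale images is conventionally stored as a rank-4 tensor with axes (batch, height, width, channels). A sketch with made-up sizes:

import tensorflow as tf

# A batch of 32 grayscale images, each 28 x 28 pixels with 1 channel
images = tf.random.uniform([32, 28, 28, 1])

print(images.shape)             # (32, 28, 28, 1)
print(tf.rank(images).numpy())  # 4: four axes of features in one container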





Figure-8
Tensors in deep learning. 

In the context of machine learning and deep learning, there are subtle differences between this use of tensors and the concept and meaning of tensors in physics and mathematics.

Figure-9

Figure-10

 

Before TensorFlow

The following is a discussion of modules, packages, libraries, frameworks, engines and platforms, terms used extensively by the software and machine learning communities as well as the TensorFlow community. Because these terms are sometimes used interchangeably, they cause frustrating confusion for the thinking novice. I intend to clarify their meanings before we describe TensorFlow and its uses.

Module

A module is a file that contains Python executable code, functions and global variables ready to be used elsewhere. Any runnable code file is a module; for example, programs with names such as xyz.py or mod1.py, as sketched below.
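A hypothetical mod1.py might look as follows; any other script can then use it with import mod1.

# mod1.py -- a module: executable code, functions and globals in one file

PI = 3.14159          # a global variable

def area(radius):
    """Return the area of a circle."""
    return PI * radius * radius

# elsewhere:
#   import mod1
#   print(mod1.area(2.0))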


Package

A package is a collection of modules in a directory, meant to achieve a related functional purpose; it gathers the related modules together and may also contain sub-packages. Concretely, a package is a directory of Python modules containing an additional __init__.py file; the __init__.py distinguishes a package from a directory that just happens to contain a bunch of Python scripts.
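A hypothetical package layout (the names are illustrative):

geometry/                 # the package directory
    __init__.py           # marks the directory as a package
    circles.py            # a module
    polygons.py           # a module
    solids/               # a sub-package
        __init__.py
        spheres.py

Code elsewhere imports from it with, for example, from geometry import circles.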

 

Library

A library is a collection of packages with related functionality; in other words, a collection of code that accomplishes a common task. It can be used in a program simply by importing it and calling the required methods; pandas, for example, is a library. Libraries reduce coding effort, since the programmer calls them when and where required, and they leave the programmer free to choose a coding style.
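For example, with the pandas library one just imports it and calls what is needed:

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})
print(df.describe())   # a library method, called where it is required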

 

Framework

A framework is a set of libraries that does more than just offer functionality: it is a structure that provides the architecture for development work, and it is not used by simply importing it into code. Essentially it is a collection of libraries used to solve a given class of tasks, for example developing a game. The programmer can only integrate code into the framework. Unlike a library, a framework does not allow flexibility of coding style; rather, it imposes one. The difference between a framework and a library is captured by the term “inversion of control”: a library lets the programmer be in charge, whereas a framework is in charge of the program flow. A software development kit is an example of a framework.

 

Engine

 

An “engine” is a self-contained but externally controllable piece of code designed to perform a specific type of work. An engine provides more tools and related support than a framework for developing an application; for example, a game engine may offer additional tools to define the game physics, rendering, etc. Google App Engine, for instance, is used to build, maintain and scale applications as their data storage and traffic increase.

 

Platform

 

A platform is the environment, hardware or software or both, on which software is built and applications are run. Platforms provide a stable ecosystem for developing software and coordinate multiple products to work together; code that works on one platform may not run on all platforms. The Google AI Platform makes it easy for machine learning developers, data scientists, and data engineers to take ML projects from ideation to production and deployment quickly and cost-effectively. It can be classified as a “Machine Learning as a Service” tool, and it supports TensorFlow and Kubeflow.


What is TensorFlow?

 

Deep learning, as part of artificial intelligence, is a complex discipline. Created by the Google Brain team, TensorFlow has made the implementation of complex deep learning models easier and less daunting. TensorFlow accelerates the process of acquiring data, developing and training models, making predictions, and further optimizing them. It offers a comprehensive ecosystem of community resources, libraries, and tools that facilitate building and deploying machine learning models. The TensorFlow library incorporates different APIs to build deep learning architectures at scale, such as convolutional neural networks and recurrent neural networks.

 

TensorFlow is an end-to-end open-source platform for machine learning, released under the Apache open-source license. It offers a flexible ecosystem of tools, libraries and community resources for researchers and developers to build ML-powered applications (tensorflow.org). It was designed and released by Google in 2015 to develop, train, test and deploy machine learning models. The latest version, TensorFlow 2.0, was released in October 2019.

TensorFlow is used for modeling deep neural networks and is best suited to dataflow programming across a range of tasks. It offers multiple levels of abstraction for building and training models. It is built to run on multiple CPUs or GPUs and even on mobile operating systems, and it has wrappers in several languages such as Python, C++ and Java.

TensorFlow applications can run on a local machine (Windows, macOS, Linux), a cluster in the cloud, iOS and Android devices, CPUs or GPUs. A graphics processing unit (GPU) offers better flexibility and programmability for irregular computations, such as small batches and non-matrix multiplications. For faster execution in the cloud, TensorFlow can run on Google's custom Tensor Processing Unit (TPU), which is highly optimized for large batches and CNNs and offers the highest training throughput.

The TensorFlow API allows distributed training across multiple GPUs, multiple machines or TPUs. TensorBoard provides the visualization and tooling needed for machine learning experimentation.

Google uses machine learning in all of its products to improve its search engine, translation, image captioning and recommendations. TensorFlow can train and run deep neural networks for handwritten digit classification, image recognition, word embeddings, recurrent neural networks, sequence-to-sequence models for machine translation, natural language processing, and simulations based on partial differential equations (PDEs). Best of all, TensorFlow supports production prediction at scale, with the same models used for training.

Python, C++ and CUDA in TensorFlow

TensorFlow, as the name indicates, allows defining and running computations involving tensors; its base data types are represented as n-dimensional arrays, or tensors. It uses Python to provide a front-end API for building applications. The core that actually runs the applications is written in C++ and CUDA (Compute Unified Device Architecture), NVIDIA's platform and programming model for GPUs.

TensorFlow applications are themselves Python applications. The actual math operations, however, are not performed in Python. The libraries of transformations that are available through TensorFlow are written as high-performance C++ binaries. Python just directs traffic between the pieces, and provides high-level programming abstractions to hook them together.
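This division of labor is easy to observe: the Python code below merely describes a matrix product, while the multiplication itself executes inside TensorFlow's compiled kernels; logging the device placement shows where each operation actually runs. A minimal sketch:

import tensorflow as tf

tf.debugging.set_log_device_placement(True)  # log which device runs each op

a = tf.random.normal([1000, 1000])
b = tf.random.normal([1000, 1000])
c = tf.matmul(a, b)   # dispatched to a C++/CUDA kernel, not computed in Python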

How TensorFlow works

Data flow graph or computation graph

A data flow graph, or computation graph, is the basis of computation in TensorFlow; from now on we refer to them as computation graphs. These graphs are structures that describe how data moves through a series of processing nodes.


Figure-11

 Computational Graph

Figure-12 
Computational Graph

TensorFlow models any problem as a graph: a program in TensorFlow is basically a computation graph. The graph nodes represent computations (operations or functions) performed on the data. The edges, or arcs, between the nodes represent the data, tensors (multidimensional arrays) transferred between the nodes. Nodes and tensors in TensorFlow are Python objects.

In this way the data and the control flow of the program are represented in one integrated model. Computational graphs use graph theory to represent the computations performed during execution of a program, and graph theory simplifies visualization of the data as it flows from input to output. TensorFlow programs consist of two kinds of operations: construction (building the graph) and execution (running the graph).
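In TensorFlow 2 the two phases are visible through tf.function: tracing the Python function builds the graph, and calling it executes the graph. A small sketch:

import tensorflow as tf

@tf.function
def f(x, y):
    a = x + y        # one graph node (an Add operation)
    b = a * 2.0      # another node; the edge carrying `a` is a tensor
    return b

# Construction: tracing builds the computation graph
graph_fn = f.get_concrete_function(tf.constant(1.0), tf.constant(2.0))
print(len(graph_fn.graph.get_operations()))  # the nodes of the built graph

# Execution: running the graph
print(f(tf.constant(1.0), tf.constant(2.0)).numpy())  # 6.0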

TensorFlow graphs are Directed Acyclic Graphs (DAGs)

A cyclic computational graph would never finish its computations; therefore TensorFlow graphs are acyclic. Consider the following figure, which illustrates the concept with three nodes A, B and C. Such graphs are called directed acyclic graphs (DAGs) because they contain no directed cycles.

Figure-13
The figures (a) and (b) are acyclic, but (c) is cyclic


TensorBoard for TensorFlow visualization

TensorBoard is a browser-based visualization tool packaged with TensorFlow. It helps visualize the model architecture, TensorFlow operations and layers, and it can display images, audio, text, interactive 3-D word embeddings, and more. It can also track and monitor histograms of weights, biases and activations over time, as well as model loss, accuracy and other performance metrics during training.

Figure-14
Monitoring activations with histograms.
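A typical way to generate such histograms is the Keras TensorBoard callback. A minimal sketch (the model and synthetic data are placeholders for your own):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# histogram_freq=1 records weight and activation histograms every epoch
tb = tf.keras.callbacks.TensorBoard(log_dir="logs", histogram_freq=1)

x = tf.random.normal([256, 10])
y = tf.random.normal([256, 1])
model.fit(x, y, epochs=5, callbacks=[tb])
# then, in a shell:  tensorboard --logdir logs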
 

Exploiting Parallelism 

A computational graph expresses the possibilities for concurrent execution of the parts of a program. Computational graphs therefore expose, and allow TensorFlow to exploit, the parallelism available in the set of computations represented by the model. A graph can be divided into multiple parts, and each part can be placed and executed on a separate device such as a CPU or GPU; that is, graph nodes can be placed on specific devices. Nodes that can be computed simultaneously are thus able to operate in parallel.

Figure-15

TensorFlow finds the first set of nodes that it can fire, or execute. The firing of these nodes results in the firing of the nodes that depend on them, and so on; nodes with sequential dependencies are computed in order.
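Explicit device placement can be sketched as follows (the device names assume a machine with at least one GPU):

import tensorflow as tf

# Pin independent parts of the graph to specific devices
with tf.device("/CPU:0"):
    a = tf.random.normal([2000, 2000])
    b = tf.matmul(a, a)          # runs on the CPU

with tf.device("/GPU:0"):        # assumes a GPU is available
    c = tf.random.normal([2000, 2000])
    d = tf.matmul(c, c)          # runs on the GPU, potentially in parallel

e = b + d   # this node can fire only after both branches have executed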

 

TensorFlow in the Cloud

TensorFlow provides APIs (application programming interfaces) for distributed training in Google Cloud, requiring minimal setup and no changes to models developed in your local environment. TensorFlow Cloud automatically handles cloud-specific tasks such as creating virtual machine instances and setting distribution strategies for your models.
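With the tensorflow_cloud package, submitting a local Keras training script to GCP can be as small as the following sketch (the script name is a hypothetical placeholder, and the machine configuration shown is one of the package's predefined options):

import tensorflow_cloud as tfc

# Ship a local training script to Google Cloud; VM creation and the
# distribution strategy are handled by TensorFlow Cloud itself.
tfc.run(
    entry_point="train_model.py",        # hypothetical local Keras script
    requirements_txt="requirements.txt",
    chief_config=tfc.COMMON_MACHINE_CONFIGS["K80_1X"],
)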


Figure-16



Figure-17

 

TensorFlow with Cloud TPUs

 

Tensor Processing Units (TPUs) are hardware accelerators designed to work with TensorFlow and specialized for deep learning tasks. TPUs are application-specific integrated circuits (ASICs) that act as workers in Google Cloud. They accelerate linear algebra computations and minimize the training time of large, complex neural networks, and they allow scalable operation across different machines with TPU servers.
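Connecting to a Cloud TPU and placing a model on it follows a standard pattern; a sketch (the empty TPU address works on Colab, and recent TF 2.x releases expose the strategy as tf.distribute.TPUStrategy, earlier ones as tf.distribute.experimental.TPUStrategy):

import tensorflow as tf

# Resolve and initialize the TPU cluster (address is environment-specific)
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Variables and computations created inside the scope land on TPU cores
strategy = tf.distribute.TPUStrategy(resolver)
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
    model.compile(optimizer="adam", loss="mse")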

Figure-18

 

 

Quantizing to accelerate computations

 

To speed up computations, TPUs make use of quantization: the process of approximating an arbitrary value between a preset minimum and maximum with an 8-bit integer. This technique compresses 32-bit or even 16-bit floating-point calculations into 8-bit integer arithmetic; a TPU contains 65,536 8-bit integer multipliers. Quantization reduces computation time without significantly sacrificing model accuracy.
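The quantization inside a TPU happens in hardware, but the same idea can be tried in software with TensorFlow Lite's post-training quantization, which maps floating-point weights to 8-bit integers. A sketch, assuming model is an already-trained Keras model:

import tensorflow as tf

# `model` is assumed to be a trained tf.keras model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable 8-bit quantization
tflite_bytes = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_bytes)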


Figure-19

Google Cloud TPU servers 

 

TensorFlow Extended (TFX) for end-to-end ML and deployment to production

 

After completing the training and evaluation of a deep learning model, TFX can deploy it to its target environment. TFX helps complete an ML pipeline: a TFX pipeline is a sequence of components covering modeling, training, serving inference, and deployment to online, native mobile and JavaScript targets. It is specifically designed for moving scalable, high-performance machine learning tasks from modeling to production. Scalable models must adapt to keep up with ever-changing data, so it is necessary to minimize the time required for continuous retraining and for the continuous integration and deployment pipelines.
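A TFX pipeline is written as a literal list of such components. A heavily abbreviated sketch (the paths and module file are illustrative assumptions, and a real pipeline would include further components for validation and pushing):

from tfx.components import CsvExampleGen, Trainer
from tfx.orchestration import pipeline
from tfx.proto import trainer_pb2

example_gen = CsvExampleGen(input_base="data/")   # ingest training data
trainer = Trainer(
    module_file="model.py",                       # hypothetical training code
    examples=example_gen.outputs["examples"],
    train_args=trainer_pb2.TrainArgs(num_steps=100),
    eval_args=trainer_pb2.EvalArgs(num_steps=10),
)

my_pipeline = pipeline.Pipeline(
    pipeline_name="demo_pipeline",
    pipeline_root="pipeline_root/",
    components=[example_gen, trainer],
)
# The pipeline is then run by an orchestrator such as Apache Beam,
# Apache Airflow or Kubeflow Pipelines.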

Figure-20

Figure-21

 

TensorFlow allows data scientists to integrate with various Google Cloud services such as Datalab, BigQuery, Dataflow, Dataprep and Dataproc for running Apache Spark and Apache Hadoop. Google Cloud Logging and Cloud Monitoring, along with the ML APIs, are used to monitor model performance in production and to detect any performance degradation. TensorFlow Extended (TFX) and Kubeflow Pipelines automate the ML workflow and the continuous integration and continuous delivery process. Google's Kubeflow runs a set of open-source ML libraries on Kubernetes clusters.

What is Keras?

 

Written in Python by François Chollet, Keras is an open-source, model-level library that provides a high-level application programming interface for deep learning. It is designed to enable fast experimentation with deep neural networks.

It was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System) and is distributed under the MIT license. Keras acts as an interface for the TensorFlow library: a plug-and-play framework that lets developers build, train, and evaluate their models quickly. Up until version 2.3 Keras supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit, Theano, and PlaidML. Keras does not handle low-level operations such as tensor manipulation and differentiation; it relies on established tensor libraries that serve as backend engines. The following figure illustrates the modular software and hardware stack that can be plugged into Keras. When running on CPUs, TensorFlow wraps the low-level Eigen/BLAS libraries for backend matrix products, matrix decompositions and other tensor operations. When running on GPUs, TensorFlow wraps well-optimized deep learning operations from the NVIDIA CUDA Deep Neural Network library (cuDNN).


Figure-22

Keras software and hardware stack

 

 

The latest version, 2.4.3, is built on top of TensorFlow and can scale to large clusters of GPUs or an entire TPU pod. Keras is one of the most used deep learning frameworks on Kaggle.
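Building and training a model in Keras is compact. A minimal sketch with synthetic data:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(4,)),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

x = tf.random.normal([150, 4])                          # synthetic features
y = tf.random.uniform([150], maxval=3, dtype=tf.int32)  # synthetic labels
model.fit(x, y, epochs=10, batch_size=16)
model.evaluate(x, y)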

 

Native Keras in TFX

 

TensorFlow 2.0 has adopted Keras with tight integration, and TFX has native support for Keras models (TFX supports only the TensorFlow 2 version of Keras). TensorFlow Cloud, managed by the Keras team at Google, is a set of utilities to help you run large-scale Keras training jobs on GCP with very little configuration effort; running experiments on 8 or more GPUs in the cloud becomes easy.

 

 

Deploying Keras on the Web

 

After training and satisfactory evaluation, a learning model can be moved to the web so that it can be used as a web service; it then becomes possible to access, use and interact with it over HTTP from a browser. Web frameworks like Flask and Django can be used to achieve this.

Both Flask and Django are web application frameworks written in Python. Both are free and open source, enable rapid development of secure and maintainable websites, and can serve as a back-end server for a Keras model. Flask is lighter and quicker, and more suitable for smaller, less complicated tasks.
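A minimal sketch of serving a saved Keras model with Flask (the model path and input format are assumptions):

import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)
model = tf.keras.models.load_model("model.h5")   # hypothetical saved model

@app.route("/predict", methods=["POST"])
def predict():
    # expects JSON like {"features": [[1.0, 2.0, 3.0, 4.0]]}
    features = np.array(request.get_json()["features"], ndmin=2)
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)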

 

TensorFlow vs. others

 

Before concluding, we compare TensorFlow with other deep learning frameworks.

 

PyTorch

 

Developed by Facebook's AI research lab, PyTorch is a free, open-source machine learning framework based on the Torch library. It is built with Python and has many similarities to TensorFlow. PyTorch's APIs are intuitive and easy to learn, and they enable quick experimentation, which has made it popular in the research community. It runs on Linux, macOS, and Windows. PyTorch is generally the better choice for fast development of projects that need to be up and running in a short time, whereas TensorFlow is built with production in mind and is the better choice for larger projects and more complex workflows. TensorFlow also offers better visualization, which helps developers debug and track the training process; PyTorch provides only limited visualization. The latest version of PyTorch is 1.6.

 

Caffe

 

Caffe is a deep learning framework created by Yangqing Jia during his PhD at UC Berkeley and developed by Berkeley AI Research (BAIR) and community contributors. It is made with expression, speed, and modularity in mind, and it is released under the BSD 2-Clause license. Caffe is used in academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia. Yahoo! has also integrated Caffe with Apache Spark to create CaffeOnSpark, a distributed deep learning framework (Wikipedia).

 

CNTK

 

Microsoft Cognitive Toolkit (CNTK), like TensorFlow, describes dataflow and neural networks as a series of computational steps via directed graphs. CNTK can be included as a library in your Python, C#, C++ or Java programs, or used as a standalone machine-learning tool. It supports 64-bit Linux and 64-bit Windows operating systems. CNTK implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers, and it supports the Open Neural Network Exchange (ONNX) format, an open-source shared model representation for framework interoperability and shared optimization. The latest release of CNTK is 2.7.

 

Apache MXNet

 

Apache MXNet is a flexible and efficient deep learning library developed by the Apache Software Foundation. Adopted by Amazon as the premier deep learning framework on AWS, it can scale almost linearly across multiple GPUs and multiple machines. MXNet provides deep integration with Python and support for Scala, Julia, Clojure, Java, C++, R, Go and Perl.

 

Theano

 

Developed by the MILA lab at Université de Montréal in 2007 and considered the grandfather of deep learning frameworks, Theano is an open-source Python library that allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. It is built on top of NumPy. Theano has recently fallen out of favor with most researchers outside academia.

 

References

Francois Chollet, “Deep Learning with Python”, Manning Publications
Kees Dullemond & Kasper Peeters, “Introduction to Tensor Calculus”, uni-heidelberg.de
tensorflow.org
mathworld.wolfram.com
wikiversity.org
physlink.com
“Computing Hadamard product of two tensors in TensorFlow”, tutorialexample.com
euclideanspace.com
Jimmie Lawson, Louisiana State University, math.lsu.edu/~lawson/Chapter9.pdf
wikipedia.org
nvidia.com/blog
mygreatlearning.com
deepai.com
kdnuggets.com
ubuntu.com/kubeflow
keras.io
pytorch.org
microsoft.com
mxnet.apache.org
djangoproject.com
developer.mozilla.org/en-US/docs/Learn/Server-side/Django/Introduction
predera.com
caffe.berkeleyvision.org

 

Image Credits

Figure-1: slideplayer.com
Figure-2: cds.cern.ch
Figure-3: tpub.com
Figure-4: yorkporc.files.wordpress.com
Figure-5: wikimedia.org
Figure-6: qph.fs.quoracdn.net
Figure-7: i2tutorials.com
Figure-8: big-data.tips
Figure-9: slideshare.net
Figure-10: slideshare.net
Figure-12: cvml.ist.ac.at
Figure-14: jhui.github.io
Figure-15: itrelease.com
Figure-16: cloud.google.com
Figure-17: slideshare.net
Figure-18: slideshare.net
Figure-19: cloud.google.com
Figure-20: github.com
Figure-21: predera.com/
Figure-22: livebook.manning.com
