# I need a [very well paying] job in data science. Now.

I need a very well paying job in data science. Now. In Pune, India.

Yes, I was visiting Kota for some official work when at the railway station of the [back then, a simple little] town, on a “whim” (borne out of a sense of curiosity, having heard the author’s name), I bought it. That was on 14th July 1987. The stamp of the A. H. Wheeler and Company (Rupa Publications), so well known to us all (including IITians and IIM graduates) back then, stand in a mute testimony for the same—the price, and the fact that this little book was imported by them. As to bearing testimony to the event, so does my signature, and the noting of the date. (I would ordinarily have no motivation to note a fake date, what do you say?) Also notable is the price of the book: Rs. 59/-. Bought out of Rs. 1800/- per month, if I remember those days right (and plain because I was an M. Tech. from (one of the then five) IITs. My juniors from my own UG college, COEP, would have had to start with a Rs. 1200/- or Rs. 1400/- package, and rise to my level in about 3 years, back then.)

Try to convince my the then back self that I would be jobless today.

No, really. Do that.

And see if I don’t call you names. Right here.

Americans!

A song I like:

(English, pop-song): “Another town, another train…”
Band (i.e. music, composition, lyrics, etc., to the best of my knowledge): ABBA

Bye for now.

And develop a habit to read—and understand—books. That’s important. As my example serves to illustrate the point. Whether I go jobful or jobless. It’s a good habit to cultivate.

But then, Americans have grown so insensitive to the authentic pains of others—including real works by others. The said attitude must reflect inwards too. The emphasis is on the word “authentic.” If a man doesn’t care for another honest, really very hard-working man in pain but spends his intellect and time in finding rationalizations to enhance his own prestige and money-obtaining powers, by the law of integrative mechanism of conscisousness that is the law of “karma,” the same thing must haunt him back—whether he be a Republican, or a Democrat. (Just a familiarity with the word “karma” is not enough to escape its bad—or good—effects. What matters are actions (“karma”s), ultimately. But given the fact that man has intellect, these are helped, not obscured, by it.)

Go, convince Americans to give me a good, well-paying job, in data science, and in Pune—the one that matches my one-sentence profile (mentioned here) and my temperament. As to the latter, simple it is, to put it in one sentence: “When the time calls for it, I am known to call a spade a spade.”

And, I can call Americans (and JPBTIs) exactly what they have earned.

But the more important paragraph was the second in this section. Starting from “But then, Americans have grown so insensitive to the authentic… .”

Instead of “pains,” you could even add a value / virtue. The statement would hold.

/

# Data science code snippets—2: Using world’s simplest ANN to visualize that deep learning is hard

Good morning! [LOL!]

In the context of machine learning, “learning” only means: changes that get effected to the weights and biases of a network. Nothing more, nothing less.

I wanted to understand exactly how the learning propagates through the layers of a network. So, I wrote the following piece of code. Actually, the development of the code went through two distinct stages.

First stage: A two-layer network (without any hidden layer); only one neuron per layer:

In the first stage, I wrote what demonstrably is world’s simplest possible artificial neural network. This network comes with only two layers: the input layer, and the output layer. Thus, there is no hidden layer at all.

The advantage with such a network (“linear” or “1D” and without hidden layer) is that the total error at the output layer is a function of just one weight and one bias. Thus the total error (or the cost function) is a function of only two independent variables. Therefore, the cost function-surface can (at all) be directly visualized or plotted. For ease of coding, I plotted this function as a contour plot, but a 3D plot also should be possible.

Here are couple of pictures it produces. The input randomly varies in the interval [0.0, 1.0) (i.e., excluding 1.0), and the target is kept as 0.15.

The following plot shows an intermittent stage during training, a stage when the gradient descent algorithm is still very much in progress.

SImplest network with just input and output layers, with only one neuron per layer: Gradient descent in progress.

The following plot shows when the gradient descent has taken the configuration very close to the local (and global) minimum:

SImplest network with just input and output layers, with only one neuron per layer: Gradient descent near the local (and global) minimum.

Second stage: An n-layer network (with many hidden layers); only one neuron per layer

In the second stage, I wanted to show that even if you add one or more hidden layers, the gradient descent algorithm works in such a way that most of the learning occurs only near the output layer.

So, I generalized the code a bit to have an arbitrary number of hidden layers. However, the network continues to have only one neuron per layer (i.e., it maintains the topology of the bogies of a train). I then added a visualization showing the percent changes in the biases and weights at each layer, as learning progresses.

Here is a representative picture it produces when the total number of layers is 5 (i.e. when there are 3 hidden layers). It was made with both the biases and the weights all being set to the value of 2.0:

It is clearly seen that almost all of learning is limited only to the output layer; the hidden layers have learnt almost nothing!

Now, by way of a contrast, here is what happens when you have all initial biases of 0.0, and all initial weights of 2.0:

Here, the last hidden layer has begun learning enough that it shows some visible change during training, even though the output layer learns much more than does the last hidden layer. Almost all learning is via changes to weights, not biases.

Next, here is the reverse situation: when you have all initial biases of 2.0, but all initial weights of 0.0:

The bias of the hidden layer does undergo a slight change, but in an opposite (positive) direction. Compared to the case just above, the learning now is relatively more concentrated in the output layer.

Finally, here is what happens when you initialize both biases and weights to 0.0.

The network does learn (the difference in the predicted vs. target does go on diminishing as the training progresses). However, the percentage change is too small to be visually registered (when plotted to the same scale as what was used earlier).

The code:

Here is the code which produced all the above plots (but you have to suitably change the hard-coded parameters to get to each of the above cases):

'''
SimplestANNInTheWorld.py
-- Implements the case-study of the simplest possible 1D ANN.
-- Each layer has only neuron. Naturally, there is only one target!
-- It even may not have hidden layers.
-- However, it can have an arbitrary number of hidden layers. This
feature makes it a good test-bed to see why and how the neurons
in the hidden layers don't learn much during deep learning, during
a straight-forward'' application of the gradient descent algorithm.
-- Please do drop me a comment or an email
if you find this code useful in any way,
say, in a corporate training setup or
-- History:
* 30 December 2018 09:27:57  IST:
Project begun
* 30 December 2018 11:57:44  IST:
First version that works.
* 01 January 2019 12:11:11  IST:
Added visualizations for activations and gradient descent for the
last layer, for no. of layers = 2 (i.e., no hidden layers).
* 01 January 2019 18:54:36  IST:
Added visualizations for percent changes in biases and
weights, for no. of layers &amp;amp;amp;amp;gt;=3 (i.e. at least one hidden layer).
* 02 January 2019 08:40:17  IST:
The version as initially posted on my blog.
'''
import numpy as np
import matplotlib.pyplot as plt

################################################################################
# Functions to generate the input and test data

def GenerateDataRandom( nTrainingCases ):
# Note: randn() returns samples from the normal distribution,
# but rand() returns samples from the uniform distribution: [0,1).
adInput = np.random.rand( nTrainingCases )

def GenerateDataSequential( nTrainingCases ):
adInput = np.linspace( 0.0, 1.0, nTrainingCases )

def GenerateDataConstant( nTrainingCases, dVal ):
adInput = np.full( nTrainingCases, dVal )

################################################################################
# Functions to generate biases and weights

def GenerateBiasesWeightsRandom( nLayers ):
adAllBs = np.random.randn( nLayers-1 )
adAllWs = np.random.randn( nLayers-1 )

def GenerateBiasesWeightsConstant( nLayers, dB, dW ):
adAllBs = np.ndarray( nLayers-1 )
adAllWs = np.ndarray( nLayers-1 )

################################################################################
# Other utility functions

def Sigmoid( dZ ):
return 1.0 / ( 1.0 + np.exp( - dZ ) )

def SigmoidDerivative( dZ ):
dA = Sigmoid( dZ )
dADer = dA * ( 1.0 - dA )

# Server function. Called with activation at the output layer.
# In this script, the target value is always one and the
# same, i.e., 1.0).
# Assumes that the form of the cost function is:
#       C_x = 0.5 * ( dT - dA )^2
# where, note carefully, that target comes first.
# Hence the partial derivative is:
#       \partial C_x / \partial dA = - ( dT - dA ) = ( dA - dT )
# where note carefully that the activation comes first.
def CostDerivative( dA, dTarget ):
return ( dA - dTarget )

def Transpose( dA ):
np.transpose( dA )
return dA

################################################################################
# Feed-Forward

def FeedForward( dA ):
## print( "\tFeed-forward" )
l_dAllZs = []
# Note, this makes l_dAllAs have one extra data member
# as compared to l_dAllZs, with the first member being the
# supplied activation of the input layer
l_dAllAs = [ dA ]
nL = 1
for w, b in zip( adAllWs, adAllBs ):
dZ = w * dA + b
l_dAllZs.append( dZ )
# Notice, dA has changed because it now refers
# to the activation of the current layer (nL)
dA = Sigmoid( dZ )
l_dAllAs.append( dA )
## print( "\tLayer: %d, Z: %lf, A: %lf" % (nL, dZ, dA) )
nL = nL + 1
return ( l_dAllZs, l_dAllAs )

################################################################################
# Back-Propagation

def BackPropagation( l_dAllZs, l_dAllAs ):
## print( "\tBack-Propagation" )
# Step 1: For the Output Layer
dZOP = l_dAllZs[ -1 ]
dAOP = l_dAllAs[ -1 ]
dZDash = SigmoidDerivative( dZOP )
dDelta = CostDerivative( dAOP, dTarget ) * dZDash

# Since the last hidden layer has only one neuron, no need to take transpose.
dAPrevTranspose = Transpose( l_dAllAs[ -2 ] )
dGradW = np.dot( dDelta, dAPrevTranspose )

# Step 2: For all the hidden layers
for nL in range( 2, nLayers ):
dZCur = l_dAllZs[ -nL ]
dZCurDash = SigmoidDerivative( dZCur )
dWNext = adAllWs[ -nL+1 ]
dWNextTranspose = Transpose( dWNext )
dDot = np.dot( dWNextTranspose, dDelta )
dDelta = dDot * dZCurDash

dAPrev = l_dAllAs[ -nL-1 ]
dAPrevTrans = Transpose( dAPrev )
dGradWCur = np.dot( dDelta, dAPrevTrans )

def PlotLayerwiseActivations( c, l_dAllAs, dTarget ):
plt.subplot( 1, 2, 1 ).clear()
dPredicted = l_dAllAs[ -1 ]
sDesc = "Activations at Layers. Case: %3d\nPredicted: %lf, Target: %lf" % (c, dPredicted, dTarget)
plt.xlabel( "Layers" )
plt.ylabel( "Activations (Input and Output)" )
plt.title( sDesc )

nLayers = len( l_dAllAs )
dES = 0.2	# Extra space, in inches
plt.axis( [-dES, float(nLayers) -1.0 + dES, -dES, 1.0+dES] )

# Plot a vertical line at the input layer, just to show variations
plt.plot( (0,0), (0,1), "grey" )

# Plot the dots for the input and hidden layers
for i in range( nLayers-1 ):
plt.plot( i, l_dAllAs[ i ], 'go' )
# Plot the dots for the output layer
plt.plot( nLayers-1, dPredicted, 'bo' )
plt.plot( nLayers-1, dTarget, 'ro' )

def PlotGradDescent( c, dOrigB, dOrigW, dB, dW ):
plt.subplot( 1, 2, 2 ).clear()

d = 5.0
ContourSurface( d )
plt.axis( [-d, d, -d, d] )
plt.plot( dOrigB, dOrigW, 'bo' )
plt.plot( dB, dW, 'ro' )
plt.grid()
plt.xlabel( "Biases" )
plt.ylabel( "Weights" )
sDesc = "Gradient Descent for the Output Layer.\n" \
"Case: %3d\nWeight: %lf, Bias: %lf" % (c, dW, dB)
plt.title( sDesc )

def ContourSurface( d ):
nDivs = 10
dDelta = d / nDivs
w = np.arange( -d, d, dDelta )
b = np.arange( -d, d, dDelta )
W, B = np.meshgrid( w, b )
A = Sigmoid( W + B )
plt.imshow( A, interpolation='bilinear', origin='lower',
cmap=plt.cm.Greys, # cmap=plt.cm.RdYlBu_r,
extent=(-d, d, -d, d), alpha=0.8 )
CS = plt.contour( B, W, A )
plt.clabel( CS, inline=1, fontsize=7 )

plt.clf()

nComputeLayers = len( adOrigBs )
plt.axis( [-0.2, nComputeLayers+0.7, -320.0, 320.0] )

print( "Case: %3d" \
"\nPercent Changes in Biases:\n%s" \
"\nPercent Changes in Weights:\n%s\n" \
adx = np.linspace( 0.0, nComputeLayers-1, nComputeLayers )
plt.plot( adx + 1.0, adWPct, 'ro' )
plt.plot( adx + 1.15, adBPct, 'bo' )
plt.grid()
plt.xlabel( "Layer Number" )
plt.ylabel( "Percent Change in Weight (Red) and Bias (Blue)" )
sTitle = "How most learning occurs only at an extreme layer\n" \
"Percent Changes to Biases and Weights at Each Layer.\n" \
"Training case: %3d, Target: %lf, Predicted: %lf" % (c, dTarget, dPredicted)
plt.title( sTitle )

adPct = np.zeros( n )
dSmall = 1.0e-10
if all( abs( adDiff ) ) &amp;amp;amp;amp;gt; dSmall and all( abs(adOrig) ) &amp;amp;amp;amp;gt; dSmall:

################################################################################
# The Main Script
################################################################################

dEta = 1.0 # The learning rate
nTrainingCases = 100
nTestCases = nTrainingCases // 5
adInput = GenerateDataRandom( nTrainingCases ) #, 0.0 )
adTest = GenerateDataRandom( nTestCases )

## print( "Data:\n %s" % (adInput) )

# Must be at least 2. Tested up to 10 layers.
nLayers = 2
# Just a single target! Keep it in the interval (0.0, 1.0),
# i.e., excluding both the end-points of 0.0 and 1.0.

dTarget = 0.15

# The input layer has no biases or weights. Even the output layer
# here has only one target, and hence, only one neuron.
# Hence, the weights matrix for all layers now becomes just a
# vector.
# For visualization with a 2 layer-network, keep biases and weights
# between [-4.0, 4.0]

# adAllBs, adAllWs = GenerateBiasesWeightsRandom( nLayers )
adAllBs, adAllWs = GenerateBiasesWeightsConstant( nLayers, 2.0, 2.0 )

## print( "Initial Biases\n", adAllBs )
## print( "Initial Weights\n", adAllWs )

plt.figure( figsize=(10,5) )

# Do the training...
# For each input-target pair,
for c in range( nTrainingCases ):
dInput = adInput[ c ]
## print( "Case: %d. Input: %lf" % (c, dInput) )

adAllGradBs = [ np.zeros( b.shape ) for b in adAllBs ]
adAllGradWs = [ np.zeros( w.shape ) for w in adAllWs ]

# Do the feed-forward, initialized to dA = dInput
l_dAllZs, l_dAllAs = FeedForward( dInput )

# Do the back-propagation

## print( "Updating the network biases and weights" )
adAllBs = [ dB - dEta * dDeltaB
adAllWs = [ dW - dEta * dDeltaW

## print( "The updated network biases:\n", adAllBs )
## print( "The updated network weights:\n", adAllWs )

if 2 == nLayers:
PlotLayerwiseActivations( c, l_dAllAs, dTarget )
dW = adAllWs[ -1 ]
dB = adAllBs[ -1 ]
PlotGradDescent( c, dOrigB, dOrigW, dB, dW )
else:
# Plot in case of many layers: Original and Current Weights, Biases for all layers
# and Activations for all layers
dPredicted = l_dAllAs[ -1 ]
plt.pause( 0.1 )

plt.show()

# Do the testing
print( "\nTesting..." )
for c in range( nTestCases ):

dInput = adTest[ c ]
print( "\tTest Case: %d, Value: %lf" % (c, dInput) )

l_dAllZs, l_dAllAs = FeedForward( dInput )
dPredicted = l_dAllAs[ -1 ]
dDiff = dTarget - dPredicted
dCost = 0.5 * dDiff * dDiff
print( "\tInput: %lf, Predicted: %lf, Target: %lf, Difference: %lf, Cost: %lf\n" % (dInput, dPredicted, dTarget, dDiff, dCost) )

print( "Done!" )



Things you can try:

• Change one or more of the following parameters, and see what happens:
• Target value
• Values of initial weights and biases
• Number of layers
• The learning rate, dEta
• Change the cost function; e.g., try the linear function instead of the Sigmoid. Change the code accordingly.
• Also, try to conceptually see what would happen when the number of neurons per layer is 2 or more…

Have fun!

A song I like:

(Marathi) “pahaaTe pahaaTe malaa jaag aalee”
Music and Singer: C. Ramchandra
Lyrics: Suresh Bhat

# Data science code snippets—1: My first ANN (a copy of Nielsen’s)

I took the liberty to re-type Nielsen’s[^] code [^], in the process introducing the Hungarian naming convention for variables, and also many other non-Pythonic and non-Nielsenic whimsies. Here is the end-result, for whatever worth it is. (Its sole intended purpose was to run the feed-forward and back-propagation algorithms in IDE.) Tested just a slight bit on Ubuntu Release 18.04.1 LTS (Bionic Beaver) 64-bit + Python 3.7.0 64-bit in conda 4.5.11 + MS VSCode 1.30.1.

It works. (Has to. Except for the cosmetic changes, it’s basically Nielsen’s code.) It produces this graph as an output:

Total Errors vs. Epochs

“Enjoy!”

'''
ANN.Nielsen.SmallerAndEasiestDataVectors.py
-- Simple Artificial Neural Network.
-- This code very closely follows "network.py given in Nielsen's book
(chapter 1). It was modified a bit by Ajit R. Jadhav.
-- Meant for developing understanding of ANNs through debugging in IDE
(with ability to inspect values live at run-time that also can be
verified manually, by taking *smaller* and *simpler* vectors for the input
and target data-sets).
-- Includes functions for generation of small and simple 1D input vectors
and their corresponding target vectors. The input data generated here is,
effectively, just a randomized version of Nielsen's "vectorized_result"
function.
-- Other features of this code:
* Writes the total error for each epoch, to a file.
* Plots the total error vs. epoch number.
* Follows the Hungarian naming convention. However, the variable names
may not reflect the best possible choices; this code was written in a hurry.
-- Do whatever you like with it, but 100% on your own
responsibility.
-- But if you use this code for academic or corporate
training purposes, then please do let me know by an
email or a (confidential) comment here.
-- History
* Begun: 23 Dec. 2018
* Completed first version that works: 26 Dec. 2018
* This version: Sunday 30 December 2018 11:54:52  IST
'''

import random
import numpy as np
import matplotlib.pyplot as plt

################################################################################
# Global helper functions
################################################################################

################################################################################
# Helper function. Uses numpy.random.randn() to create inputs and targets,
# whether for training or for testing. Useful for very simple 1D studies only.
# (The input data as generated here very easily predict the targets! Even just
# a single call to the softmax function would have been sufficient!!)
def GenerateData( nNeursInInput, nLenOPLayer, nCasesPerTarget ):
# Python lists for inputs and targets, returned at the end by this function.
l_mdInput = []
l_mdTarget = []
for t in range( nLenOPLayer ):
for i in range( nCasesPerTarget ):
mdInput = np.random.rand( nNeursInInput, 1 )*0.1
mdInput[t][0] = mdInput[t][0] + np.random.random_sample()*0.1 + 0.8
l_mdInput.append( mdInput )

mdTarget = np.zeros( (nLenOPLayer,1) )
mdTarget[t][0] = 1.0
l_mdTarget.append( mdTarget )
# pickle returns data as arrays; let's simulate the same thing here
amdInput = np.array( l_mdInput )
amdOutput = np.array( l_mdTarget )
# Python 3(+?) thing. Convert zip to list either here or later.
# Better to do it here.
Data = list( zip( amdInput, amdOutput ) )
return Data

################################################################################
# Helper function. Computes the sigmoid activation function
def Sigmoid( mdZ ):
return 1.0 / ( 1.0 + np.exp( - mdZ ) )

################################################################################
# Helper function. Computes the derivative of the sigmoid activation function
def SigmoidDerivative( mdZ ):
mdA = Sigmoid( mdZ )
mdADer = mdA * ( 1.0 - mdA )

################################################################################
# Helper function. Called with a single activation vector with its
# target vector. Assumes that the form of the cost function for each
# neuron is (note carefully that target comes first):
#       C_x = 0.5 * ( dT - dA )^2
# so that
#       \partial C_x / \partial dA = - ( dT - dA ) = ( dA - dT )
# (where note that the activation comes first).
def CostDerivative( mdA_OP, mdT ):
return ( mdA_OP - mdT )

################################################################################
# Class Network begins here
################################################################################

class Network(object):

############################################################################
# Constructor.
# Supply an array containing number of neurons for each layer
# e.g., [5,4,3]
def __init__(self, anNIL):
self.m_nLayers = len( anNIL )
self.m_anNIL = np.array( anNIL )
# anNIL[1:] means: An array with all entries except the first
# anNIL[:-1] means: An array with all entries except for the last
# We allocate (N,1)-shaped matrices rather than (N)-shaped vectors
# because later on in back-propagation, to have a vectorized code,
# we perform np.dot() in its tensor-product-like avatar, and there,
# we need both the operands to be matrices.
self.m_amdAllBs = [ np.random.randn( nNeursCur, 1 )
for nNeursCur in anNIL[1:] ]
self.m_amdAllWs = [ np.random.randn( nNeursCur, nNeursPrev )
for nNeursPrev, nNeursCur in
zip( anNIL[:-1], anNIL[1:] ) ]
pass

############################################################################
# For each (input, target) tuple representing a row in the
# mini-batch, perform the training. We compute the changes (delta's)
# to be effected to biases and weights through feedback and
# backpropagation, add them up until the mini-batch is over, and then
# apply them to update the weights and biases only once at the end of
# this function. Unlike in Nielsen's code, here we have a separate
# function each for feed-forward and back-propagation
def UpdateMiniBatch( self, dEta, nMiniBatchSize, MiniBatch ):

# Allocate a local matrix each for holding the GradB and GradW matrices
# for all the layers. Both are initialized to zero's. Can't directly
# use np.zeros() because these matrices are not of uniform sizes;
# the no. of neurons in each layer is arbitrarily different
amdAllGradBs = [ np.zeros( mdGradB.shape ) for mdGradB in self.m_amdAllBs ]
amdAllGradWs = [ np.zeros( mdGradW.shape ) for mdGradW in self.m_amdAllWs ]
l_mdErrorsMB = []

# For each (input, target) tuple representing a row in the
# mini-batch, perform the training.

for x, y in MiniBatch:

# Feed-Forward Pass: Feed the next input vector to the network
# using the existing weights and biases, and find the net-input
# and activations it produces at all the network layers.

l_mdAllZs, l_mdAllAs = self.FeedForward( x )

# Compute the error vector
mdDiff = y - l_mdAllAs[ -1 ]
mdErrors = 0.5 * mdDiff * mdDiff
l_mdErrorsMB.append( mdErrors )

# Back-Propagation Pass: Back-propagate the change in the total cost
# function (total error) to the local gradients to be applied to
# each weight (of a synaptic connection between current and previous
# layers), and to each bias (of the current layer).
amdDeltaGradB, amdDeltaGradW = self.BackProp( l_mdAllAs, l_mdAllZs, y )

# Add the changes to the local copies of the biases and weights.
# The zip function takes iterable elements as input, and returns an
# iterator for the tuple. Thus, zipping here makes it iterate through
# each layer of the network.

# Processing for all the rows in the current mini-batch is now over.
# Now, update the network data-members from the local copies.

# Note, we take an average of the changes over all rows.
dEtaAvgOverMB = dEta / nMiniBatchSize

# w means: the old weights matrix for a *single* layer only
# nw means: \sum_{i}^{all rows of mini-batch} GradW_{i-th row}
self.m_amdAllBs = [ mdB - dEtaAvgOverMB * mdGradB
for mdB, mdGradB in zip( self.m_amdAllBs, amdAllGradBs ) ]

self.m_amdAllWs = [ mdW - dEtaAvgOverMB * mdGradW
for mdW, mdGradW in zip( self.m_amdAllWs, amdAllGradWs ) ]

# Return the average error vector for this mini-batch
return l_mdErrorsMB

############################################################################
# Called for a single input vector (i.e., "x" in Nielsen's code)
def FeedForward( self, mdA ):

# Python list of all activations, layer-by-layer
l_mdAllAs = [ mdA ]
# Python list for z vectors, layer-by-layer
l_mdAllZs = []

# For the weight and bias matrices of each layer...
for mdW, mdB in zip( self.m_amdAllWs, self.m_amdAllBs ):
# Compute the net input vector to activate
# the neurons of this layer
mdZ = np.dot( mdW, mdA ) + mdB
l_mdAllZs.append( mdZ )

# Compute the activations for all the neurons
# of this layer
mdA = Sigmoid( mdZ )
l_mdAllAs.append( mdA )

return ( l_mdAllZs, l_mdAllAs )

############################################################################
# Called with inputs and activations for all the layers,
# i.e., for the entire network in one go.
def BackProp( self, l_mdAllAs, l_mdAllZs, mdTarget ):

# Allocate a local matrix each for holding the GradB and GradW matrices
# for all the layers. Both are initialized to zero's. Can't use
# np.zeros() because these matrices are not of uniform sizes;
# the no. of neurons in each layer is arbitrarily different
amdAllGradBs = [ np.zeros( mdB.shape ) for mdB in self.m_amdAllBs ]
amdAllGradWs = [ np.zeros( mdW.shape ) for mdW in self.m_amdAllWs ]

# Back-propagation occurs in two distinct stages:
# (i) In the first stage, we begin from the end (the output layer), and
# use the layer just before that, in order to update the biases and
# weights of the connections lying between the two (which are, in this
# class, are stored in the OP layer).
# (ii) In the second stage, we iterate back successively through the
# network, starting from the last hidden layer, up to the first hidden
# layer.
# The split-up into two stages is necessary because the activation of a
# neuron from the input- or hidden-layers flows through many (all)
# neurons in the immediately next layer, and the total cost function
# (i.e. the total error) is affected by *all* these paths. In contrast,
# the output layer has no next layer, and so, the activation of a
# neuron in the output layer affects the total cost function only
# through its own connection to the total cost, not that of other
# layers. So, the Delta at the OP layer can be directly
# computed from a knowledge of the OP layer activations (predictions)
# and the prescribed targets. But to compute the Delta at an
# intermediate layer, we do need the Delta for the immediately
# next layer.

##########
# Stage I: Compute mdGradA and mdGradB for the Output Layer

# The OP layer is the last, accessed through the -1 index. Get the
# activations (A's) on its output side, and the net inputs (Z's) on its
# input side.
mdA_OP = l_mdAllAs[ -1 ]
mdZ_OP = l_mdAllZs[ -1 ]

# Compute the partial derivatives, and use them to find the Grad for
# all B's in the OP layer.
mdZ_OP_Der = SigmoidDerivative( mdZ_OP )
# As an intermediate product, find the delta for the OP layer. It is
# subsequently used in the second stage as an input, thereby
# back-propagating the changes to the total cost function back
# throughout the network.
mdDelta = CostDerivative( mdA_OP, mdTarget ) * mdZ_OP_Der
# Compute the Grad for all B's in the OP layer
amdAllGradBs[ -1 ] = mdDelta

# Compute the partial derivatives, and find the Grad for all W's in
# the layer just before the OP layer. Index -2 means: just before OP.
# We use "LHL" to mean the last hidden layer.
mdA_LHL_Transpose = l_mdAllAs[ -2 ].transpose()
# Here, we build the GradW *matrix* from what essentially are two
# vectors (or (N,1) matrices). Even if the numpy function says
# "dot", it actually works like the tensor product---it outputs an
# (N X N) matrix, not a single number as a scalar. It is for this step
# that we made even B's an (N,1) shaped matrix in the constructor.
amdAllGradWs[ -1 ] = np.dot( mdDelta, mdA_LHL_Transpose )

###########
# Stage II: Compute mdGradB and mdGradB for the last hidden layer
#           and go back, all the way up to the first hidden layer

# Note that only the negative of the index ever gets used in the
# following loop.
# a[-2] means the second-last layer (i.e. the last hidden layer).
# a[-1] means the last layer (i.e., the OP layer).
# a[-2-1] means the layer just before the second-last layer.
# The current layer starts from the output layer. The previous
# layer is the one before the current.
# If the indexing gets confusing, notice that Nielsen uses the range
# (2, nLayers), rather than that of (1, nLayers-1) which would have
# been a bit more natural, at least to me. But of course, his choice
# is slightly more consistent from a mathematical viewpoint---and
# from the fact that there are no weights and biases for the IP
# layer.
for nL in range( 2, self.m_nLayers ):
# Find the mdDelta for the previous layer, using that for the
# current layer...
mdZCur = l_mdAllZs[ -nL ]
mdZCurDer = SigmoidDerivative( mdZCur )
mdWNextTrans = self.m_amdAllWs[ -nL+1 ].transpose()
# Note, for the very first pass through the loop, the mdDelta being
# used on the RHS is what was earlier computed in the Stage I given
# above (viz. for the output layer). It now gets updated here, so
# that it can act as the "previous" mdDelta in the next pass
# through this loop.
mdDelta = np.dot( mdWNextTrans, mdDelta ) * mdZCurDer
# mdDelta now refers to the current layer, not to the next
amdAllGradBs[ -nL ] = mdDelta

mdAPrevTrans = l_mdAllAs[ -nL-1 ].transpose()
amdAllGradWs[ -nL ] = np.dot( mdDelta, mdAPrevTrans )

# Explaining the above four lines is left as an exercise
# for the reader. (Hints: (i) Write out the matrix equation
# "straight," i.e., as used in the feed-forward pass. (ii)
# Use the rules for the matrix multiplications and the rules
# for taking transposes.)

############################################################################
# Implements the stochastic gradient descent algorithm
def SGD( self, sErrorsFileName, nEpochs,
TrainingData, nMiniBatchSize, TestData ):

# This variable is used only for the x-axis during plotting the
# total activation errors at the OP layer.
adErrors = np.zeros( nEpochs )

# To have a small file-size, we persist the total error for the
# output layer only as averaged over the entire epoch (i.e., the
# entire training data, randomly shuffled).
foErrors = open( sErrorsFileName, 'w' )

nTrainingCases = len( TrainingData )

# An epoch is defined for the entire training data. However, each epoch
# begins by randomly shuffling the training data. Effectively, we
# change the initial position from which to begin descending down the
# higher-dimensional total-cost-function surface.
for e in range( nEpochs ):

# Create Mini-Batches covering the entire training data
np.random.shuffle( TrainingData )
MiniBatches = [ TrainingData[ k : k + nMiniBatchSize ]
for k in range( 0, nTrainingCases, nMiniBatchSize ) ]

dTotalErrorEpoch = 0
# For each mini-batch
for MiniBatch in MiniBatches:
# Conduct the training over the entire mini-batch,
# and collect the errors accumulated over it. Add them
# to the accumulated error for the current epoch.
l_mdErrorsMB = self.UpdateMiniBatch( dEta, nMiniBatchSize, MiniBatch )
dAvgErrorMB = self.ComputeAvgErrorForMiniBatch( l_mdErrorsMB )
dTotalErrorEpoch = dTotalErrorEpoch + dAvgErrorMB

dAvgErrorEpoch = dTotalErrorEpoch / nMiniBatchSize
adErrors[ e ] = dAvgErrorEpoch
# Write to file
sLine = "%E\n" % (dAvgErrorEpoch)
foErrors.write( sLine )

# self.Evaluate( TestData )
print( "Epoch %d: Avg. Error: %lf" % (e, dAvgErrorEpoch) )

foErrors.close()

plt.xlabel( "Epochs" )
plt.ylabel( "Total Error at the OP Layer\n(Epoch Avg. of Mini-Batch Avgs.)" )
plt.title( "Training Errors" )
plt.show()

############################################################################
# Helper function
def ComputeAvgErrorForMiniBatch( self, l_mdErrorsMB ):
nSize = len( l_mdErrorsMB )
dTotalErrorMB = 0
# For each training case in this mini-batch
for i in range( nSize ):
# Get the error vector for the i-th training case
mdE = l_mdErrorsMB[ i ]
dTotalErrorCase = mdE.sum()
dTotalErrorMB = dTotalErrorMB + dTotalErrorCase
# The average of the total errors for all cases in the current mini-batch
dAvgErrorMB = dTotalErrorMB / nSize
return dAvgErrorMB

############################################################################
# This function is mostly useless in this simple a scenario; it predicts
# 100 % accuracy right from 1st epoch! Obvious. With the kind of input
# data we generate, the prediction would have been 100 % accurate even
# with a single function call to softmax!
#
# def Evaluate( self, TestData ):
#     nTestCases = len( TestData )
#     anPredictedRes = []
#     for x, y in TestData:
#         l_mdAllZs, l_mdAllAs = self.FeedForward( x )

#         mdActivOP = l_mdAllAs[ -1 ]
#         nPredIdx = np.argmax( mdActivOP )
#         nTargetIdx = np.argmax( y )
#         anPredictedRes.append( int( nPredIdx == nTargetIdx ) )
#     print( anPredictedRes )
#     pass
################################################################################
# Class Network ends here
################################################################################

################################################################################
# main script
################################################################################

# Prepare Input Data

# The learning rate
dEta = 0.5

# Number of Neurons contained In each successive Layer
anNIL = [10,10,10]

# Size of the input layer
nNeursInInput = anNIL[ 0 ]
# Size of the target layer
nTargets = anNIL[ -1 ]
# Total number of cases to have for training
nTotalNumCasesForTraining = 10000
# Total number of cases to have for testing
nTotalNumCasesForTesting = nTotalNumCasesForTraining // 5

# Number of cases to generate for each target for training
nCasesPerTargetForTraining = nTotalNumCasesForTraining // nTargets
# Number of cases to generate for each target for testing
nCasesPerTargetForTesting = nTotalNumCasesForTesting // nTargets

# For repeatability, seed the RNG in NumPy. For "true" random-ness
# at each program invocation, comment out the next line. Notice that
# seeding, if any, must occur before generating data.
np.random.seed( 0 )

TrainingData = GenerateData( anNIL[ 0 ], anNIL[ -1 ], nCasesPerTargetForTraining )
TestingData = GenerateData( anNIL[ 0 ], anNIL[ -1 ], nCasesPerTargetForTesting )

# Number of Epochs
nEpochs = 10
# Mini-Batch Size
nMiniBatchSize = 10

# Instantiate the network
# Optionally, we can have a re-seeding also here
# np.random.seed( 10000 )
nn = Network( anNIL )
# Let the network find the most optimum combination of values
# for all the weights and biases it contains.
nn.SGD( "EpochErrors.txt", nEpochs, TrainingData, nMiniBatchSize, TestingData )

print( "Done!" )



Things for you to add and/or to try:

• Add the code to persist the network (weights and biases) to a plain-text CSV file, and to re-init it from that file.
• Add the code to show a simple (real-time) diagrammatic visualization of the training process for a tiny (say [4,3,2]) network, using a colormap-based shades being used for the lines of the synaptic weights and similarly colormap-based circles for neuron background for biases.
• Write a code for a more ambitious code for generation of input data for n-dimensional data. It would have m’ number of islands of volumes (of simplest topology) of randomly varying dark shades which are embedded into a randomly varying background of light shades. That is, it would be an n-dimensional generalization of taking the Windows Paint (or similar) program, having a nonuniform but light-gray fill-color, with dark-shaded blobs of arbitrary shapes randomly placed in it, sort of like a two-phase materials micro-structure. You have to take a configuration of such blobs and introduce random but small variations into its shape and position too. Then, take this small group of variations and assign a target value to it. Repeat for N number of targets.

No songs section this time. Will be back on 31st/1st, as promised (see the post just below).

A song I like:

[Realizing that you wouldn’t be in a mood to listen to (and be able to hum) a neat, melodious, golden oldie, may as well run it right away. … Yes, I will come back with my new year’s resolutions on the 31st/1st (see the post just below), but it’s likely that I won’t run any songs section at that time.]

(Hindi) “dhaDakne lage dil ke taron ki duniya…”
Singers: Mahendra Kapoor, Asha Bhosale
Music: N. Dutta
Lyrics: Sahir Ludhianvi

….Done. Take care and bye for now. See you on the 31st, with the new post…

History:
First published: 2018.12.27 23:29 IST
Updated (also the code): 2018.12.28 09.08 IST

# Some suggested time-pass (including ideas for Python scripts involving vectors and tensors)

Actually, I am busy writing down some notes on scalars, vectors and tensors, which I will share once they are complete. No, nothing great or very systematic; these are just a few notings here and there taken down mainly for myself. More like a formulae cheat-sheet, but the topic is complicated enough that it was necessary that I have them in one place. Once ready, I will share them. (They may get distributed as extra material on my upcoming FDP (faculty development program) on CFD, too.)

While I remain busy in this activity, and thus stay away from blogging, you can do a few things:

1.

Think about it: You can always build a unique tensor field from any given vector field, say by taking its gradient. (Or, you can build yet another unique tensor field, by taking the Kronecker product of the vector field variable with itself. Or, yet another one by taking the Kronecker product with some other vector field, even just the position field!). And, of course, as you know, you can always build a unique vector field from any scalar field, say by taking its gradient.

So, you can write a Python script to load a B&W image file (or load a color .PNG/.BMP/even .JPEG, and convert it into a gray-scale image). You can then interpret the gray-scale intensities of the individual pixels as the local scalar field values existing at the centers of cells of a structured (squares) mesh, and numerically compute the corresponding gradient vector and tensor fields.

Alternatively, you can also interpret the RGB (or HSL/HSV) values of a color image as the x-, y-, and z-components of a vector field, and then proceed to calculate the corresponding gradient tensor field.

Write the output in XML format.

2.

Think about it: You can always build a unique vector field from a given tensor field, say by taking its divergence. Similarly, you can always build a unique scalar field from a vector field, say by taking its divergence.

So, you can write a Python script to load a color image, and interpret the RGB (or HSL/HSV) values now as the $xx$-, $xy$-, and $yy$-components of a symmetrical 2D tensor, and go on to write the code to produce the corresponding vector and scalar fields.

Yes, as my resume shows, I was going to write a paper on a simple, interactive, pedagogical, software tool called “ToyDNS” (from Toy + Displacements, Strains, Stresses). I had written an extended abstract, and it had even got accepted in a renowned international conference. However, at that time, I was in an industrial job, and didn’t get the time to write the software or the paper. Even later on, the matter kept slipping.

I now plan to surely take this up on priority, as soon as I am done with (i) the notes currently in progress, and immediately thereafter, (ii) my upcoming stress-definition paper (see my last couple of posts here and the related discussion at iMechanica).

Anyway, the ideas in the points 1. and 2. above were, originally, a part of my planned “ToyDNS” paper.

3.

You can induce a “zen-like” state in you, or if not that, then at least a “TV-watching” state (actually, something better than that), simply by pursuing this URL [^], and pouring in all your valuable hours into it. … Or who knows, you might also turn into a closet meteorologist, just like me. [And don’t tell anyone, but what they show here is actually a vector field.]

4.

You can listen to this song in the next section…. It’s one of those flowy things which have come to us from that great old Grand-Master, viz., SD Burman himself! … Other songs falling in this same sub-sub-genre include, “yeh kisine geet chheDaa,” and “ThanDi hawaaein,” both of which I have run before. So, now, you go enjoy yet another one of the same kind—and quality. …

A Song I Like:

[It’s impossible to figure out whose contribution is greater here: SD’s, Sahir’s, or Lata’s. So, this is one of those happy circumstances in which the order of the listing of the credits is purely incidental … Also recommended is the video of this song. Mona Singh (aka Kalpana Kartik (i.e. Dev Anand’s wife, for the new generation)) is sooooo magical here, simply because she is so… natural here…]

(Hindi) “phailee huyi hai sapanon ki baahen”
Music: S. D. Burman
Lyrics: Sahir
Singer: Lata Mangeshkar

But don’t forget to write those Python scripts….

Take care, and bye for now…

# Exactly what does this script show?

Update on 02 March 2018, 15:34 IST: I have now added another, hopefully better, version of the script (but also kept the old one intact); see in the post below. The new script too comes without comments.

Here is a small little Python script which helps you visualize something about a state of stress in 2D.

If interested in understanding the concept of stress, then do run it, read it, try to understand what it does, and then, if still interested in the concept of stress, try to answer this “simple” little question:

Exactly what does this script show? Exactly what it is that you are visualizing, here?

I had written a few more notes and inline comments in the script, but have deliberately deleted most of them—or at least the ones which might have given you a clue towards answering the above question. I didn’t want to spoil your fun, that’s why.

Once you all finish giving it a try, I will then post another blog-entry here, giving my answer to that question (and in the process, bringing back all the deleted notes and comments).

Anyway, here is the script:


'''
A simple script to help visualize *something* about
a 2D stress tensor.

--Ajit R. Jadhav. Version: 01 March 2018, 21:39 HRS IST.
'''

import math
import numpy as np
import matplotlib.pyplot as plt

# Specifying the input stress
# Note:
# While plotting, we set the x- and y-limits to -150 to +150,
# and enforce the aspect ratio of 1. That is to say, we do not
# allow MatPlotLib to automatically scale the axes, because we
# want to appreciate the changes in the shapes as well sizes in
# the plot.
#
# Therefore, all the input stress-components should be kept
# to within the -100 to +100 (both inclusive) range.
#
# Specify the stress state in this order: xx, xy; yx, yy
# The commas and the semicolon are necessary.

sStress = "-100, 45; 90, 25"

axes = plt.axes()
axes.set_xlim((-150, 150))
axes.set_ylim((-150, 150))
plt.axes().set_aspect('equal', 'datalim')
plt.title(
"A visualization of *something* about\n" \
"the 2D stress-state [xx, xy; yx, yy] = [%s]" \
% sStress)

mStress = np.matrix(sStress)
mStressT = np.transpose(mStress)

mUnitNormal = np.zeros((2, 1))
mTraction = np.zeros((2, 1))

nOrientations = 18
dIncrement = 360.0 / float(nOrientations)
for i in range(0, nOrientations):
dThetaDegrees = float(i) * dIncrement
dThetaRads = dThetaDegrees * math.pi / 180.0
mTraction = mStressT.dot(mUnitNormal)
if i == 0:
plt.plot((0, mTraction[0, 0]), (0, mTraction[0, 1]), 'black', linewidth=1.0)
else:
plt.plot((0, mTraction[0, 0]), (0, mTraction[0, 1]), 'gray', linewidth=0.5)
plt.plot(mTraction[0, 0], mTraction[0, 1], marker='.',
markeredgecolor='gray', markerfacecolor='gray', markersize=5)
plt.text(mTraction[0, 0], mTraction[0, 1], '%d' % dThetaDegrees)
plt.pause(0.05)

plt.show()



Update on 02 March 2018, 15:34 IST:

Here is a second version of a script that does something similar (but continues to lack explanatory comments). One advantage with this version is that you can copy-paste the script to some file, say, MyScript.py, and invoke it from command line, giving the stress components and the number of orientations as command-line inputs, e.g.,

python MyScript.py "100, 0; 0, 50" 12


which makes it easier to try out different states of stress.

The revised code is here:


'''
A simple script to help visualize *something* about
a 2D stress tensor.

History:
06 March 2018, 10:43 IST:
In computeTraction(), changed the mUnitNormal code to make it np.matrix() rather than python array
02 March 2018, 15:39 IST; Published the code
'''

import sys
import math
import numpy as np
import matplotlib.pyplot as plt

# Specifying the input stress
# Note:
# While plotting, we set the x- and y-limits to -150 to +150,
# and enforce the aspect ratio of 1. That is to say, we do not
# allow MatPlotLib to automatically scale the axes, because we
# want to appreciate the changes in the shapes as well sizes in
# the plot.
#
# Therefore, all the input stress-components should be kept
# to within the -100 to +100 (both inclusive) range.
#
# Specify the stress state in this order: xx, xy; yx, yy
# The commas and the semicolon are necessary.
# If you run the program from a command-line, you can also
# specify the input stress string in quotes as the first
# command-line argument, and no. of orientations, as the
# second. e.g.:
# python MyScript.py "100, 50; 50, 0" 12
##################################################

gsStress = "-100, 45; 90, 25"
gnOrientations = 18

##################################################

def plotArrow(vTraction, dThetaDegs, clr, axes):
dx = round(vTraction[0], 6)
dy = round(vTraction[1], 6)
if not (math.fabs(dx) < 10e-6 and math.fabs(dy) < 10e-6):
axes.annotate(xy=(dx, dy), s='%d' % dThetaDegs, color=clr)

##################################################

mUnitNormal = np.reshape(vUnitNormal, (2,1))
mTraction = mStressT.dot(mUnitNormal)
vTraction = np.squeeze(np.asarray(mTraction))
return vTraction

##################################################

def main():
axes = plt.axes()
axes.set_label("label")
axes.set_xlim((-150, 150))
axes.set_ylim((-150, 150))
axes.set_aspect('equal', 'datalim')
plt.title(
"A visualization of *something* about\n" \
"the 2D stress-state [xx, xy; yx, yy] = [%s]" \
% gsStress)

mStress = np.matrix(gsStress)
mStressT = np.transpose(mStress)
vTraction = computeTraction(mStressT, 0)
plotArrow(vTraction, 0, 'red', axes)
dIncrement = 360.0 / float(gnOrientations)
for i in range(1, gnOrientations):
dThetaDegrees = float(i) * dIncrement
dThetaRads = dThetaDegrees * math.pi / 180.0
vTraction = computeTraction(mStressT, dThetaRads)
plotArrow(vTraction, dThetaDegrees, 'gray', axes)
plt.pause(0.05)
plt.show()

##################################################

if __name__ == "__main__":
nArgs = len(sys.argv)
if nArgs > 1:
gsStress = sys.argv[1]
if nArgs > 2:
gnOrientations = int(sys.argv[2])
main()

`

OK, have fun, and if you care to, let me know your answers, guess-works, etc…..

Oh, BTW, I have already taken a version of my last post also to iMechanica, which led to a bit of an interaction there too… However, I had to abruptly cut short all the discussions on the topic because I unexpectedly got way too busy in the affiliation- and accreditation-related work. It was only today that I’ve got a bit of a breather, and so could write this script and this post. Anyway, if you are interested in the concept of stress—issues like what it actually means and all that—then do check out my post at iMechanica, too, here [^].

… Happy Holi, take care to use only safe colors—and also take care not to bother those people who do not want to be bothered by you—by your “play”, esp. the complete strangers…

OK, take care and bye for now. ….

A Song I Like:

(Marathi [Am I right?]) “rang he nave nave…”